eqcorrscan / EQcorrscan

Earthquake detection and analysis in Python.
https://eqcorrscan.readthedocs.io/en/latest/

Template write / read merges traces in rare case where traces from same channel align exactly #497

Open flixha opened 2 years ago

flixha commented 2 years ago

Describe the bug: Consider a template for a one-component sensor that has only a vertical channel, with the P- and S-picks on the same trace. If the two traces happen to line up exactly, that is, the first sample of the S-trace directly follows the last sample of the P-trace, then the ObsPy / miniSEED library merges the P and S waveforms into one trace. The result is a template in which some traces are suddenly twice as long, and once trace lengths are mixed, EQcorrscan reports an error.

To Reproduce: See the attached template object. Trace NS.ARA1.00.BHZ is merged into one trace on reading, even though the template had two traces for NS.ARA1.00.BHZ before writing. Template_sample.gz (a .tgz file renamed to .gz because of file type restrictions on GitHub)

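from eqcorrscan.core.match_filter import Tribe

# Read the attached tribe and take its first template.
tribe = Tribe().read('Template_sample.gz')
templ = tribe[0]
# All traces should share one length; this fails when P and S traces were merged.
assert len({tr.stats.npts for tr in templ.st}) == 1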

Expected behavior: On reading, the trace should be split back into its two separate parts for P and S.

Additional context: This problem is not caused directly by EQcorrscan, but by the miniSEED reading routines. I'm looking for lower-level options in the miniSEED routines that prevent the data from being merged into one trace, but I don't know that code well, so this may take some time. I'm not yet sure whether the issue arises when reading or when writing the miniSEED file. If the miniSEED library doesn't allow the trace to be kept split in this case, we may have to think of a workaround or some extra checks.

I'm continuing to investigate, but if you have any ideas, please let me know!
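One possible form of the extra check mentioned above, as a minimal sketch and not an EQcorrscan or ObsPy feature: if the intended template length in samples is known, a trace whose length is an exact multiple of that length can be cut back into equal parts. The helper name `split_merged_segments` and the use of plain lists in place of trace data are assumptions made for illustration.

```python
def split_merged_segments(samples, expected_npts):
    """Cut a merged sample sequence back into chunks of expected_npts.

    Hypothetical helper: `samples` stands in for the data of one trace,
    and `expected_npts` is the template length the caller expects.
    """
    if expected_npts <= 0:
        raise ValueError("expected_npts must be positive")
    n_segments, remainder = divmod(len(samples), expected_npts)
    if n_segments == 0 or remainder != 0:
        # The length is not a whole multiple, so this is not a clean merge.
        raise ValueError(
            f"length {len(samples)} is not a multiple of {expected_npts}"
        )
    return [
        samples[i * expected_npts:(i + 1) * expected_npts]
        for i in range(n_segments)
    ]


# Toy data: two 4-sample segments that were merged into one 8-sample trace.
merged = list(range(8))
parts = split_merged_segments(merged, expected_npts=4)
```

On a real template, the start time of each recovered part would also have to be set, offset by `expected_npts / sampling_rate` seconds per segment.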

calum-chamberlain commented 2 years ago

Nice spot! That's an ugly bug, thanks for finding it. I think we should probably try to cope with it in here if we can, although I'm not sure how at the moment....

Sorry for the bug!

flixha commented 2 years ago

I see that the record numbers in the mseed file still indicate that it was originally two traces:

NS_ARA1_00_BHZ, 000001, D, 512, 56 samples, 20 Hz, 2007,194,09:10:30.027000
NS_ARA1_00_BHZ, 000002, D, 512, 56 samples, 20 Hz, 2007,194,09:10:32.827000
NS_ARA1_00_BHZ, 000003, D, 512, 56 samples, 20 Hz, 2007,194,09:10:35.627000
NS_ARA1_00_BHZ, 000004, D, 512, 56 samples, 20 Hz, 2007,194,09:10:38.427000
...
NS_ARA1_00_BHZ, 000032, D, 512, 56 samples, 20 Hz, 2007,194,09:11:56.827000
NS_ARA1_00_BHZ, 000033, D, 512, 8 samples, 20 Hz, 2007,194,09:11:59.627000
NS_ARA1_00_BHZ, 000001, D, 512, 56 samples, 20 Hz, 2007,194,09:12:00.027000
NS_ARA1_00_BHZ, 000002, D, 512, 56 samples, 20 Hz, 2007,194,09:12:02.827000
NS_ARA1_00_BHZ, 000003, D, 512, 56 samples, 20 Hz, 2007,194,09:12:05.627000
NS_ARA1_00_BHZ, 000004, D, 512, 56 samples, 20 Hz, 2007,194,09:12:08.427000
...
NS_ARA1_00_BHZ, 000033, D, 512, 8 samples, 20 Hz, 2007,194,09:13:29.627000

But I can't see that this information is read into the stream object:

st.select(station='ARA1')[0].stats.mseed
AttribDict({'dataquality': 'D', 'number_of_records': 66, 'encoding': 'FLOAT64',
    'byteorder': '>', 'record_length': 512, 'blkt1001': AttribDict({'timing_quality': 0}),
    'calibration_type': False, 'filesize': 1563648})
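The record listing above can be used to locate the original trace boundary. The following is a rough sketch that parses lines in the format printed above and flags a boundary wherever the record sequence number stops increasing. It then checks that the next record starts exactly one record duration after the previous one, which is the alignment that triggers the merge. The function names are hypothetical, and treating a sequence-number reset as a trace boundary is a heuristic that depends on how the file was written.

```python
from datetime import datetime, timedelta

# Four lines copied from the listing above, around the point of interest.
LISTING = """\
NS_ARA1_00_BHZ, 000032, D, 512, 56 samples, 20 Hz, 2007,194,09:11:56.827000
NS_ARA1_00_BHZ, 000033, D, 512, 8 samples, 20 Hz, 2007,194,09:11:59.627000
NS_ARA1_00_BHZ, 000001, D, 512, 56 samples, 20 Hz, 2007,194,09:12:00.027000
NS_ARA1_00_BHZ, 000002, D, 512, 56 samples, 20 Hz, 2007,194,09:12:02.827000
"""


def parse_record(line):
    """Parse one listing line into a dict (hypothetical helper)."""
    fields = [field.strip() for field in line.split(",")]
    return {
        "seed_id": fields[0],
        "sequence": int(fields[1]),
        "npts": int(fields[4].split()[0]),
        "rate": float(fields[5].split()[0]),
        # Year, day-of-year and time are the last three comma-separated fields.
        "start": datetime.strptime(",".join(fields[6:9]), "%Y,%j,%H:%M:%S.%f"),
    }


def find_segment_starts(records):
    """Return indices where the sequence number does not increase."""
    starts = [0]
    for i in range(1, len(records)):
        previous, current = records[i - 1], records[i]
        if (current["seed_id"] != previous["seed_id"]
                or current["sequence"] <= previous["sequence"]):
            starts.append(i)
    return starts


records = [parse_record(line) for line in LISTING.splitlines() if line.strip()]
segment_starts = find_segment_starts(records)

# Compare the start of the new segment with the expected next sample time.
boundary = segment_starts[1]
last = records[boundary - 1]
expected_next = last["start"] + timedelta(seconds=last["npts"] / last["rate"])
gap_seconds = (records[boundary]["start"] - expected_next).total_seconds()
```

For these lines the last record of the first segment holds 8 samples at 20 Hz, so it covers 0.4 s from 09:11:59.627, and the next record starts at 09:12:00.027 with no gap.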

ylseanna commented 1 month ago

+1

Annoyingly, I also ran into this problem, twice within a single day of data. Are any fixes on the horizon, or are there known workarounds in the meantime?