EarthScope / ispaq

Python command line script that uses R packages to calculate seismology data quality metrics.
GNU Lesser General Public License v3.0

Incorrect availability values when traces out of order #12

Closed laura-iris closed 2 years ago

laura-iris commented 2 years ago

We found that when the miniSEED segments are out of order, the availability values (e.g. num_gaps, num_overlaps, percent_availability) are inaccurate.

This was observed using local files for QW.BCV11, which has many segments in a single day file:

msi -T -Q QW.BCV11..HNZ.2022.152
....
Total: 1 trace(s) with 157 segment(s)

Of note, the data within the day file are out of order. For example:

msi QW.BCV11..HNZ.2022.152
...
QW_BCV11__HNZ, 000436, D, 256, 172 samples, 100 Hz, 2022,152,20:27:12.070000
QW_BCV11__HNZ, 000437, D, 256, 172 samples, 100 Hz, 2022,152,21:15:19.150000
QW_BCV11__HNZ, 000438, D, 256, 172 samples, 100 Hz, 2022,152,20:27:13.790000
QW_BCV11__HNZ, 000439, D, 256, 172 samples, 100 Hz, 2022,152,20:27:15.510000
QW_BCV11__HNZ, 000440, D, 256, 172 samples, 100 Hz, 2022,152,21:15:20.870000
...

ISPAQ was calculating unexpectedly high values for the gaps and overlaps metrics, and reporting 0% availability:

QW.BCV11..HNZ.D,2022-06-01T00:00:00,2022-06-02T00:00:00,num_gaps,1210
QW.BCV11..HNZ.D,2022-06-01T00:00:00,2022-06-02T00:00:00,max_gap,8135.86
QW.BCV11..HNZ.D,2022-06-01T00:00:00,2022-06-02T00:00:00,num_overlaps,1008
QW.BCV11..HNZ.D,2022-06-01T00:00:00,2022-06-02T00:00:00,max_overlap,5518.28
QW.BCV11..HNZ.D,2022-06-01T00:00:00,2022-06-02T00:00:00,percent_availability,0

ISPAQ uses ObsPy to read the miniSEED files, and ObsPy was also reading the data into the Stream object out of chronological order. When this happens, the Stream contains 2217 traces, which is within one of the sum of num_gaps and num_overlaps reported above (1210 + 1008 = 2218). This pointed to the record ordering as the probable source of the erroneous metric values.

>>> len(obspy.read('QW.BCV11..HNZ.2022.152'))
2217
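To illustrate why file order matters, here is a minimal pure-Python sketch using the five record start times from the msi listing above. It assumes gaps and overlaps are counted by comparing each segment with the one immediately before it. This is a simplification for illustration, not ISPAQ's or ObsPy's actual algorithm, and the helper names (`to_segments`, `count_gaps_overlaps`) and the half-sample tolerance are hypothetical.

```python
from datetime import datetime, timedelta

SAMPLING_RATE = 100.0  # Hz, from the msi listing
NPTS = 172             # samples per record, from the msi listing

# Start times of records 000436-000440, in the order they appear in the file.
FILE_ORDER_STARTS = [
    "20:27:12.070000",
    "21:15:19.150000",
    "20:27:13.790000",
    "20:27:15.510000",
    "21:15:20.870000",
]


def to_segments(starts):
    """Return (start, expected_next_start) pairs, one per record."""
    duration = timedelta(seconds=NPTS / SAMPLING_RATE)
    segments = []
    for s in starts:
        t0 = datetime.strptime("2022-152 " + s, "%Y-%j %H:%M:%S.%f")
        segments.append((t0, t0 + duration))
    return segments


def count_gaps_overlaps(segments, tolerance=0.5 / SAMPLING_RATE):
    """Count gaps and overlaps between consecutive segments, in the given order.

    The half-sample tolerance is an assumption made for this sketch.
    """
    gaps = overlaps = 0
    for (_, prev_next), (cur_start, _) in zip(segments, segments[1:]):
        delta = (cur_start - prev_next).total_seconds()
        if delta > tolerance:
            gaps += 1
        elif delta < -tolerance:
            overlaps += 1
    return gaps, overlaps


segments = to_segments(FILE_ORDER_STARTS)
print(count_gaps_overlaps(segments))          # file order: (2, 1)
print(count_gaps_overlaps(sorted(segments)))  # chronological order: (1, 0)
```

In file order, the jump forward to 21:15 and back to 20:27 registers as a gap followed by an overlap. Once sorted, the same five records form two contiguous runs separated by a single gap.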

To test whether this ordering was causing the problem, I called the .sort() method on the ObsPy Stream object after reading in the data. With sorting added, the availability metrics look more reasonable:

QW.BCV11..HNZ.D,2022-06-01T00:00:00,2022-06-02T00:00:00,num_gaps,158
QW.BCV11..HNZ.D,2022-06-01T00:00:00,2022-06-02T00:00:00,max_gap,8135.86
QW.BCV11..HNZ.D,2022-06-01T00:00:00,2022-06-02T00:00:00,num_overlaps,0
QW.BCV11..HNZ.D,2022-06-01T00:00:00,2022-06-02T00:00:00,max_overlap,0
QW.BCV11..HNZ.D,2022-06-01T00:00:00,2022-06-02T00:00:00,percent_availability,33.9016
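For reference, the sorting step described above could look roughly like the following. This is a sketch only, not the actual change made in ISPAQ. The `read_sorted` helper is a hypothetical name, and the sort keys are spelled out explicitly on the assumption that ordering by channel identifiers and then by time is what is wanted.

```python
def read_sorted(path):
    """Read a miniSEED file with ObsPy and return its traces in chronological order.

    Hypothetical helper for illustration; requires the third-party obspy package.
    """
    import obspy  # imported lazily so the module loads even without ObsPy installed

    stream = obspy.read(path)
    # Stream.sort() reorders the traces in place. Sorting by the channel
    # identifiers first and then by time keeps each channel's traces together.
    stream.sort(keys=["network", "station", "location", "channel",
                      "starttime", "endtime"])
    return stream
```

Called as `read_sorted('QW.BCV11..HNZ.2022.152')`, this would return the same traces as `obspy.read`, since sorting only reorders them and does not merge them.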