[BUG] ph5availability is inconsistent with stored time series data

timronan commented 4 years ago

Describe the bug: The ph5availability function outputs a result that is inconsistent with the time series data stored in a PH5 experiment.

Environment (please complete the following information):

CentOS 7
ph5toms version: 2018.268
ph5availability version: 2019.101

To Reproduce: Track the time series data using ph5toms. Add a print statement, print(s), at line 215 of ph5toms, where s = stream.traces[0].stats.

Run ph5toms over a timespan of interest and save the printed output. Compare this output to the ph5availability results for the same SNCL and timespan. The gaps seen in the ph5toms output do not appear in the ph5availability result.

Expected behavior: ph5availability should show the same gaps that appear in the output of s = stream.traces[0].stats. These trace stats are the ground truth for the time series data stored in a PH5 file, and the availability should accurately reflect that stored data.

Screenshots

Printout of trace stats in ph5toms:

         network: XD
         station: MA05
        location: 
         channel: HHZ
       starttime: 2014-06-25T00:00:00.005000Z
         endtime: 2014-06-25T11:59:59.995000Z
   sampling_rate: 100.0
           delta: 0.01
            npts: 4320000
           calib: 1.0
     coordinates: AttribDict({'latitude': 46.754669, 'longitude': -122.226189})

ph5availability -n master.ph5 -p ./ -s 2014-06-24T00:00:00 -e 2014-06-26T23:25:59 --station  MA05 --channel HHZ -a 2
#n s      l  c   q                    earliest                      latest
XD MA05   -- HHZ   2014-06-24T00:00:00.000000Z 2014-06-26T23:25:59.000000Z

ph5availability should print multiple lines showing these data gaps.
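To illustrate the expected output, here is a minimal sketch of how per-trace extents could be merged into continuous segments, with one line printed per segment. It is not the ph5availability implementation: `merge_extents`, its `tolerance` parameter, and the second and third extents are hypothetical. Only the first extent comes from the printout above.

```python
from datetime import datetime, timedelta


def merge_extents(extents, sample_rate, tolerance=0.5):
    """Merge (start, end) extents into continuous segments.

    Two extents are joined when the next start falls within
    (1 + tolerance) sample intervals of the previous end.
    """
    max_step = timedelta(microseconds=round((1.0 + tolerance) * 1e6 / sample_rate))
    segments = []
    for start, end in sorted(extents):
        if segments and start - segments[-1][1] <= max_step:
            # Contiguous with the previous segment: extend it.
            segments[-1][1] = max(segments[-1][1], end)
        else:
            # Gap found: start a new segment.
            segments.append([start, end])
    return [tuple(seg) for seg in segments]


# The first extent is the trace printed above; the other two are
# invented for illustration only.
extents = [
    (datetime(2014, 6, 25, 0, 0, 0, 5000), datetime(2014, 6, 25, 11, 59, 59, 995000)),
    (datetime(2014, 6, 25, 12, 0, 0, 5000), datetime(2014, 6, 25, 17, 59, 59, 995000)),
    (datetime(2014, 6, 26, 6, 0, 0, 5000), datetime(2014, 6, 26, 11, 59, 59, 995000)),
]
segments = merge_extents(extents, sample_rate=100.0)
for start, end in segments:
    print(start.isoformat(), end.isoformat())
```

With these made-up extents, the first two join because they are one sample interval apart, and the third starts a new segment after the gap.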

timronan commented 4 years ago

It seems that the DAS table field sample_count_i is not guaranteed to reflect the number of samples present in the raw time series data that the metadata describes. ph5api.get_availability calculates the availability extents from the DAS table's sample_count_i field rather than from the number of samples and the sample rate of the raw time series data.

The availability issue could be solved by calculating time spans from the number of samples in the trace data instead of the number of samples reported by the metadata. The problem is that the metadata and the trace data would still be inconsistent. Does anyone have ideas about how to enforce consistency between the metadata and the time series? This issue highlights that the PH5 metadata potentially cannot be trusted.
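As a rough sketch of why the sample count matters, the snippet below derives an end time from a start time, a sample count, and a sample rate. It uses the convention visible in the trace printout, where the end time is the time of the last sample. The `end_time` helper and the metadata count of 8640000 are hypothetical. Only the start time, npts, and sampling rate are taken from the printout.

```python
from datetime import datetime, timedelta


def end_time(start, n_samples, sample_rate):
    """Time of the last sample: start + (n_samples - 1) / sample_rate."""
    offset_us = round((n_samples - 1) * 1e6 / sample_rate)
    return start + timedelta(microseconds=offset_us)


start = datetime(2014, 6, 25, 0, 0, 0, 5000)

# npts and sampling_rate as reported by the trace stats.
trace_end = end_time(start, 4320000, 100.0)

# Hypothetical metadata count, invented to show the effect of a
# sample_count_i value that disagrees with the trace data.
meta_end = end_time(start, 8640000, 100.0)

print(trace_end.isoformat())
print(meta_end - trace_end)
```

With the printout's values, `trace_end` reproduces the reported endtime. The invented metadata count would extend the extent by a further 12 hours, which is the kind of overstated availability described above.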

A way to test this: collect trace data inside the ph5api.get_availability function by adding trace = self.cut(das, start, end, apply_time_correction=False, sample_rate=sample_rate) at line 1433. Then, in the for loop starting on line 1468, compare trace[i].nsamples with entry['sample_count_i'].
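The comparison itself could look something like the sketch below. It works on plain Python values rather than calling ph5api: `find_count_mismatches` is a hypothetical helper, and the entries and counts are invented stand-ins for the DAS table rows and the trace[i].nsamples values.

```python
def find_count_mismatches(entries, actual_counts):
    """Return (index, metadata_count, actual_count) for each mismatch."""
    mismatches = []
    for i, (entry, actual) in enumerate(zip(entries, actual_counts)):
        expected = entry["sample_count_i"]
        if expected != actual:
            mismatches.append((i, expected, actual))
    return mismatches


# Hypothetical values: the dicts stand in for DAS table rows and the
# list stands in for the nsamples of the traces returned by cut().
entries = [{"sample_count_i": 4320000}, {"sample_count_i": 4320000}]
actual_counts = [4320000, 2160000]

mismatches = find_count_mismatches(entries, actual_counts)
print(mismatches)
```

Any non-empty result would point to rows where the metadata and the stored samples disagree.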

dsentinel commented 4 years ago

Yikes! This does highlight trust issues. Is this related to time_correction, or does it apply to all data, with or without corrections?

PIC-IRIS / PH5

[BUG] ph5availability is inconsistent with stored time series data #416