EarthScope / dataselect

Selection and sorting for data in miniSEED format
GNU General Public License v3.0

Wrong data sorting in SDS archive #4

Closed Brtle closed 6 years ago

Brtle commented 6 years ago

I think there is a bug when adding data to an existing SDS archive. According to the documentation:

Any output data will always be sorted by ascending time-series segments.

However, when data is added for a day that already has data in the SDS archive, dataselect seems to append the new data to the old without sorting the result.

I tried with a small sample. I created a small SDS archive with only 1 file:

$ msi -tg SDS/2017/FR/STR/LHZ.D/FR.STR.00.LHZ.D.2017.001 
   Source                Start sample             End sample        Gap  Hz  Samples
FR_STR_00_LHZ     2017,001,14:57:28.000000 2017,001,16:00:46.000000  ==  1   3799
Total: 1 trace(s) with 1 segment(s)

I then added this file:

$ msi -tg FR.STR.00.LHZ.D.2017.001.00-01.mseed 
   Source                Start sample             End sample        Gap  Hz  Samples
FR_STR_00_LHZ     2017,001,00:03:22.000000 2017,001,01:03:17.999998  ==  1   3597
Total: 1 trace(s) with 1 segment(s)
$ dataselect -SDS SDS FR.STR.00.LHZ.D.2017.001.00-01.mseed
$ msi SDS/2017/FR/STR/LHZ.D/FR.STR.00.LHZ.D.2017.001 
FR_STR_00_LHZ, 004657, D, 512, 214 samples, 1 Hz, 2017,001,14:57:28.000000
FR_STR_00_LHZ, 004658, D, 512, 213 samples, 1 Hz, 2017,001,15:01:02.000000
FR_STR_00_LHZ, 004659, D, 512, 208 samples, 1 Hz, 2017,001,15:04:35.000000
FR_STR_00_LHZ, 004660, D, 512, 211 samples, 1 Hz, 2017,001,15:08:03.000000
FR_STR_00_LHZ, 004661, D, 512, 215 samples, 1 Hz, 2017,001,15:11:34.000000
FR_STR_00_LHZ, 004662, D, 512, 211 samples, 1 Hz, 2017,001,15:15:09.000001
FR_STR_00_LHZ, 004663, D, 512, 210 samples, 1 Hz, 2017,001,15:18:40.000000
FR_STR_00_LHZ, 004664, D, 512, 208 samples, 1 Hz, 2017,001,15:22:10.000000
FR_STR_00_LHZ, 004665, D, 512, 210 samples, 1 Hz, 2017,001,15:25:38.000000
FR_STR_00_LHZ, 004666, D, 512, 209 samples, 1 Hz, 2017,001,15:29:08.000000
FR_STR_00_LHZ, 004667, D, 512, 214 samples, 1 Hz, 2017,001,15:32:37.000000
FR_STR_00_LHZ, 004668, D, 512, 211 samples, 1 Hz, 2017,001,15:36:11.000000
FR_STR_00_LHZ, 004669, D, 512, 209 samples, 1 Hz, 2017,001,15:39:42.000000
FR_STR_00_LHZ, 004670, D, 512, 212 samples, 1 Hz, 2017,001,15:43:11.000000
FR_STR_00_LHZ, 004671, D, 512, 212 samples, 1 Hz, 2017,001,15:46:43.000000
FR_STR_00_LHZ, 004672, D, 512, 211 samples, 1 Hz, 2017,001,15:50:15.000000
FR_STR_00_LHZ, 004673, D, 512, 211 samples, 1 Hz, 2017,001,15:53:46.000000
FR_STR_00_LHZ, 004674, D, 512, 210 samples, 1 Hz, 2017,001,15:57:17.000000
FR_STR_00_LHZ, 004403, D, 512, 211 samples, 1 Hz, 2017,001,00:03:22.000000
FR_STR_00_LHZ, 004404, D, 512, 210 samples, 1 Hz, 2017,001,00:06:53.000000
FR_STR_00_LHZ, 004405, D, 512, 211 samples, 1 Hz, 2017,001,00:10:23.000000
FR_STR_00_LHZ, 004406, D, 512, 214 samples, 1 Hz, 2017,001,00:13:54.000000
FR_STR_00_LHZ, 004407, D, 512, 211 samples, 1 Hz, 2017,001,00:17:28.000000
FR_STR_00_LHZ, 004408, D, 512, 209 samples, 1 Hz, 2017,001,00:20:59.000000
FR_STR_00_LHZ, 004409, D, 512, 209 samples, 1 Hz, 2017,001,00:24:28.000000
FR_STR_00_LHZ, 004410, D, 512, 213 samples, 1 Hz, 2017,001,00:27:57.000000
FR_STR_00_LHZ, 004411, D, 512, 211 samples, 1 Hz, 2017,001,00:31:29.999998
FR_STR_00_LHZ, 004412, D, 512, 215 samples, 1 Hz, 2017,001,00:35:01.000000
FR_STR_00_LHZ, 004413, D, 512, 207 samples, 1 Hz, 2017,001,00:38:36.000000
FR_STR_00_LHZ, 004414, D, 512, 208 samples, 1 Hz, 2017,001,00:42:03.000000
FR_STR_00_LHZ, 004415, D, 512, 213 samples, 1 Hz, 2017,001,00:45:31.000000
FR_STR_00_LHZ, 004416, D, 512, 215 samples, 1 Hz, 2017,001,00:49:04.000000
FR_STR_00_LHZ, 004417, D, 512, 216 samples, 1 Hz, 2017,001,00:52:39.000000
FR_STR_00_LHZ, 004418, D, 512, 213 samples, 1 Hz, 2017,001,00:56:15.000000
FR_STR_00_LHZ, 004419, D, 512, 211 samples, 1 Hz, 2017,001,00:59:47.999998

The new SDS archive file isn't sorted.

chad-earthscope commented 6 years ago

Any output data will always be sorted by ascending time-series segments.

The output of dataselect will be sorted. If you only provided the 2nd file to dataselect as input, then that is all that will be sorted. Nothing in the existing files that get appended to is taken into account. Put another way, dataselect does not "manage" an output/archive data structure such as -SDS (which is just a preset of the -A option).
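To make the append behaviour concrete, here is a toy model in Python. It is only an illustration, not dataselect's code: each record is reduced to its start time as an (hour, minute, second) tuple taken from the listings above, and the function names are hypothetical.

```python
# Toy model only: a "record" is just its start time (hour, minute, second).
existing_file = [(14, 57, 28), (15, 1, 2), (15, 4, 35)]  # already in the archive file
new_input = [(0, 6, 53), (0, 3, 22)]                     # records given as input


def append_sorted_input(existing, new):
    """Sort only the new input, then append it after what is already on disk."""
    return existing + sorted(new)


def sort_all_as_input(existing, new):
    """Treat both the existing and the new records as input and sort them together."""
    return sorted(existing + new)


appended = append_sorted_input(existing_file, new_input)
merged = sort_all_as_input(existing_file, new_input)

# Compare each result with its fully sorted form.
print("append only:", appended == sorted(appended))
print("all as input:", merged == sorted(merged))
```

In this model the appended file keeps the 14:57 and 15:xx records ahead of the 00:xx records, which matches the record listing in the report.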

If you want to make sure that all the data records in a given file are sorted, you'll need to specify them all as input to the program.
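One way this could look in practice, sketched as a small Python wrapper: pass both the existing archive file and the new file as input, and write into a fresh archive directory so that nothing is appended to older files. The directory name `SDS_sorted` and the helper names are hypothetical, the target directory is assumed to be empty, and only the `-SDS` usage shown in this thread is relied on.

```python
import subprocess


def build_resort_command(existing_file, new_file, fresh_archive):
    """Build a dataselect call that reads both files and writes a new SDS archive."""
    # Same form as in the report: dataselect -SDS <archive dir> <input files...>
    return ["dataselect", "-SDS", fresh_archive, existing_file, new_file]


def resort_into_fresh_archive(existing_file, new_file, fresh_archive):
    """Run the command; assumes dataselect is on PATH and fresh_archive is empty."""
    subprocess.run(build_resort_command(existing_file, new_file, fresh_archive), check=True)


if __name__ == "__main__":
    cmd = build_resort_command(
        "SDS/2017/FR/STR/LHZ.D/FR.STR.00.LHZ.D.2017.001",
        "FR.STR.00.LHZ.D.2017.001.00-01.mseed",
        "SDS_sorted",  # hypothetical, empty output directory
    )
    # Only print the command here; call resort_into_fresh_archive() to execute it.
    print(" ".join(cmd))
```

Writing to a separate directory avoids the append case entirely; replacing the original archive file with the newly written one afterwards is left to the user.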

chad-earthscope commented 6 years ago

If I've misunderstood the issue please reopen.

Brtle commented 6 years ago

Thank you for the explanation! Maybe a warning could be added to the documentation of the -A option to clarify this situation? I think it's a little ambiguous.