EarthScope / dataselect

Selection and sorting for data in miniSEED format
GNU General Public License v3.0

timeseries selection not occurring when multiple selections requested within a single record #3

Closed danauerbach closed 6 years ago

danauerbach commented 7 years ago

When two selection criteria fall within a single miniSEED record, dataselect does not perform any selection on that record.

For example, using the selection criteria:

II SUR 00 LN1 2017,187,21:35:00 2017,187,23:07:00
II SUR 00 LN1 2017,187,23:17:00 2017,188,00:00:00

When the end of the first selection (2017,187,23:07:00) and the start of the second (2017,187,23:17:00) fall within a single record, the complete record is passed to the output.

3 files are zipped and attached that include a selection file (select-ln1) with 3 selections, an input file (input-ln1.ms), and the resulting output file (output-ln1.ms).

dataselect-multi-select-example.zip

dataselect version 3.19 built and run on RHEL 6.

From email exchange with Chad (2017-07-12):

Hi Dan,

All is well thanks, hope the same for you.

You have found a limitation of dataselect. I'm not sure I realized this before, but dataselect does not appear to trim a single record based on two different selection entries. At least that's what it looks like now. So even with just two selections:

II SUR 00 LN1 2017,187,21:35:00 2017,187,23:07:00
II SUR 00 LN1 2017,187,23:17:00 2017,188,00:00:00

it doesn't work and I think it's because 2017,187,23:07:00 and 2017,187,23:17:00 are in the same record.

The workaround is to run each sub-selection of data individually, meaning do a selection for each of the time ranges you want, in sequence, with separate runs of dataselect. Not ideal, but that should work.

If you would post the simple example to GitHub as an issue I would be grateful. It may be fixable, but it may be really fundamental to the design and not really tractable without major shuffling; hard to tell at the moment.

Chad
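The workaround described in the email could be scripted roughly as follows. This is only a sketch: the helper names (split_selections, build_runs) and the file names are hypothetical, and the `-s` (selection file) and `-o` (output file) options are assumed from the dataselect usage text, so check them against `dataselect -h` for your version. The code only builds the command lines; it does not run them.

```python
def split_selections(text):
    """Return one selection entry per non-blank line of a selection file."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_runs(selections, input_file, prefix="part"):
    """Pair each selection with its own selection file, output file, and command.

    Assumption: `dataselect -s <selectfile> -o <outfile> <input>` is the
    invocation; verify the flags for the installed version.
    """
    runs = []
    for i, selection in enumerate(selections, start=1):
        select_file = f"{prefix}-{i}.sel"
        output_file = f"{prefix}-{i}.ms"
        command = ["dataselect", "-s", select_file, "-o", output_file, input_file]
        runs.append((select_file, selection, command))
    return runs


# The two selections from this issue, one per line.
select_text = """\
II SUR 00 LN1 2017,187,21:35:00 2017,187,23:07:00
II SUR 00 LN1 2017,187,23:17:00 2017,188,00:00:00
"""

selections = split_selections(select_text)
runs = build_runs(selections, "input-ln1.ms")
for select_file, selection, command in runs:
    # Each selection would be written to its own file, then run separately.
    print(select_file, "<-", selection)
    print(" ".join(command))
```

Each run then sees only one selection, so no record has to be trimmed for two entries at once. The per-run output files would still need to be combined afterwards.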

chad-earthscope commented 6 years ago

Another way to describe this issue: dataselect is not able to trim/remove data from the middle of a record, leaving the earliest and latest samples.

Performing that operation requires copying the record and trimming it twice: once to keep the earliest samples and once to keep the latest samples. This is the simplest instance of the general case of multiple selections from a single record, which quite quickly becomes pathological. The workaround is to do a single selection per run of dataselect, which might have to be good enough for a while.
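To make the condition concrete, here is a small sketch that models a record only by its start and end time and computes which pieces each selection would keep. It is not how dataselect is implemented; the function names and the record time span (23:00:00 to 23:30:00) are made up for illustration, and the selections are assumed not to overlap each other.

```python
from datetime import datetime

TIME_FORMAT = "%Y,%j,%H:%M:%S"  # year, day-of-year, time, as in the selections


def parse_time(text):
    return datetime.strptime(text, TIME_FORMAT)


def kept_segments(record_start, record_end, selections):
    """Return the (start, end) pieces of one record covered by each selection."""
    segments = []
    for sel_start, sel_end in sorted(selections):
        start = max(record_start, sel_start)
        end = min(record_end, sel_end)
        if start <= end:  # this selection overlaps the record
            segments.append((start, end))
    return segments


# Hypothetical record spanning the gap between the two selections.
record_start = parse_time("2017,187,23:00:00")
record_end = parse_time("2017,187,23:30:00")

selections = [
    (parse_time("2017,187,21:35:00"), parse_time("2017,187,23:07:00")),
    (parse_time("2017,187,23:17:00"), parse_time("2017,188,00:00:00")),
]

segments = kept_segments(record_start, record_end, selections)

# More than one segment means the record would have to be copied and
# trimmed once per segment, which is the case described above.
needs_record_copy = len(segments) > 1
print(len(segments), needs_record_copy)
```

With one selection per run, each record yields at most one segment, so a single trim of the start and/or end is enough.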

chad-earthscope commented 6 years ago

A test for this condition and printed warning are included in version 3.20 (https://github.com/iris-edu/dataselect/releases/tag/v3.20). A CAVEATS AND LIMITATIONS section was added to the documentation describing the condition and workaround.