SeisComP / main

The SeisComP application layer for data processing and analysis

FDSNWS-Dataselect stuck on multi-gap/overlap file #16

Closed · massimo1962 closed this issue 3 years ago

massimo1962 commented 3 years ago

We have found this problem in our SeisComP4 (4.5) installation, which is used for FDSNWS dataselect. The fdsnws request is:

wget -O mseed "http://webservices.ingv.it/fdsnws/dataselect/1/query?network=MN&station=AQU&location=--&channel=BHZ&starttime=1995-01-02T00:00:00&endtime=1995-01-02T01:00:00"

While analyzing the data file we noticed that it has a lot of gaps and overlaps (like some of our other files), as shown in the following figure.

[Figure: gap/overlap plot of the requested MN.AQU..BHZ data (mseed-request-block-seiscomp)]

We have discovered that whenever we request these particular files, our SeisComP4 fdsnws freezes. Moreover, we have noticed that one CPU goes to 100% while the other remains completely idle.

Our guess is that the fdsnws process freezes while reading these files.

Example file: MN.AQU..BHZ.D.1995.zip
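
For reference, a gap/overlap listing like the one in the figure can be reproduced with ObsPy. This is a minimal sketch, assuming ObsPy's MiniSEED reader tolerates the missing blockette 1000 (as suggested later in this thread) and that the example file has been unzipped into the working directory:

```python
# Minimal sketch: list gaps and overlaps in the example file with ObsPy.
from obspy import read

st = read("MN.AQU..BHZ.D.1995")
# get_gaps() yields one entry per gap (positive delta) or overlap
# (negative delta) between consecutive traces of the same channel.
for net, sta, loc, cha, t0, t1, delta, nsamples in st.get_gaps():
    kind = "gap" if delta > 0 else "overlap"
    print(f"{net}.{sta}.{loc}.{cha} {kind}: {t0} -> {t1} ({delta:.2f} s)")
```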

gempa-jabe commented 3 years ago

Thank you for providing the example data. I could reproduce the error; it is not fdsnws related. It is caused by the SDS archive RecordStream implementation.

gempa-jabe commented 3 years ago

Actually it is the MiniSEED parser that causes an endless loop. The issue with your records is that there is no blockette 1000, which is required to determine the record length. I have fixed the endless loop that causes the 100% CPU usage, but the request will still return no data: records without blockette 1000 are currently ignored. As there are some other issues with the parser, we need to rework it.
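
To illustrate what the parser is missing: each blockette starts with its type and the offset of the next one, so the presence of blockette 1000 can be checked by walking that chain from the fixed header. A minimal Python sketch, assuming big-endian records (robust code must detect the byte order from the BTIME year field):

```python
import struct

def has_blockette_1000(record: bytes) -> bool:
    # Bytes 46-47 of the fixed header hold the offset of the first
    # blockette (0 means there are none).
    (offset,) = struct.unpack_from(">H", record, 46)
    while offset:
        # Every blockette begins with its type and the offset of the
        # next blockette (0 terminates the chain).
        btype, offset = struct.unpack_from(">HH", record, offset)
        if btype == 1000:
            return True
    return False

with open("MN.AQU..BHZ.D.1995", "rb") as f:
    print("blockette 1000 present:", has_blockette_1000(f.read(512)))
```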

massimo1962 commented 3 years ago

Many thanks for your reply. In case it is useful: we have tested our old SeisComP3 installation and there the problem does not occur.

gempa-jabe commented 3 years ago

Which version is your old SeisComP3?

petrrr commented 3 years ago

@gempa-jabe: Thank you for looking into this. We have since also discovered that blockette 1000 is missing.

Apparently the data is pre-V2.3 "data only" SEED (sometimes also dubbed miniSEED, as I found out), but not valid miniSEED as specified in SEED V2.3. That makes some sense considering the age of the data, i.e. 1995. (Granted, the V2.3 specs were published earlier, but it is not unlikely that the update had not yet been adopted at the time of production.)

There seems to be plenty of such data around, and a lot of relevant software supports it, in particular libmseed & Co., ObsPy, etc. For services delivering non-decoded data, the lack of the information contained in blockette 1000 should not even be that relevant, as most decisions would probably be based on the fixed header only, right?

Anyway, we need to understand whether you plan to support such pre-V2.3 data as well, so that we can choose a strategy for dealing with the situation. As a general rule we would prefer not to touch the original data if at all possible.

massimo1962 commented 3 years ago

Our old version: seiscomp3-jakarta-2018.327p23-debian-9-x86_64.tar.gz

gempa-jabe commented 3 years ago

> Our old version: seiscomp3-jakarta-2018.327p23-debian-9-x86_64.tar.gz

That is strange, as the parser code changed with 2018.327p19. Anyway, I know which code change causes the issue.

gempa-jabe commented 3 years ago

> There seems to be plenty of such data around, and a lot of relevant software supports it, in particular libmseed & Co., ObsPy, etc. For services delivering non-decoded data, the lack of the information contained in blockette 1000 should not even be that relevant, as most decisions would probably be based on the fixed header only, right?

Unfortunately not. Blockette 1000 contains the length of the record, which is necessary for correct reading from non-seekable streams. Without blockette 1000 you also have no information about the encoding used, although there seems to be a default anyway. Without blockette 1000 we need to search for the next valid MiniSEED header to estimate the record length. This is actually what libmseed does as well. But again, that is neither simple nor portable with streams where you cannot seek.
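
A sketch of that fallback, as hypothetical Python rather than the actual SeisComP code: probe the usual power-of-two record lengths for the next plausible fixed header. This only works where you can look ahead in the buffer, which is exactly the problem with non-seekable streams:

```python
def guess_record_length(buf: bytes,
                        candidates=(128, 256, 512, 1024, 2048, 4096)):
    """Guess the MiniSEED record length by scanning for the next header."""
    for length in candidates:
        header = buf[length:length + 8]
        if len(header) < 8:
            break
        # A fixed header begins with a 6-digit ASCII sequence number
        # followed by a quality code (D, R, Q or M) and a reserved byte.
        if header[:6].isdigit() and header[6:7] in b"DRQM":
            return length
    return None
```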

> Anyway, we need to understand whether you plan to support such pre-V2.3 data as well, so that we can choose a strategy for dealing with the situation. As a general rule we would prefer not to touch the original data if at all possible.

I understand your wish ;) We will look into a portable solution for this issue. As I said, there are other issues with the parser in some corner cases which need to be resolved as well.

petrrr commented 3 years ago

@massimo1962 and @gempa-jabe

> Our old version: seiscomp3-jakarta-2018.327p23-debian-9-x86_64.tar.gz

> That is strange, as the parser code changed with 2018.327p19. Anyway, I know which code change causes the issue.

I guess the explanation is that p23 was downloaded to the server but never actually deployed, AFAIK. Note also that some of the changes in p23 were triggered by another issue we reported, concerning data with gaps.

So yes, there might be some conflicting goals around how to determine the end of a record.

gempa-jabe commented 3 years ago

I have hopefully fixed it. The change is in the common repository. You can give it a try by either installing a master build or patching the corresponding source file in your installation and rebuilding it. All tests succeeded and I can read your file.

@massimo1962, @petrrr: Am I allowed to use the first two records of your example file as test data set and include it in the SC test suite?

petrrr commented 3 years ago

Hi @gempa-jabe:

For reference, here is the commit: https://github.com/SeisComP/common/commit/18539926ca62a5c1bf04afa76725c2bc8f6f9136

We will test the patch. And sure, you can use the file (or some small portion of it) for the test suite.

BTW: we have also reviewed the archive, and I learned that correcting the archived files is less invasive than I thought, because there is space for the blockette and no repackaging would be necessary. So we will probably clean this up as well.
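
A hypothetical sketch of such an in-place fix, not a tested tool: it assumes big-endian byte order, 512-byte records and Steim-1 encoding, all of which would have to be verified against the real archive before touching any file:

```python
import struct

RECLEN = 512    # assumption: fixed 512-byte records
ENCODING = 10   # assumption: Steim-1; check the actual encoding first

def patch_blockette_1000(record: bytearray) -> None:
    """Insert a blockette 1000 into a record that has no blockettes."""
    if record[39] != 0 or struct.unpack_from(">H", record, 46)[0] != 0:
        return  # record already carries blockettes; leave it alone
    (data_offset,) = struct.unpack_from(">H", record, 44)
    if data_offset < 56:
        raise ValueError("no room for blockette 1000 before the data")
    record[39] = 1                          # number of blockettes that follow
    struct.pack_into(">H", record, 46, 48)  # first blockette right after header
    # Blockette 1000 fields: type, next-blockette offset (0 = last),
    # encoding, word order (1 = big-endian), record length as a power
    # of two (2^9 = 512), reserved byte.
    struct.pack_into(">HHBBBB", record, 48, 1000, 0, ENCODING, 1,
                     RECLEN.bit_length() - 1, 0)
```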

Still "tolerating" such files might still be beneficial for SC.

petrrr commented 3 years ago

Hi @gempa-jabe, Thanks again.

We have done various tests, including some for possible regressions. It looks like your patch resolves the problem with blockette-1000-less files. Gappy data is also handled correctly in our tests. So this issue seems to be solved.

Are you planning a patch release any time soon? Just so we understand how to handle deployment to production.

gempa-jabe commented 3 years ago

Yes, a new 4.x release is on the way.