SeisComP3 / seiscomp3

SeisComP is a seismological software for data acquisition, processing, distribution and interactive analysis.
Other
111 stars 88 forks source link

fdsnws: service hangs on SDS request (part 2) #167

Closed petrrr closed 6 years ago

petrrr commented 6 years ago

This problem was first discussed in issue https://github.com/SeisComP3/seiscomp3/issues/163#issuecomment-375210220.

However, the available logs were not useful to understand which request or which file caused the problem. Thanks to some scripting magic by my colleague @vlauciani, we were now able to safe and recover the relevant part of the log:

[...]
2018/03/30 21:45:48 [info/log] [reactor] 93.63.207.206 - - [30/Mar/2018:21:45:48 +0000] "POST /fdsnws/dataselect/1/query HTTP/1.0" 200 457728 "-" "Python-urllib/3.5"
2018/03/30 21:45:48 [debug/RecordStream] trying to open stream sdsarchive:///mnt/archive_aufs
2018/03/30 21:45:48 [debug/SDSARCHIVE] SDS request: 2014,08,03,22,46,51 2014,08,03,22,50,51 IV VITU HNN 
2018/03/30 21:45:48 [debug/SDSARCHIVE] + /mnt/archive_aufs/2014/IV/VITU/HNN.D/IV.VITU..HNN.D.2014.215 (init:1)
2018/03/30 21:45:49 [debug/SDSARCHIVE] - after endtime
2018/03/30 21:45:49 [debug/RecordStream] trying to open stream sdsarchive:///mnt/archive_aufs
2018/03/30 21:45:49 [debug/SDSARCHIVE] SDS request: 2014,08,03,22,46,56 2014,08,03,22,50,56 IV SGTA HHN 
2018/03/30 21:45:49 [debug/SDSARCHIVE] + /mnt/archive_aufs/2014/IV/SGTA/HHN.D/IV.SGTA..HHN.D.2014.215 (init:1)
2018/03/30 21:45:49 [debug/RecordStream] trying to open stream sdsarchive:///mnt/archive_aufs
2018/03/30 21:45:49 [debug/SDSARCHIVE] SDS request: 2014,08,03,22,46,49 2014,08,03,22,50,49 IV SACR HHN 
2018/03/30 21:45:49 [debug/SDSARCHIVE] + /mnt/archive_aufs/2014/IV/SACR/HHN.D/IV.SACR..HHN.D.2014.215 (init:1)
2018/03/30 21:45:49 [debug/SDSARCHIVE] - after endtime
2018/03/30 21:45:49 [debug/RecordStream] trying to open stream sdsarchive:///mnt/archive_aufs
2018/03/30 21:45:49 [debug/SDSARCHIVE] SDS request: 2014,08,03,22,46,33 2014,08,03,22,50,33 IV PIO1 HNZ 
2018/03/30 21:45:49 [debug/SDSARCHIVE] + /mnt/archive_aufs/2014/IV/PIO1/HNZ.D/IV.PIO1..HNZ.D.2014.215 (init:1)
2018/03/30 21:45:49 [debug/SDSARCHIVE] - after endtime
2018/03/30 21:45:49 [debug/SDSARCHIVE] - after endtime
2018/03/30 21:45:49 [debug/RecordStream] trying to open stream sdsarchive:///mnt/archive_aufs
2018/03/30 21:45:49 [debug/SDSARCHIVE] SDS request: 2014,08,03,22,47,07 2014,08,03,22,51,07 IV NOVE EHE 
2018/03/30 21:45:49 [debug/SDSARCHIVE] + /mnt/archive_aufs/2014/IV/NOVE/EHE.D/IV.NOVE..EHE.D.2014.215 (init:1)
2018/03/30 21:45:50 [warning/SDSARCHIVE] sdsarchive: [/mnt/archive_aufs/2014/IV/NOVE/EHE.D/IV.NOVE..EHE.D.2014.215@7206912] Couldn't read mseed header!
2018/03/30 21:45:50 [error/SDSARCHIVE] file read exception: Invalid miniSEED header
2018/03/30 21:45:50 [error/SDSARCHIVE] file read exception: Invalid miniSEED header
2018/03/30 21:45:50 [error/SDSARCHIVE] file read exception: Invalid miniSEED header
2018/03/30 21:45:50 [error/SDSARCHIVE] file read exception: Invalid miniSEED header
2018/03/30 21:45:50 [error/SDSARCHIVE] file read exception: Invalid miniSEED header
2018/03/30 21:45:50 [error/SDSARCHIVE] file read exception: Invalid miniSEED header
[...]

The problem is due to a corrupt file in the data archive. Apparently, the SDS request handler ends up in some infinite loop, never returns and therefore the service itself does stop replying.

The corrupted file is the following:

qmerge confirms that there is a problem with the header at some point.

sysop@eida:/mnt/archive_aufs/2014/IV/NOVE/EHE.D$ qmerge -n /mnt/archive_aufs/2014/IV/NOVE/EHE.D/IV.NOVE..EHE.D.2014.215
NOVE.IV.EHE. rate=100 (2014.215 00:00:01.5402 to 2014.215 05:49:23.8600) : 2096232 points, -0.2 msec correction, (min,max,max_step = -0.2,0.1,0.3 msec)
NOVE.IV.EHE. rate=100 (2014.215 05:49:24.8602 to 2014.215 05:49:26.0002) : 114 points, 0.0 msec correction, (min,max,max_step = 0.0,0.0,0.0 msec)
NOVE.IV.EHE. rate=100 (2014.215 05:49:28.0000 to 2014.215 05:49:29.0000) : 100 points, 0.0 msec correction, (min,max,max_step = 0.0,0.0,0.0 msec)
NOVE.IV.EHE. rate=100 (2014.215 05:49:31.0002 to 2014.215 05:49:37.0002) : 600 points, 0.0 msec correction, (min,max,max_step = 0.0,0.0,0.0 msec)
NOVE.IV.EHE. rate=100 (2014.215 05:49:40.0002 to 2014.215 05:49:47.0002) : 700 points, 0.0 msec correction, (min,max,max_step = 0.0,0.0,0.0 msec)
NOVE.IV.EHE. rate=100 (2014.215 05:49:49.0002 to 2014.215 05:50:02.7202) : 1372 points, 0.0 msec correction, (min,max,max_step = 0.0,0.0,0.0 msec)
NOVE.IV.EHE. rate=100 (2014.215 05:50:03.7200 to 2014.215 05:50:24.0000) : 2028 points, 0.0 msec correction, (min,max,max_step = 0.0,0.0,0.0 msec)
NOVE.IV.EHE. rate=100 (2014.215 05:50:26.0000 to 2014.215 05:50:38.3000) : 1230 points, 0.0 msec correction, (min,max,max_step = 0.0,0.0,0.0 msec)
NOVE.IV.EHE. rate=100 (2014.215 05:50:39.3002 to 2014.215 05:50:41.0002) : 170 points, 0.0 msec correction, (min,max,max_step = 0.0,0.0,0.0 msec)
NOVE.IV.EHE. rate=100 (2014.215 05:50:43.0002 to 2014.215 05:50:59.0002) : 1600 points, 0.0 msec correction, (min,max,max_step = 0.0,0.0,0.0 msec)
NOVE.IV.EHE. rate=100 (2014.215 05:51:01.0002 to 2014.215 05:51:13.0002) : 1200 points, 0.0 msec correction, (min,max,max_step = 0.0,0.0,0.0 msec)
NOVE.IV.EHE. rate=100 (2014.215 05:51:15.0000 to 2014.215 05:51:16.0000) : 100 points, 0.0 msec correction, (min,max,max_step = 0.0,0.0,0.0 msec)
NOVE.IV.EHE. rate=100 (2014.215 05:53:08.0002 to 2014.215 08:58:45.0000) : 1113700 points, -0.2 msec correction, (min,max,max_step = -0.2,0.1,0.3 msec)
NOVE.IV.EHE. rate=100 (2014.215 08:58:47.0000 to 2014.215 09:53:40.0002) : 329300 points, 0.2 msec correction, (min,max,max_step = 0.0,0.3,0.3 msec)
Error: unable to get header from block 14014 from stream /mnt/archive_aufs/2014/IV/NOVE/EHE.D/IV.NOVE..EHE.D.2014.215

Before the latest update the service seemed to have been resilient on such corrupted files in the archive.

andres-h commented 6 years ago

Fixed with https://github.com/SeisComP3/seiscomp3/commit/cab2ab4122f144fef6a34d9b09edabe9e75b88fe.

gempa-jabe commented 6 years ago

@petrrr can you please check if the fix does solve your problem? @andres-h thanks for fixing it.

petrrr commented 6 years ago

@andres-h and @gempa-jabe many thanks for your help and the fix.

@vlauciani has tested the patch this morning and it seems to have solved the problem. Here the relevant part of the log:

[...]
2018/04/05 10:15:26 [debug/log] request (93.63.207.206): /fdsnws/dataselect/1/query?net=IV&sta=NOVE&cha=EHE&start=2014-08-03T22:47:07&end=2014-08-03T22:51:07&nodata=404
2018/04/05 10:15:26 [debug/log] adding stream: IV.NOVE..EHE 2014-08-03T22:47:07.0000Z - 2014-08-03T22:51:07.0000Z
2018/04/05 10:15:26 [debug/RecordStream] trying to open stream sdsarchive:///mnt/archive_aufs
2018/04/05 10:15:26 [debug/SDSARCHIVE] SDS request: 2014,08,03,22,47,07 2014,08,03,22,51,07 IV NOVE EHE
2018/04/05 10:15:26 [debug/SDSARCHIVE] + /mnt/archive_aufs/2014/IV/NOVE/EHE.D/IV.NOVE..EHE.D.2014.215 (init:1)
2018/04/05 10:15:26 [warning/SDSARCHIVE] sdsarchive: [/mnt/archive_aufs/2014/IV/NOVE/EHE.D/IV.NOVE..EHE.D.2014.215@7206912] Couldn't read mseed header!
2018/04/05 10:15:26 [warning/log] responding with error: 404 (Not Found)
2018/04/05 10:15:26 [info/log] [reactor] 93.63.207.206 - - [05/Apr/2018:10:15:25 +0000] "GET /fdsnws/dataselect/1/query?net=IV&sta=NOVE&cha=EHE&start=2014-08-03T22:47:07&end=2014-08-03T22:51:07&nodata=404 HTTP/1.0" 404 - "-" "curl/7.26.0"
[...]

May I propose that also some enhancement of the log system is considered? I am thinking of a way to to avoid the problem observed along with this issue, where that relevant information is flushed out of the preserved log files. Maybe limiting the no. of equal messages would be helpful. Should we open a new issue on this?

gempa-jabe commented 6 years ago

May I propose that also some enhancement of the log system is considered? I am thinking of a way to to avoid the problem observed along with this issue, where that relevant information is flushed out of the preserved log files. Maybe limiting the no. of equal messages would be helpful. Should we open a new issue on this?

You can use syslog as logging backend instead of the logfiles. And syslog can be configured in that regard I think. Is that an option?