EarthScope / ringserver

Apache License 2.0

Ringserver fails to serve data after a packet timestamp goes backwards in time #49

Open bkuschak opened 5 months ago

bkuschak commented 5 months ago

I'm using Earthworm ew2ringserver to stream MiniSEED packets to ringserver. The data source uses a GPS-disciplined clock. Due to poor visibility of the sky, the GPS receiver sometimes loses lock and remains unlocked for a period of time. During this time, the local clock drifts. When the GPS reacquires lock, it causes a timestamp discontinuity where a packet timestamp jumps slightly backwards in time. After this point, ringserver is unable to serve any later data packets when a client requests packets covering a time window.

To recover, I have to kill ringserver, remove the ring data files, and restart it. (It might recover on its own after the ring wraps around and overwrites the old packets, but I haven't verified this.)

Attached is a script that demonstrates the problem. Tested with slinktool and the ObsPy SeedLink client. In both cases ringserver fails to provide any data after the point where it received a data packet with a timestamp that went ~2 seconds backwards in time. See the comments in the script for details.

test_case1.txt

bkuschak commented 5 months ago

More info that might help troubleshooting:

1) After waiting long enough for the ring to wrap around and overwrite the old data, it started working again and now returns all of the requested data that exists in the ring.
2) I looked closer at the timestamps. There are multiple streams in the ring. Within each stream there were no jumps backwards in time. The jump backwards appears only in the order of the multiplexed packets in the ring, that is, across different streams. See below.
3) The affected streams have the same identifiers for network, station, and channel. The only difference is the location code (01 and 02).
4) I didn't see any messages in syslog related to this.

So now I'm doubting whether this was related to a GPS unlock, or just the order of arrival of packets to the ringserver.

Any ideas?

Multiple streams (location 01 and 02):
$ slinktool -S "AM_OMDBO:BHZ" -tw 2024,04,18,07,34,00:2024,04,18,15,34,00 -p localhost:18000
AM_OMDBO_01_BHZ, 299 samples, 200 Hz, 2024,109,07:33:59.000000 (latency ~34344.3 sec)
AM_OMDBO_02_BHZ, 512 samples, 200 Hz, 2024,109,07:33:59.520000 (latency ~34342.7 sec)
AM_OMDBO_01_BHZ, 296 samples, 200 Hz, 2024,109,07:34:00.495000 (latency ~34342.8 sec)
AM_OMDBO_01_BHZ, 304 samples, 200 Hz, 2024,109,07:34:01.975000 (latency ~34341.3 sec)
AM_OMDBO_02_BHZ, 502 samples, 200 Hz, 2024,109,07:34:02.080000 (latency ~34340.2 sec)
AM_OMDBO_01_BHZ, 298 samples, 200 Hz, 2024,109,07:34:03.495000 (latency ~34339.8 sec)
AM_OMDBO_02_BHZ, 499 samples, 200 Hz, 2024,109,07:34:04.590000 (latency ~34337.7 sec)
AM_OMDBO_01_BHZ, 304 samples, 200 Hz, 2024,109,07:34:04.985000 (latency ~34338.3 sec)
AM_OMDBO_01_BHZ, 291 samples, 200 Hz, 2024,109,07:34:06.505000 (latency ~34336.8 sec)
AM_OMDBO_02_BHZ, 513 samples, 200 Hz, 2024,109,07:34:07.085000 (latency ~34335.1 sec)
AM_OMDBO_01_BHZ, 293 samples, 200 Hz, 2024,109,07:34:07.960000 (latency ~34335.3 sec)
AM_OMDBO_01_BHZ, 292 samples, 200 Hz, 2024,109,07:34:09.425000 (latency ~34333.9 sec)
AM_OMDBO_02_BHZ, 501 samples, 200 Hz, 2024,109,07:34:09.650000 (latency ~34332.6 sec)
AM_OMDBO_01_BHZ, 257 samples, 200 Hz, 2024,109,07:34:10.885000 (latency ~34332.6 sec)
AM_OMDBO_01_BHZ, 270 samples, 200 Hz, 2024,109,07:34:12.170000 (latency ~34331.3 sec)
AM_OMDBO_01_BHZ, 154 samples, 200 Hz, 2024,109,07:34:13.520000 (latency ~34330.5 sec)
AM_OMDBO_02_BHZ, 427 samples, 200 Hz, 2024,109,07:34:12.155000 (latency ~34330.5 sec)
          timestamp goes backwards by 2.14 seconds --^
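
For anyone who wants to check their own ring for the same pattern, here is a minimal sketch that scans slinktool-style output for backward steps between consecutive packets. It assumes the line format shown in the listing above; the names `parse_packets` and `backward_steps` are made up for this example and are not part of slinktool or ringserver. A packet's end time is taken as start + samples / rate. On the last three lines of the listing, the start-to-start step is 1.365 s, and the 2.14 s figure appears to correspond to the previous packet's end time minus the new packet's start time (2.135 s).

```python
import re
from datetime import datetime, timedelta

# Matches slinktool packet lines in the format shown in the listing, e.g.
# AM_OMDBO_01_BHZ, 299 samples, 200 Hz, 2024,109,07:33:59.000000 (latency ...)
LINE_RE = re.compile(
    r"^(?P<sid>\S+), (?P<n>\d+) samples, (?P<rate>[\d.]+) Hz, "
    r"(?P<start>\d{4},\d{3},\d{2}:\d{2}:\d{2}\.\d{6})"
)

# Last three packets from the multiplexed listing in this issue.
SAMPLE = """\
AM_OMDBO_01_BHZ, 270 samples, 200 Hz, 2024,109,07:34:12.170000 (latency ~34331.3 sec)
AM_OMDBO_01_BHZ, 154 samples, 200 Hz, 2024,109,07:34:13.520000 (latency ~34330.5 sec)
AM_OMDBO_02_BHZ, 427 samples, 200 Hz, 2024,109,07:34:12.155000 (latency ~34330.5 sec)
"""


def parse_packets(text):
    """Return a list of (stream_id, start, end) in the order the lines appear."""
    packets = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip prompts, annotations, and blank lines
        start = datetime.strptime(m["start"], "%Y,%j,%H:%M:%S.%f")
        # Assumption: end time = start + sample count / sample rate.
        end = start + timedelta(seconds=int(m["n"]) / float(m["rate"]))
        packets.append((m["sid"], start, end))
    return packets


def backward_steps(packets):
    """Find consecutive packets where the start time moves backwards.

    Returns tuples of (previous stream, current stream,
    start-to-start step in seconds, previous-end-to-start step in seconds).
    """
    steps = []
    for prev, cur in zip(packets, packets[1:]):
        if cur[1] < prev[1]:
            steps.append((
                prev[0],
                cur[0],
                (prev[1] - cur[1]).total_seconds(),
                (prev[2] - cur[1]).total_seconds(),
            ))
    return steps


if __name__ == "__main__":
    for prev_id, cur_id, from_start, from_end in backward_steps(parse_packets(SAMPLE)):
        print(f"{prev_id} -> {cur_id}: back {from_start:.3f} s from previous start, "
              f"{from_end:.3f} s from previous end")
```

The comparison is done in listing order regardless of stream ID, so a step between two different streams (as here, 01 followed by 02) is reported the same way as one within a single stream.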

No timestamp jump within requested stream 1. Contiguous from the end time of one packet to the start time of the next:
$ slinktool -S "AM_OMDBO:BHZ" -tw 2024,04,18,07,34,00:2024,04,18,15,34,00 -p localhost:18000 |grep _01
AM_OMDBO_01_BHZ, 299 samples, 200 Hz, 2024,109,07:33:59.000000 (latency ~60358.3 sec)
AM_OMDBO_01_BHZ, 296 samples, 200 Hz, 2024,109,07:34:00.495000 (latency ~60356.8 sec)
AM_OMDBO_01_BHZ, 304 samples, 200 Hz, 2024,109,07:34:01.975000 (latency ~60355.3 sec)
AM_OMDBO_01_BHZ, 298 samples, 200 Hz, 2024,109,07:34:03.495000 (latency ~60353.8 sec)
AM_OMDBO_01_BHZ, 304 samples, 200 Hz, 2024,109,07:34:04.985000 (latency ~60352.3 sec)
AM_OMDBO_01_BHZ, 291 samples, 200 Hz, 2024,109,07:34:06.505000 (latency ~60350.8 sec)
AM_OMDBO_01_BHZ, 293 samples, 200 Hz, 2024,109,07:34:07.960000 (latency ~60349.3 sec)
AM_OMDBO_01_BHZ, 292 samples, 200 Hz, 2024,109,07:34:09.425000 (latency ~60347.9 sec)
AM_OMDBO_01_BHZ, 257 samples, 200 Hz, 2024,109,07:34:10.885000 (latency ~60346.6 sec)
AM_OMDBO_01_BHZ, 270 samples, 200 Hz, 2024,109,07:34:12.170000 (latency ~60345.2 sec)
AM_OMDBO_01_BHZ, 154 samples, 200 Hz, 2024,109,07:34:13.520000 (latency ~60344.5 sec)

No timestamp jump within requested stream 2. Contiguous from the end time of one packet to the start time of the next:
$ slinktool -S "AM_OMDBO:BHZ" -tw 2024,04,18,07,34,00:2024,04,18,15,34,00 -p localhost:18000 |grep _02
AM_OMDBO_02_BHZ, 512 samples, 200 Hz, 2024,109,07:33:59.520000 (latency ~60202.1 sec)
AM_OMDBO_02_BHZ, 502 samples, 200 Hz, 2024,109,07:34:02.080000 (latency ~60199.6 sec)
AM_OMDBO_02_BHZ, 499 samples, 200 Hz, 2024,109,07:34:04.590000 (latency ~60197.1 sec)
AM_OMDBO_02_BHZ, 513 samples, 200 Hz, 2024,109,07:34:07.085000 (latency ~60194.6 sec)
AM_OMDBO_02_BHZ, 501 samples, 200 Hz, 2024,109,07:34:09.650000 (latency ~60192.1 sec)
AM_OMDBO_02_BHZ, 427 samples, 200 Hz, 2024,109,07:34:12.155000 (latency ~60189.9 sec)
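
The contiguity claim can be checked arithmetically. The sketch below is only an illustration: the sample counts and start times are copied by hand from the location 02 listing above, `packet_gaps` is a hypothetical helper name, and a packet's end time is assumed to be start + samples / rate (the time of the next expected sample).

```python
from datetime import datetime, timedelta

RATE_HZ = 200.0

# (sample count, start time) for AM_OMDBO_02_BHZ, copied from the listing above.
PACKETS_02 = [
    (512, "2024,109,07:33:59.520000"),
    (502, "2024,109,07:34:02.080000"),
    (499, "2024,109,07:34:04.590000"),
    (513, "2024,109,07:34:07.085000"),
    (501, "2024,109,07:34:09.650000"),
    (427, "2024,109,07:34:12.155000"),
]


def parse_time(text):
    """Parse a slinktool-style 'year,day-of-year,HH:MM:SS.ffffff' timestamp."""
    return datetime.strptime(text, "%Y,%j,%H:%M:%S.%f")


def packet_gaps(packets, rate_hz):
    """Seconds between each packet's end and the next packet's start.

    Zero means contiguous, positive means a gap, negative means an overlap.
    """
    gaps = []
    for (n, start), (_, next_start) in zip(packets, packets[1:]):
        end = parse_time(start) + timedelta(seconds=n / rate_hz)
        gaps.append((parse_time(next_start) - end).total_seconds())
    return gaps


if __name__ == "__main__":
    for gap in packet_gaps(PACKETS_02, RATE_HZ):
        print(f"{gap:+.6f} s")
```

Every gap comes out as zero for this stream, which matches the observation that the backward step exists only in the multiplexed packet order and not within a single stream.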