Closed jschaeff closed 4 years ago
The packet buffer is not plain miniSEED, and so cannot be scanned with msi or other such tools.
Does it make any difference if the client is local, on the host running ringserver?
No errors in the log file? Even with increased verbosity?
I see you tried the local connection. Hmm, not good.
No errors in the log at all?
I would suspect that something is going wrong with scanning of the data, but it's hard to tell at this point.
Note that the combination of removing the state file and InitCurrentState=y option, will cause the scanning to skip any old data on a restart of ringserver.
Thank you for your quick answer. I'm going to run a ringserver with -vv, on port 18002, connect a local client to it and look at the logs.
Note that we run 2 instances of ringserver (one on 18000 for public data and one on 18001 for restricted data).
I can't tell since when we have this issue, but we had some similar problems some month ago and could'nt fix it. Now I'm investigating for real :)
I'll report here when I get something new.
Hi Chad, could it be that our packetbuf size is too small (or too big) ? We have it at 8GB, and we scan a directory of 191GB of SDS data. almost 20000 files.
I've started a ringserver only scanning one station, and it ships clean data, no gaps detected. Over the WE, I'll let it run and report back here.
could it be that our packetbuf size is too small (or too big) ? We have it at 8GB, and we scan a directory of 191GB of SDS data. almost 20000 files.
Ring buffer size should only be a problem if it is scanning more than the buffer size in a short period of time. In your case, the buffer will only have the last 8GB of data scanned. Are you adding more than 8GB per day (or shorter period) to the SDS repository you're scanning?
Note, you cannot use a ring buffer larger than 8GB and still export data via SeedLink with the default packet size of 512 bytes. This is due to a combination of how ringserver maps SeedLink packet IDs and the maximum SeedLink packet ID of 16,777,215.
Good morning Chad, As for now, we ingest approx 2GB/hour and keep the data 7 days.
We tested today by collecting the realtime data from all sources in hourly rotated files, and cleaning all h-2 files. It seems to be much better. I'm going to migrate our production configuration tomorrow and hope to fix the issue.
Cheers,
jonathan
Good morning. OK, that is a relatively large flow of data, and, I am presuming, a lot of files. There is only one scanning thread per scan config option. It may be that the internal scanner cannot keep up with the combination of many, many files and data rate.
To see how long an internal scanner takes to make a recursive pass through the specified directory you can see the "Scan time" (in seconds) via the server status page at: http://localhost:18000/status
By default you may not be able to see that page from a remote host and you'll need to add trusted addresses to allow other hosts to access that page.
If the scan time is very large, many minutes, then it's probably a good idea to break up the scanning or reduce the data being scanned (as you have done).
Alternative to removing the older data as you've done, one other suggestion:
Run multiple MSeedScan options to parallelize the scanning. If you have top level directories that spilt the data set somehow, i.e. between networks, this can be used to divide the scanning between multiple threads. Be careful to use different state files for each thread. For example:
MSeedScan /data/<NET1>/ StateFile=scan.NET1.state InitCurrentState=y
MSeedScan /data/<NET2>/ StateFile=scan.NET2.state InitCurrentState=y
I enabled HTTP protocol, I didn't knew about the status page, it's very helpful, thank you ! (Now, I've read all your documentation :) ). Now, with a scan process per network, the scan time is very short. Sometimes it takes 1s, but on average it's around the millisec.
Here is how it looks like:
ringserver/2018.078
Organization: RESIF Ring Server
Server start time (UTC): 2020-03-10 14:25:20
Ring version: 1
Ring size: 8589934592
Packet size: 512
Max packet ID: 16777215
Max packets: 13591662
Memory mapped ring: TRUE
Volatile ring: FALSE
Total connections: 194
Total streams: 1931
TX packet rate: 2529.1
TX byte rate: 1294879.4
RX packet rate: 0.0
RX byte rate: 0.0
Earliest packet: 3009696
Create: 2020-03-10 08:37:25.014504 Data start: 2020-01-24 04:34:59.910000 Data end: 2020-01-24 04:35:04.020000
Latest packet: 16601357
Create: 2020-03-10 14:44:57.815431 Data start: 2020-03-10 14:44:53.243130 Data end: 2020-03-10 14:44:56.683130
Server threads:
DataLink SeedLink HTTP, Port: 18000
Mini-SEED Scanner
Directory: /home/sysop/work/out/2020/3C
Max recursion: -1
State file: /home/sysop/work/scan.3C.state
Match:
Reject:
Scan time: 1.00048
Packet rate: 0
Byte rate: 0
Mini-SEED Scanner
Directory: /home/sysop/work/out/2020/8C
Max recursion: -1
State file: /home/sysop/work/scan.8C.state
Match:
Reject:
Scan time: 1.00025
Packet rate: 0
Byte rate: 0
Mini-SEED Scanner
Directory: /home/sysop/work/out/2020/CL
Max recursion: -1
State file: /home/sysop/work/scan.CL.state
Match:
Reject:
Scan time: 0.001971
Packet rate: 2536.78
Byte rate: 1.29883e+06
Mini-SEED Scanner
Directory: /home/sysop/work/out/2020/FR
Max recursion: -1
State file: /home/sysop/work/scan.FR.state
Match:
Reject:
Scan time: 0.014774
Packet rate: 270.746
Byte rate: 138622
Mini-SEED Scanner
Directory: /home/sysop/work/out/2020/G
Max recursion: -1
State file: /home/sysop/work/scan.G.state
Match:
Reject:
Scan time: 0.004674
Packet rate: 2139.5
Byte rate: 1.09542e+06
Mini-SEED Scanner
Directory: /home/sysop/work/out/2020/GL
Max recursion: -1
State file: /home/sysop/work/scan.GL.state
Match:
Reject:
Scan time: 0.001474
Packet rate: 6784.26
Byte rate: 3.47354e+06
Mini-SEED Scanner
Directory: /home/sysop/work/out/2020/MQ
Max recursion: -1
State file: /home/sysop/work/scan.MQ.state
Match:
Reject:
Scan time: 0.001219
Packet rate: 7383.1
Byte rate: 3.78015e+06
Mini-SEED Scanner
Directory: /home/sysop/work/out/2020/MT
Max recursion: -1
State file: /home/sysop/work/scan.MT.state
Match:
Reject:
Scan time: 0.001169
Packet rate: 9409.75
Byte rate: 4.81779e+06
Mini-SEED Scanner
Directory: /home/sysop/work/out/2020/ND
Max recursion: -1
State file: /home/sysop/work/scan.ND.state
Match:
Reject:
Scan time: 0.001417
Packet rate: 4234.3
Byte rate: 2.16796e+06
Mini-SEED Scanner
Directory: /home/sysop/work/out/2020/PF
Max recursion: -1
State file: /home/sysop/work/scan.PF.state
Match:
Reject:
Scan time: 0.005829
Packet rate: 9950.25
Byte rate: 5.09453e+06
Mini-SEED Scanner
Directory: /home/sysop/work/out/2020/RA
Max recursion: -1
State file: /home/sysop/work/scan.RA.state
Match:
Reject:
Scan time: 0.039066
Packet rate: 13464.4
Byte rate: 6.89377e+06
Mini-SEED Scanner
Directory: /home/sysop/work/out/2020/RD
Max recursion: -1
State file: /home/sysop/work/scan.RD.state
Match:
Reject:
Scan time: 1.00052
Packet rate: 0
Byte rate: 0
Mini-SEED Scanner
Directory: /home/sysop/work/out/2020/WI
Max recursion: -1
State file: /home/sysop/work/scan.WI.state
Match:
Reject:
Scan time: 0.002687
Packet rate: 3349.46
Byte rate: 1.71492e+06
I've run tests with multiple scan processes. It seems much better. I still have ONE messy station, in the FR network and can't get why we stream gapped data (FR_ILLK). For the records, I keep now 2 days of data, but as I understand it, a large retention should not bother the scanners. Am I right ? This is our realtime data storage size in MB:
1188 3C
304 8C
904 CL
13143 FR
1045 G
781 GL
576 MQ
1064 MT
2796 ND
1933 PF
8243 RA
168 RD
976 WI
And the number of files:
3C: 8
8C: 3
CL: 20
FR: 135
G: 33
GL: 17
MQ: 12
MT: 7
ND: 6
PF: 40
RA: 91
RD: 9
WI: 11
For the records, I keep now 2 days of data, but as I understand it, a large retention should not bother the scanners. Am I right ?
Mostly. The scanning tracks files in two (potentially three) states:
Active
- modified within the last 7200 seconds (120 minutes), check each scanIdle
- modified more than 7200 seconds ago, check every 60th scanSo the I/O is reduced for old data significantly, mostly avoiding a stat() call. The scanning still has to hop through the directory structures to find entries. For now, these values are hard coded in ringserver.
There is a third state called Quiet
, where, after a certain number of seconds of not being modified, the file will never be checked again. This is currently turned off in ringserver, but could be make configurable if you believe it would help you.
Thank you for your help ! We greatly enhanced the quality of the RT data by splitting the scans. The remaining gaps might come from the chaotic station transmission itself. We have plans to simplify our realtime service based on this work.
Hi @jschaeff, Note that there is a new release (https://github.com/iris-edu/ringserver/releases/tag/v2020.075) that improves SeedLink connection resuming. In some cases clients connecting and reconnecting repeatedly may get duplicate data (maybe gaps too).
We have a ringserver version: 2018.078 running in MSeedScan mode:
MSeedScan /home/sysop/work/out StateFile=/home/sysop/scan.state InitCurrentState=y
The directory scanned by the ringserver is actual realtime data and looks clean. For instance:
We observe a lot of gaps when gathering data form rtserve.resif.fr For instance:
then
Doing a lot of tests from several clients, we ruled out potential client problems. They all gather data with gaps, although the gaps are not the same. We also run a slinktool on
localhost:18000
to rule out network issues. We also got gaps.Restarting the ringserver and deleting the
scan.state
file does not change anything.We found out that the packetbuf itself contains a lot of gaps, but it does not fit with the gaps collected by the clients;
(complete output file attached)
Our ringbuffer size is 8GB. The age of the oldest data in the packetbuf seems to tell that it's enough.
Do you have any idea what's going wrong ? (or are the gaps in the packetbuffer normal, in which case, the problem is elsewhere ?)