EarthScope / ringserver

Apache License 2.0

Ringserver serving incomplete data #37

Open nchazarra opened 2 years ago

nchazarra commented 2 years ago

I'm running my own ringserver with mseed files, but it usually streams incomplete data. Sometimes, while I'm using jamaseis or swarm to view the data, the seismic drum slowly recovers and I can see all the packets, but it can take a long time. Could this be a configuration problem, or is my setup too slow?

Thanks in advance.

chad-earthscope commented 2 years ago

Hi @nchazarra. It is difficult to guess at what might be happening without further details. I doubt it is a matter of ringserver configuration though. It might be how data are provided to ringserver or it might be issues related to the clients (jamaseis/swarm).

Can you describe how you are providing data to ringserver? Note that ringserver is not really meant to scan a large archive of files and there are options to avoid "monitoring" old data files (StateFile and InitCurrentState).

Also, sharing the non-default config options would be helpful.

nchazarra commented 2 years ago

Hi @chad-iris,

First, thanks for your answer. My configuration file is this:

RingDirectory /home/nchazarra/ringserver
SeedLinkPort 18000
ResolveHostnames 0
ClientTimeout 0
RingSize 2G
ServerID "Rojales"
MemoryMapRing 1
TransferLogDirectory /home/nchazarra/ringserver/
TransferLogRX 0
MSeedScan /mnt/sismogramas/ StateFile=/home/nchazarra/ringserver/scan.state InitCurrentState=y

My seismometers generate a 512-byte MSEED file every minute, archived in folders with the tree /year/month/day. I then have ringserver scan and serve the data.

Thanks for your answer.

chad-earthscope commented 2 years ago

> MSeedScan /mnt/sismogramas/ StateFile=/home/nchazarra/ringserver/scan.state InitCurrentState=y

How many files are in /mnt/sismogramas/? It shouldn't cause data flow issues, but large numbers of files will require more work/time to process as the entire file listing must be traversed on each scan. Some logic is in place to avoid scanning files that are "old" (7200 seconds, i.e. 120 minutes) as often as the rest, but it still takes work. The recommended strategy is to create a pickup area where new data is provided to the server, and then removed once it's "old".
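As a minimal sketch of the clean-up half of that pickup-area strategy, the Python function below deletes files once they are older than a chosen age. The function name `prune_old_files`, the use of file modification time as the age measure, and the 24-hour default are assumptions for illustration; they are not part of ringserver.

```python
import time
from pathlib import Path


def prune_old_files(pickup_dir, max_age_seconds=24 * 3600, now=None):
    """Delete regular files under pickup_dir older than max_age_seconds.

    Age is measured from the file modification time (an assumption; a
    timestamp parsed from the file name would work as well).
    Returns the list of removed paths.
    """
    now = time.time() if now is None else now
    removed = []
    # rglob("*") also descends into /year/month/day style subdirectories.
    for path in sorted(Path(pickup_dir).rglob("*")):
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path)
    return removed
```

Run periodically (for example from cron) with the scanned directory as `pickup_dir`, this keeps the file listing that must be traversed on each scan short. The StateFile should live outside that directory so it is never pruned.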

I do not see anything in your config that looks suspicious.

Note that InitCurrentState=y will cause data to be skipped if the server is restarted and the StateFile does not exist. This is very unlikely to be a problem unless you have a strange combination of events, but it is one way that the server can "skip" data.

Is data being provided to the server out of time order? It is not a problem for the server, or the protocol, but some clients may not do what you expect if data is received out of time order.

My suggestion is to try using another client to determine if your observations are server related or client related. I suggest using slinktool combined with msi to read the miniSEED.

If you can give me access to your server, I can try to help you further. You can send me the details privately at the email address here: https://www.iris.edu/hq/staff/employee/trabant

nchazarra commented 2 years ago

Hi again,

There are 1440 files. I was trying to keep a 24-hour buffer to make an updating dayplot, and I made a script to remove files older than 24 hours from the ringserver mseed directory (/mnt/sismogramas/). I have also tried slinktool and the output seems normal:

2022.231.12:25:59.0, seq 167808, Received Data blockette
2022.231.12:25:59.0, seq 167809, Received Data blockette
2022.231.12:25:59.0, seq 167810, Received Data blockette
2022.231.12:25:59.0, seq 167811, Received Data blockette
2022.231.12:25:59.0, seq 167812, Received Data blockette
2022.231.12:25:59.0, seq 167813, Received Data blockette
2022.231.12:25:59.0, seq 167814, Received Data blockette
2022.231.12:25:59.0, seq 167815, Received Data blockette
2022.231.12:25:59.0, seq 167816, Received Data blockette
2022.231.12:25:59.0, seq 167817, Received Data blockette
2022.231.12:25:59.0, seq 167818, Received Data blockette
2022.231.12:25:59.0, seq 167819, Received Data blockette
2022.231.12:25:59.0, seq 167820, Received Data blockette
2022.231.12:25:59.0, seq 167821, Received Data blockette
2022.231.12:25:59.0, seq 167822, Received Data blockette
2022.231.12:25:59.0, seq 167823, Received Data blockette
2022.231.12:25:59.0, seq 167824, Received Data blockette
2022.231.12:25:59.0, seq 167825, Received Data blockette
2022.231.12:25:59.0, seq 167826, Received Data blockette
2022.231.12:25:59.0, seq 167827, Received Data blockette
2022.231.12:25:59.0, seq 167828, Received Data blockette
2022.231.12:25:59.0, seq 167829, Received Data blockette
2022.231.12:25:59.0, seq 167830, Received Data blockette

And also the files were readable through msi, obspy and snuffler (Pyrocko). This is a preview with the holes in the data:

java_H3kPJUpDkT

Sometimes it seems to be working, but only a few times:

java_DKLsJ6RYBc

Thanks again for your time and your answers.

chad-earthscope commented 2 years ago

Hi @nchazarra

> There are 1440 files. I was trying to keep a 24-hour buffer to make an updating dayplot, and I made a script to remove files older than 24 hours from the ringserver mseed directory (/mnt/sismogramas/).

That number of files should be no problem, and it is a good strategy for providing data to ringserver.

> I also have tried slinktool and the output seems normal:

Did you try saving the data with slinktool (with the -o option) and then comparing the "transmitted" data files to the original files, specifically during a time when the gaps appear in swarm? I'm suggesting this to eliminate the possibility of a client issue with jamaseis and swarm.

You might also try increasing the verbosity of ringserver and sharing the server's log file in case there are clues in there.
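One way to sketch that comparison in Python is to reduce every record to a key taken from its fixed header and then look for keys that exist in the original files but not in the slinktool dump. The function names and the `RECORD_LENGTH` of 512 bytes (taken from the file size mentioned earlier in this thread) are assumptions, as is the expectation that the station, location, channel, network and start-time bytes reach the dump unchanged.

```python
from pathlib import Path

RECORD_LENGTH = 512  # assumption: fixed 512-byte miniSEED 2 records
# Bytes 8..29 of the fixed header hold station, location, channel,
# network and the 10-byte record start time.
KEY_SLICE = slice(8, 30)


def record_keys(data, record_length=RECORD_LENGTH):
    """Return the set of (identifier + start time) keys found in data."""
    keys = set()
    for offset in range(0, len(data) - record_length + 1, record_length):
        record = data[offset:offset + record_length]
        keys.add(record[KEY_SLICE])
    return keys


def compare_archive_to_dump(original_dir, dump_file):
    """Return keys present in the original files but absent from the dump."""
    original = set()
    for path in sorted(Path(original_dir).rglob("*")):
        if path.is_file():
            original |= record_keys(path.read_bytes())
    return original - record_keys(Path(dump_file).read_bytes())
```

An empty result would suggest that every original record was transmitted, which points away from the server and towards the clients. The dump file should be stored outside `original_dir` so it is not counted as an original.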

nchazarra commented 2 years ago

Hi again @chad-iris and thanks for your patience,

Sorry for taking so long, but I wanted to run more tests and try a different setup. The slinktool data I saved is totally normal: no gaps or anything strange.

If I keep swarm or jamaseis running for several minutes, all the gaps in the plot are eventually filled, but it can take around five minutes. I was wondering if that could be related to my setup, so I tried in two different environments:

Is it possible that it takes so long because my environments are not fast enough at serving the data and I need more computing power?

chad-earthscope commented 2 years ago

Hi @nchazarra .

If the data collected by slinktool did not include the gaps, then that narrows it down to client issues.

I seriously doubt that a lack of compute power is the problem; once the data is in ringserver, serving it to many clients is not a large task. I'm less sure about where you run the clients, but I wouldn't think they need much compute either.

Is there any chance the data are being read (and therefore served) out of time order? It may be that swarm and jamaseis do not handle out-of-time-order data well. One way to check is reading the data saved via slinktool using msi and visually checking the time stamps.

nchazarra commented 2 years ago

Hi @chad-iris,

I've been testing it again, and downloaded a full day through slinktool. When viewing the traces in jseisgram2K, I found that there are some strange traces, with a very short duration, as you can see in the image:

image

But when I look at the archived data of the same time, it seems normal:

image

Aside from this, everything looks normal. How do I check if the data is being read out of time order? My script uploads the data to a folder in order by timestamp, and deletes the older data the same way.

Thanks in advance.

chad-earthscope commented 2 years ago

> When viewing the traces in jseisgram2K, I found that there are some strange traces, with a very short duration, as you can see in the image:

That image appears to show continuous traces, but the time axes are not aligned, so it took me a while to realize that the traces cover different durations. I expect the lack of data being shown is what you are illustrating. It would be much more useful to see this comparison with a longer window and the time axes aligned, so the relative data coverage is clearer.

> How do I check if the data is being read out of time order?

ringserver serves data in the same order it was inserted into the buffer, in the case of scanning miniSEED the same order it was read. So the easiest way is to check the time order of data as it is being served. If you read the data you saved with slinktool with msi it will show you the order it was served, and thus the order it was read.
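If visually scanning the msi output is tedious, the same check can be sketched in Python by reading the start time from each record's fixed header in the file saved by slinktool and flagging any record that starts earlier than the previous record of the same channel. This assumes miniSEED 2 with 512-byte records, as described earlier in the thread; `parse_header` and `find_out_of_order` are illustrative names, not part of any existing tool.

```python
import struct
from datetime import datetime, timedelta

RECORD_LENGTH = 512  # assumption: fixed 512-byte miniSEED 2 records


def parse_header(record):
    """Return (source id, start time) from a miniSEED 2 fixed header."""
    station = record[8:13].decode("ascii").strip()
    location = record[13:15].decode("ascii").strip()
    channel = record[15:18].decode("ascii").strip()
    network = record[18:20].decode("ascii").strip()
    # The start time is a 10-byte BTIME at offset 20. Try big-endian
    # first, then little-endian, using a year/day sanity check.
    for order in (">", "<"):
        year, day, hour, minute, second, _, fract = struct.unpack(
            order + "HHBBBBH", record[20:30])
        if 1900 <= year <= 2100 and 1 <= day <= 366:
            break
    else:
        raise ValueError("record does not look like miniSEED 2")
    start = datetime(year, 1, 1) + timedelta(
        days=day - 1, hours=hour, minutes=minute, seconds=second,
        microseconds=fract * 100)  # fract counts units of 0.0001 s
    return f"{network}.{station}.{location}.{channel}", start


def find_out_of_order(data, record_length=RECORD_LENGTH):
    """List records that start before the previous record of their channel.

    Each entry is (record index, source id, start, previous start).
    """
    last_seen = {}
    problems = []
    offsets = range(0, len(data) - record_length + 1, record_length)
    for index, offset in enumerate(offsets):
        source, start = parse_header(data[offset:offset + record_length])
        previous = last_seen.get(source)
        if previous is not None and start < previous:
            problems.append((index, source, start, previous))
        last_seen[source] = start
    return problems
```

Passing the bytes of the saved file to `find_out_of_order` returns an empty list when every channel was served in time order; any entries show where the order went backwards.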