SeisComP / seedlink

Seedlink server to be built within SeisComP

END command is not sent and clients hang #7

Closed luca-s closed 2 years ago

luca-s commented 2 years ago

Dear developers,

I am using the seedlink shipped with SeisComP v4.9.2 and I noticed that in certain cases seedlink doesn't send an "END" command to the client at the end of the data transfer. That causes the client to hang forever, or until a timeout expires if one is configured in the client.

This error happens both with and without BATCH mode. I believe it happens when the requested data include a station that seedlink doesn't know about (I can see "no such station" in the logs).

Thanks, Luca

jsaul commented 2 years ago

Thanks, Luca, for reporting this. Do you observe the issue only with a recent version (v4.9.2) or is it possible that it existed before?

luca-s commented 2 years ago

It existed also before, but I don't know since when.

luca-s commented 2 years ago

I finally managed to look at the code and found the reason for the issue.

There are actually two separate scenarios in which the END command is not issued to the client:

CASE 1 - The client sends one or more STATION commands and the server recognizes none of them ("no such station" in the log). Explanation: the END command is issued here, but the method StationConnection::do_deliver is called here only if there is at least one valid station request (this check prevents the code from proceeding further when there are no valid STATION commands).

CASE 2 - The client issues multiple valid STATION commands for the same station. The server notices that, but it doesn't mark the duplicate as an invalid request (like here). Since the command is not marked as invalid, the following SELECT and TIME commands are accepted too, and that brings the server into an inconsistent state that prevents the END command from being sent to the client (again here).

luca-s commented 2 years ago

There is a third, more common scenario in which the END command is not sent to the client. Because END is issued only here, it is never sent while sx.cx.stations_active != 0. Since sx.cx.stations_active is decreased only here and here, and only under certain conditions, the decrement might not happen (e.g. data not available?), in which case the condition sx.cx.stations_active == 0 is never met.
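The condition described in this third scenario can be illustrated with a toy model. This is only a sketch of the hypothesis, not the seedlink code: the class `ToyConnection` and its methods are invented for illustration, and only the counter name `stations_active` is taken from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class ToyConnection:
    """Toy model of the END bookkeeping (hypothetical, not the real server)."""
    stations_active: int = 0
    sent: list = field(default_factory=list)

    def add_station(self):
        # One more station has an open request on this connection.
        self.stations_active += 1

    def station_finished(self):
        # Hypothetical hook: runs only when a station's request is fully served.
        self.stations_active -= 1
        if self.stations_active == 0:
            self.sent.append("END")


conn = ToyConnection()
conn.add_station()       # station A: all requested data are available
conn.add_station()       # station B: requested data never arrive
conn.station_finished()  # only A completes
print(conn.sent)         # prints [] because B is still counted as active
```

In this model a single station that never finishes keeps the counter above zero, so END is withheld for the whole connection.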

andres-h commented 2 years ago

It would be interesting to see a test case (e.g., using "slinktool" with a lot of verbosity) where you think END should be sent and it is not. I haven't yet looked at the code you referenced, but END is mostly used in "dial-up mode" (FETCH) and it is, e.g., not sent until each station requested with FETCH (no sequence number) has delivered at least one packet. END really means there are no active stations anymore, i.e., it is impossible to receive more data. This logic has been unchanged for 20 years or something, so if there are bugs, then these are "features" by now ;)

Anyway, we should look into it and a test case would help.

luca-s commented 2 years ago

> This logic has been unchanged for 20 years or something, so if there are bugs, then these are "features" by now ;)

LOL

I definitely need test cases to verify my hypotheses for cases 2 and 3, since the behaviour I am experiencing could be a bug in the client.

However, I can confirm the bug in case 1, which is very unlikely to happen in real life. Anybody can verify that with telnet in a few seconds.
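A scripted equivalent of the telnet check could look like the following sketch. It is an assumption-laden illustration rather than a tested client: `send_commands` and `saw_end` are hypothetical helpers, the station code `XXXX XX` is made up, and the peer here is a local `socket.socketpair()` stand-in whose replies are invented for the demo. Against a real server one would instead connect to its SeedLink port (18000 by default).

```python
import socket


def send_commands(sock, commands):
    """Send each command line terminated by CR LF."""
    for cmd in commands:
        sock.sendall(cmd.encode("ascii") + b"\r\n")


def saw_end(sock, idle_timeout):
    """Read until the peer closes or stays silent for idle_timeout seconds.

    Returns True if the last three bytes received were b"END".
    Simplification: a real client would parse the packet framing instead
    of only inspecting the tail of the byte stream.
    """
    sock.settimeout(idle_timeout)
    tail = b""
    while True:
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            return tail == b"END"  # peer went silent
        if not chunk:
            return tail == b"END"  # peer closed the connection
        tail = (tail + chunk)[-3:]


# Local stand-in for a server, so the sketch runs without network access.
client, fake_server = socket.socketpair()

# Case 1 as described above: only an unknown station is requested.
send_commands(client, ["STATION XXXX XX", "END"])
fake_server.sendall(b"ERROR\r\n")       # invented reply, then silence
hung = not saw_end(client, idle_timeout=0.2)

fake_server.sendall(b"END")             # what the client was waiting for
ended = saw_end(client, idle_timeout=0.2)

print(hung, ended)  # True True
client.close()
fake_server.close()
```

The idle timeout is what turns an indefinite hang into a result that can be logged; without it the first `saw_end` call would block indefinitely.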

luca-s commented 2 years ago

I have been able to replicate the bug with slinktool.

The next example shows a normal request: slinktool receives the END command and terminates the connection.

[screenshot: seedlink-ok]

Then we have the bug case, where slinktool doesn't receive the END command and it hangs. I then type CTRL-C to quit the connection.

[screenshot: seedlink-hang]

While the last example shows the bug, I wouldn't say that my previous explanations of the bug are correct. In fact, it is not so straightforward to reproduce it. I simply know it happens often, because on a SeisComP system I have code that logs every time the connection to seedlink hangs, but I cannot figure out what causes the issue.

gempa-jabe commented 2 years ago

Isn't Seedlink designed to deliver data as soon as it is available for the requested parameters? If the requested time window of one of your channels is not complete, then Seedlink will keep the connection open and send the data later when it arrives. It just does not know if and when it arrives. Seedlink is a streaming server, not an archive server. Those are two different concepts with two different behaviours. Would that explain the observed behaviour?

luca-s commented 2 years ago

That explanation makes sense. So there is no bug here, it's an expected behaviour.

So, is this the reason that scamp has a running timeout (amptool.runningAcquisitionTimeout), to deal with this situation (if I am not wrong, scamp requests data in time window chunks)?

luca-s commented 2 years ago

I am closing this, since @gempa-jabe clarified that this is not a bug.

gempa-jabe commented 2 years ago

That is exactly the reason for the timeout in scamp. We do not want to wait forever for data that might never arrive.

jsaul commented 2 years ago

@gempa-jabe Here a fixed time window is requested, with the end time in the past and with all data already available in the SeedLink buffer at the time of the request. The expectation that the request terminates immediately after the data have arrived is not far-fetched. How can the client continue if the request doesn't terminate? The timeout will delay the program flow.

@luca-s Whether we are talking here about a bug or just a surprise can only be found out by providing specific examples, for instance based on the slinktool output. And of course by comparing current behavior with expectation. I can say that I also have issues with the current behavior of at least specific servers.