INFO and keep-alive - Githubissues

djeastonca commented 1 year ago

The documentation for the INFO command suggests using "INFO ID" as an implementation of keep-alive. Ideally keep-alive is handled by the underlying TCP implementations running in the operating systems of the hosts of the SeedLink server and client. Readers of the specification shouldn't necessarily feel obligated to reimplement this at the application level as well. I suggest either dropping the recommendation or pointing out that there already exists a feature to do this at the TCP level.

crotwell commented 1 year ago

My understanding is that the default TCP keepalive doesn't actually do anything until the socket has been quiet for 2 hours and on some OSs that interval can't be changed. That is pretty long when streaming seismic data, so having a application level keepalive seems worth it to me. In addition, because the TCP keepalive works at the kernel/socket level, it won't tell you about the end-to-end connection, just the first leg. For example if the seedlink goes via a proxy, you might only know that the connection to the proxy is ok.

That said, it feels a little wrong that INFO can be sent during streaming. Overloading INFO ID to implement keepalive mixes functionality. Websockets has a PING-PONG sequence and maybe that idea is worth borrowing? Maybe define a PING command that takes a short message, and the server will respond with a PONG data packet. That would also mean that once DATA starts flowing, you don't mix data packets with text, which might make the code cleaner?

https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API/Writing_WebSocket_servers#pings_and_pongs_the_heartbeat_of_websockets

andres-h commented 1 year ago

Feedback from proposal team

TCP keepalive is not a replacement for application-level. In-stream INFO is a rather powerful concept and avoids the complexity of a separate stream for “command and control”. A dedicated PING is a waste of syntax, as long as INFO ID remains small in payload, they are no different in practice. If we do add some syntax dedicated to keepalives, it should be bi-directional so the server can send them also.

djeastonca commented 11 months ago

@crotwell good point re: connections through proxies - no guarantee that the probes would be handled intelligently.

My original comment was motivated from the perspective of exploring whether there is any appetite to include a means for the Seedlink server (e.g. on a datalogger) to free up resources in a reasonably timely manner if a client disappears. Unfortunately it's not practical to do this transparently so, as @andres-h points out, one approach could be to have the existing application-level INFO approach made to be bidirectional. Another approach would be to have the server specify a requirement to the client in the initial handshake to regularly 'check in' (e.g. with an INFO ID request) within a specified time interval.

I recognize that we don't have this today in the current protocol implementation, and I don't recall hearing about circumstances where this was a particular problem. So if this notion doesn't resonate I'm happy to back away from it and close this issue.

crotwell commented 11 months ago

I think the expectation should be that the server/logger can always close the socket if no commands have been received from the client within a timeout and no data has been sent, ie the sending buffers have been full (or non-decreasing) for that time.

But a bidirection keepalive would be useful, especially if the packet rate for the selected channels is lower than the timeout window. For example, a seedlink client that is getting only a .01 sps channel might only get a couple of packets a day. Even worse would be a seedlink connection for a triggered channel that may not generate data for an extended time period.

I think it would be advantageous to use SE packets for the bidirectional keepalive instead of text and only allow them during data streaming phase. During the handshaking phase a simple "no commands received within timeout" should be sufficient to close the connection as that should always be short.

djeastonca commented 11 months ago

Upon further review of the SeedLink 4 specification, it's not clear to me that it is clear enough on the notion of sending commands to the server after the transition from the handshaking phase to the data transfer phase. The "Data Transfer" section of the specification says "When handshaking has been finished with END, the server starts sending data packets...". The content at the bottom of the same section then outlines expectations about the data transfer phase, which is focused on the notion of unidirectional transmission of SE (i.e. "data") packets from server to client, and does not support the notion of the server accepting and processing any commands after handshaking is completed with END.

I believe that the intent of the specification is to allow the server to receive and respond to commands from the client at any time while in "real-time mode". In fact, the INFO command description implies this with its recommendation to use it as a keepalive mechanism. This would result in data packets being mixed with command responses, which would likely be at the root of @crotwell 's concern with mixing SE packets with text responses. Since the data being received by the client from the TCP stream would not be homogenous from a high level parsing perspective, it results in some complication for client-side implementations of the protocol but the complication is not unreasonable or insurmountable unless I'm overlooking something.

As proposed earlier in this issue thread, the client could send INFO ID requests to periodically assure itself that the server is still present. For server implementations that want to assure themselves that the client is still there, it could specify a minimum expectation of how often to hear from the client with any request (other than DATA or FETCH, but INFO ID recommended for brevity) rather than introduce additional command syntax to support bi-directional keepalive. Perhaps this could be specified in the response to CAPABILITIES request.

crotwell commented 11 months ago

Having a server timeout value available to the client in some form of CAPABILITIES or INFO output would be helpful. And as long as the client knows the timeout value, bidirectional keepalive is probably not needed?

Thinking more, because the INFO command returns a SE packet and the client never sends anything other than text, I don't think my concern about the mixing of text and SE packets happens. Or to be more precise, it can only happen during the handshaking phase, not the data phase.

djeastonca commented 11 months ago

Having a server timeout value available to the client in some form of CAPABILITIES or INFO output would be helpful. And as long as the client knows the timeout value, bidirectional keepalive is probably not needed?

Functionally it would be a bidirectional application-level keepalive: the server can optionally explicitly set its expectation regarding how often it expects to hear something from the client at a minimum to be confident it's still there, and per earlier in this thread the client can optionally periodically check that the server is still there. Assurances can be made in either or both directions at the application level (e.g. using INFO ID).

Thinking more, because the INFO command returns a SE packet and the client never sends anything other than text, I don't think my concern about the mixing of text and SE packets happens. Or to be more precise, it can only happen during the handshaking phase, not the data phase.

Yes good point. When I wrote my previous response I was overlooking the fact that the INFO responses are carried as a JSON payload in an SE packet so yes, everything coming to the client in the data phase is homogenous.

crotwell commented 11 months ago

Assurances can be made in either or both directions at the application level (e.g. using INFO ID).

Just to be sure I understand, the server should publish a "timeout" value (how is yet to be determined) and the client is expected to send 'INFO ID' periodically at some rate smaller than this timeout, if data is not flowing. However, the server must not send 'INFO ID' to the client, as commands only go client->server. Is this what you are proposing? It is not a true bidirectional keepalive, but seems like it will accomplish what is needed.

djeastonca commented 11 months ago

Yes, the server could optionally publish a "timeout" value and the client would be expected to send any request (presumably 'INFO ID' would be used in practice, for brevity). Correct - the server would not send any request to the client but, as you indicated, it would accomplish what is needed. That is what I meant in my previous comment by "functionally it would be a bi-directional application-level keepalive" (i.e. not literally).

From @andres-h 's earlier comment it sounded like there may be appetite in the proposal team to have a bi-directional application-level keepalive. If that's still the case and this proposal is acceptable, then we can work on detailed syntax and behaviour.

crotwell commented 11 months ago

A possible bidirection keepalive would be for the server to seed a special SE packet, perhaps with format J and subformat A, for "alive". Upon receiving a "JA" packet, the client should respond with INFO ID. The included JSON could be empty, or maybe include a timestamp?

djeastonca commented 11 months ago

Something like that would work as an alternative approach, and have the advantage of being somewhat simpler for clients to implement, since there would be no need to have a timer running to drive the transmission of INFO ID requests. Yes a timestamp might be useful for a rough but convenient transmission latency calculation, but clients would already be doing that with the SE packets containing miniSEED payloads. Maybe something else will come to mind.

andres-h commented 10 months ago

I've had the problem of detecting dead clients when no data is flowing, so I think bi-directional keepalive would be generally useful, but I'm not sure about details.

A possible bidirection keepalive would be for the server to seed a special SE packet, perhaps with format J and subformat A, for "alive". Upon receiving a "JA" packet, the client should respond with INFO ID.

In this case the server must respond to "INFO ID", because it does not know if "INFO ID" was a response or requested by the client explicitly. Different commands like PING and PONG would be cleaner.

This is my personal opinion.

crotwell commented 10 months ago

In this case the server must respond to "INFO ID", because it does not know if "INFO ID" was a response or requested by the client explicitly. Different commands like PING and PONG would be cleaner.

Good point, an agree it would be cleaner. Main thing from my point of view is that the server should not send plain text to the client after handshaking ends. Some special PING-PONG SE packets would makes sense though. Perhaps a "PING" SE packet from server to client, then the client responds with "PONG" as a text command?

Is there a concern about detecting dead clients during handshaking? I would think as this phase should be short-lived that a simple "delete if inactive" rule would work for the server without needing to ping-pong the client?

djeastonca commented 9 months ago

Sorry for the delay in responding.

...Main thing from my point of view is that the server should not send plain text to the client after handshaking ends. Some special PING-PONG SE packets would makes sense though. Perhaps a "PING" SE packet from server to client, then the client responds with "PONG" as a text command?

This would be a good approach since it keeps the client implementation simple (no timers), and the server can send these requests whenever it sees fit.

Is there a concern about detecting dead clients during handshaking? I would think as this phase should be short-lived that a simple "delete if inactive" rule would work for the server without needing to ping-pong the client?

Yes, data hasn't started flowing yet so the server implementation can time out on the closure of the handshake sequence.

Overall we seem to have consensus that an application keep-alive is useful, but now the question would be whether to include it in this revision of the specification or not. IMO the technical integrity of the overall SeedLink proposal is not compromised by the lack of a bi-directional application keep-alive. If a client disappears the server will eventually find out, close its end of the connection, and release associated resources but it's just a matter of how long. So this issue is identifying an optimization that can be made in a future revision, rather than delay the closure of this specification further.

FDSN / SeedLink

INFO and keep-alive #19

Feedback from proposal team