FDSN / SeedLink

https://docs.fdsn.org/projects/seedlink/
Creative Commons Zero v1.0 Universal

Question: DATA command with SEQ number and wildcard SELECTION. #16

Closed: ozym closed this issue 10 months ago

ozym commented 1 year ago

This may be an implementation detail, but I noticed that the docs, under "Example handshaking", show the following error message being returned:

ERROR ARGUMENTS using sequence number with station wildcard is not supported

Does this imply that using a "wildcard station selection, e.g. *_*" requires extra care when resuming a connection (assuming some form of seq state is kept)? Would the following statement also apply?

If a packet with given sequence number is not available, then the sequence number of the next available packet MUST be used by the server.

That is, if the sequence number given doesn't match the wildcard selection, or is too early, could the server just skip to the next one that does match?
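To make the question concrete, here is a minimal sketch of the "skip to the next available packet" behaviour. It assumes the server holds one buffer of packets ordered by sequence number; the function name `resume_position`, the `(seq, stream_id)` tuples, and the `matches` predicate are all hypothetical and are not taken from the SeedLink docs. Sequence number wraparound is ignored.

```python
from bisect import bisect_left


def resume_position(packets, requested, matches=lambda stream_id: True):
    """Return the sequence number a server might resume from, or None.

    packets   -- list of (seq, stream_id) tuples sorted by seq (assumed layout)
    requested -- sequence number sent by the client
    matches   -- hypothetical predicate standing in for the client's selection
    """
    seqs = [seq for seq, _ in packets]
    # First buffered packet whose sequence number is >= the requested one.
    start = bisect_left(seqs, requested)
    # From there, skip forward to the first packet that matches the selection.
    for seq, stream_id in packets[start:]:
        if matches(stream_id):
            return seq
    return None  # nothing buffered yet; the client would wait for new data
```

With packets numbered 10, 11 and 15 in the buffer, a request for 12 would resume at 15, and a request older than the buffer would resume at 10.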

Perhaps this is a CAPABILITY thing.

crotwell commented 1 year ago

I think the issue is that sequence numbers in SeedLink are per station, so there is no meaningful way to restart via sequence number if more than one station is selected, i.e. with wildcards.

This per station sequence numbering may be a problem when transferring data from many stations from a data center. It is not clear to me how one resumes a connection that has been broken when you wish to get all stations within a network. For example if the original request is:

STATION IU *
SELECT *_B_H_?
DATA

and then the connection is broken and the client wishes to restart without getting duplicate data, what should it do? I suppose it could keep track of the last sequence number it has seen for each station, then send separate STATION - SELECT - DATA seq commands for each of them. That seems complex and verbose. And if a new station came online and began sending data during the break in the connection, the client would not know it existed.

I suppose it would have to also append

STATION IU *
SELECT *_B_H_?
DATA

after all the individual STATION - SELECT - DATA seq commands are sent. The docs for the STATION command say that the first match wins, so I think this works. However, the complexity required of the client seems pretty high when really all it wants is to resume its prior connection.
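As a rough sketch of what that client-side bookkeeping could look like, the function below turns stored per-station sequence numbers into the command list described above, ending with the wildcard block as a catch-all for stations the client has not seen. The name `build_reconnect_commands` and the `resume_seq` mapping are hypothetical. The command spelling simply follows the examples in this comment, and whether the stored value should be the last sequence number seen or the next one expected is an assumption left to the caller.

```python
def build_reconnect_commands(resume_seq, network, selector):
    """Build the handshake lines for resuming a wildcard subscription.

    resume_seq -- dict mapping station code to the sequence number to resume
                  from (hypothetical client-side state)
    network    -- network code, e.g. "IU"
    selector   -- SELECT pattern used in the original request
    """
    commands = []
    # One STATION - SELECT - DATA seq group per station already seen.
    for station in sorted(resume_seq):
        commands.append(f"STATION {network} {station}")
        commands.append(f"SELECT {selector}")
        commands.append(f"DATA {resume_seq[station]}")
    # Wildcard group last, relying on "first match wins" so that only
    # stations not listed above are picked up by it.
    commands.append(f"STATION {network} *")
    commands.append(f"SELECT {selector}")
    commands.append("DATA")
    return commands


if __name__ == "__main__":
    state = {"ANMO": 1024, "COLA": 77}  # made-up example values
    for line in build_reconnect_commands(state, "IU", "*_B_H_?"):
        print(line)
```

Even in this small form the client has to persist one entry per station, which is the complexity being discussed; anything needed to finish the handshake is left out.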

Perhaps much of the difficulty would be removed if sequence numbers were per server instead of per station. This would allow a client to reconnect using wildcards while only having to store a single sequence number. It might be simpler for the server as well, since it would only have to keep track of a single sequence to number new packets.

@andres-h Can you comment on why the sequence number is 'per station' and not unique for the whole server? Is there a use case where this is desirable?

Regardless, it would be really helpful to have a section explaining how to properly reconnect a SeedLink connection so as to avoid both missed and duplicate data, as that is likely to be a very common need.

andres-h commented 1 year ago

> @andres-h Can you comment on why the sequence number is 'per station' and not unique for the whole server? Is there a use case where this is desirable?

SeedLink has used per-station sequence numbers from the beginning and we wanted to keep this, because it makes replication easier (e.g., you can replicate individual stations rather than the whole server, or you can merge multiple SeedLink sources into one server while keeping the original sequence numbers). IRIS insisted on having per-server sequence numbers, so the draft was worded so that both are possible. There should probably be a corresponding capability, and I think we even had one in some revision.

ozym commented 1 year ago

Ah, that's interesting. I can now see why FETCH and DATA might be different, especially with wildcards and time value requests.

I can see a FETCH using a station wildcard and a time range being workable, whereas a similar DATA selection may not work as expected: the time relates to the station data rather than to when the data was added to the SeedLink server, and in that case the sequence number is the appropriate time proxy.

andres-h commented 1 year ago

Feedback from proposal team

Add an example documenting the steps of reconnection (see #18).

crotwell commented 11 months ago

+1 on change