GeoNet / fdsn

FDSN Web Services
MIT License

Service limits and HTTP errors. Documentation? #25

Closed: gclitheroe closed this issue 7 years ago

gclitheroe commented 7 years ago

I've been trying to find out how FDSN services handle preparing large requests while still sending valid HTTP/1.1 responses to the client. IRIS don't seem to bother: they send a 200 response, but preparing and sending the body can still fail afterwards, and the client gets no indication that the request failed. This is basically a weakness of the FDSN spec and how it uses HTTP/1.1.

We have the same problem and need to document this somehow.

IRIS's explanation below is from the "Considerations" section of https://service.iris.edu/fdsnws/dataselect/docs/1/help/:

In general, it is preferable to not ask for too much data in a single request. Large requests take longer to complete. If a large request fails due to any networking issue, it will have to be resubmitted to be completed. This will cause the entire request to be completely reprocessed and re-transmitted. By breaking large requests into smaller requests, only the smaller pieces will need to be resubmitted and re-transmitted if there is a networking problem. Web service network connections will break after 5 to 10 minutes if no data is transmitted. For large requests, the fdsnws-dataselect web service can take several minutes before it starts returning data. When this happens, the web service may “flush” the HTTP headers with an “optimistic” success (200) code to the client in order to keep the network connection alive. This gives about 10 minutes to the underlying data retrieval mechanism to start pulling data out of the IRIS archive. Thus for larger requests, the HTTP return code can be unreliable. As data is streamed back to the client, the fdsnws-dataselect service partially buffers the returned data. During time periods when the underlying retrieval mechanism stalls, the web service will dribble the partial buffer to the client in an effort to keep the network connection alive.
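To make the "break large requests into smaller requests" advice concrete, here is a minimal client-side sketch that splits one long time range into fixed-size windows and builds one dataselect query URL per window. The helper names (`split_window`, `build_urls`), the one-day default step, and the example channel are assumptions for illustration; the endpoint shown is the IRIS one and could be swapped for any other FDSN data centre.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode

# Assumed endpoint: IRIS dataselect. Replace with another FDSN service if needed.
BASE_URL = "https://service.iris.edu/fdsnws/dataselect/1/query"


def split_window(start, end, step):
    """Yield consecutive (chunk_start, chunk_end) pairs covering [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        chunk_end = min(cursor + step, end)
        yield cursor, chunk_end
        cursor = chunk_end


def build_urls(net, sta, loc, cha, start, end, step=timedelta(days=1)):
    """Build one dataselect GET URL per time window (hypothetical helper)."""
    urls = []
    for chunk_start, chunk_end in split_window(start, end, step):
        params = {
            "network": net,
            "station": sta,
            "location": loc,
            "channel": cha,
            "starttime": chunk_start.strftime("%Y-%m-%dT%H:%M:%S"),
            "endtime": chunk_end.strftime("%Y-%m-%dT%H:%M:%S"),
        }
        urls.append(BASE_URL + "?" + urlencode(params))
    return urls


if __name__ == "__main__":
    # Illustrative values only: 2.5 days of one channel becomes three requests.
    for url in build_urls("IU", "ANMO", "00", "BHZ",
                          datetime(2020, 1, 1), datetime(2020, 1, 3, 12)):
        print(url)
```

If one window fails part-way through, only that window needs to be requested again. The step size is a trade-off: per the next paragraph, very small windows cost more in connection and request set-up overhead.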

It is less efficient to ask for too little data in each request. Each time a request is made, a network connection must be established and a request processing unit started. For performance reasons, it is better to group together selections from the same stations and place them in the same request. This is especially true of selections that cover the same time periods.
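As a rough illustration of grouping selections into a single request, fdsnws-dataselect accepts a POST body with one `NET STA LOC CHA STARTTIME ENDTIME` line per selection. The sketch below only builds that body and does not send it. The function name `build_post_body`, the sort order (by network, station, then start time), and the example channels are assumptions, not anything prescribed by IRIS or GeoNet.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def build_post_body(selections):
    """Build a dataselect POST body from (net, sta, loc, cha, start, end) tuples.

    Selections are sorted so that lines for the same station, and then the
    same time period, sit next to each other (an assumed ordering).
    """
    ordered = sorted(selections, key=lambda s: (s[0], s[1], s[4]))
    lines = []
    for net, sta, loc, cha, start, end in ordered:
        # In the POST format an empty location code is written as "--".
        loc = loc or "--"
        lines.append(" ".join([
            net, sta, loc, cha,
            start.strftime(TIME_FORMAT),
            end.strftime(TIME_FORMAT),
        ]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative selections only.
    body = build_post_body([
        ("IU", "ANMO", "00", "BHZ", datetime(2020, 1, 2), datetime(2020, 1, 3)),
        ("IU", "ANMO", "", "LHZ", datetime(2020, 1, 1), datetime(2020, 1, 2)),
    ])
    print(body, end="")
```

The resulting text would be sent as the body of an HTTP POST to the service's `query` endpoint, so several selections share one connection and one processing unit.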

This utility should handle a week or month of data from several stations.

nbalfour commented 7 years ago

See https://github.com/GeoNet/www-geonet/pull/447