GeoNet / fdsn

FDSN Web Services
MIT License
17 stars 15 forks source link

DataSelect service issue #252

Closed junghao closed 1 month ago

junghao commented 2 months ago

@ozym reported an issues

Just noticing some errors turning up with the FDSN service, they are appearing in some of the strong motion processing. They look like:

100 10.2M    0 10.2M    0     0   152k      0 --:--:--  0:01:09 --:--:--     0
curl: (92) HTTP/2 stream 1 was not closed cleanly: PROTOCOL_ERROR (err 1)

in amongst valid data. This query was enough to trigger one:

curl -o test.ms 'https://service.geonet.org.nz/fdsnws/dataselect/1/query?station=WQAB&channel=HN?&endtime=2024-03-16T02:10:56&starttime=2024-03-16T01:56:56&format=miniseed&nodata=204'

But it doesn't seem to happen all the time, this query run again may be okay. The ECS strong build application logs looks like below, the "PROTOCOL_ERROR is coming from the dataservice of FDSN".

June 27, 2024 at 18:37 (UTC+12:00) 2024/06/27 06:37:35 info: event query finished 2024p238521
June 27, 2024 at 18:37 (UTC+12:00) 2024/06/27 06:37:18 info: processing 2024p238521
June 27, 2024 at 18:34 (UTC+12:00) 2024/06/27 06:34:33 info: event query finished 2024p400666
June 27, 2024 at 18:33 (UTC+12:00) 2024/06/27 06:33:01 info: processing 2024p400666
June 27, 2024 at 17:26 (UTC+12:00) 2024/06/27 05:26:08 error: unable to query for event details 2024p202925: stream error: stream ID 149; PROTOCOL_ERROR; received from peer
June 27, 2024 at 17:21 (UTC+12:00) 2024/06/27 05:21:06 info: processing 2024p202925
June 27, 2024 at 17:21 (UTC+12:00) 2024/06/27 05:21:05 error: unable to query for event details 2024p234231: stream error: stream ID 47; PROTOCOL_ERROR; received from peer
June 27, 2024 at 17:18 (UTC+12:00) 2024/06/27 05:18:37 info: processing 2024p234231
June 27, 2024 at 17:18 (UTC+12:00) 2024/06/27 05:18:35 error: unable to query for event details 2024p423234: stream error: stream ID 645; PROTOCOL_ERROR; received from peer
June 27, 2024 at 17:15 (UTC+12:00) 2024/06/27 05:15:30 info: processing 2024p423234
June 27, 2024 at 17:15 (UTC+12:00) 2024/06/27 05:15:28 info: event query finished 2024p445051
June 27, 2024 at 17:13 (UTC+12:00) 2024/06/27 05:13:16 info: processing 2024p445051
June 27, 2024 at 17:13 (UTC+12:00) 2024/06/27 05:13:15 error: unable to query for event details 2024p202925: stream error: stream ID 63; PROTOCOL_ERROR; received from peer
June 27, 2024 at 17:08 (UTC+12:00) 2024/06/27 05:08:18 info: processing 2024p202925
June 27, 2024 at 17:08 (UTC+12:00) 2024/06/27 05:08:15 info: event query finished 2024p444316
June 27, 2024 at 17:06 (UTC+12:00) 2024/06/27 05:06:25 info: processing 2024p444316
June 27, 2024 at 17:06 (UTC+12:00) 2024/06/27 05:06:23 info: event query finished 2024p445051
June 27, 2024 at 17:03 (UTC+12:00) 2024/06/27 05:03:49 info: processing 2024p445051
June 27, 2024 at 17:03 (UTC+12:00) 2024/06/27 05:03:47 info: event query finished 2024p480465
June 27, 2024 at 17:03 (UTC+12:00) 2024/06/27 05:03:45 info: processing 2024p480465
June 27, 2024 at 17:03 (UTC+12:00) 2024/06/27 05:03:43 error: unable to query for event details 2024p234231: stream error: stream ID 209; PROTOCOL_ERROR; received from peer
June 27, 2024 at 17:00 (UTC+12:00) 2024/06/27 05:00:47 info: processing 2024p234231
June 27, 2024 at 17:00 (UTC+12:00) 2024/06/27 05:00:45 error: unable to query for event details 2024p423234: stream error: stream ID 235; PROTOCOL_ERROR; received from peer
June 27, 2024 at 16:57 (UTC+12:00) 2024/06/27 04:57:24 info: processing 2024p423234
June 27, 2024 at 16:57 (UTC+12:00) 2024/06/27 04:57:22 error: unable to query for event details 2024p445051: stream error: stream ID 195; PROTOCOL_ERROR; received from peer
June 27, 2024 at 16:55 (UTC+12:00) 2024/06/27 04:55:48 info: processing 2024p445051
junghao commented 1 month ago

Developer's note:

Test against Fastly and origin:

Fastly:

% curl -o test2.ms "https://service.geonet.org.nz/fdsnws/dataselect/1/query?station=WQAB&channel=HN?&endtime=2024-03-16T02:10:56&starttime=2024-03-16T01:56:56&format=miniseed&nodata=204"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10.2M    0 10.2M    0     0   152k      0 --:--:--  0:01:09 --:--:--     0
curl: (92) HTTP/2 stream 1 was not closed cleanly: PROTOCOL_ERROR (err 1)

Origin:

 % curl -o test.ms "http://tf-prod-fdsn-web-2006946694.ap-southeast-2.elb.amazonaws.com/fdsnws/dataselect/1/query?station=WQAB&channel=HN?&endtime=2024-03-16T02:10:56&starttime=2024-03-16T01:56:56&format=miniseed&nodata=204"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10.2M    0 10.2M    0     0   104k      0 --:--:--  0:01:40 --:--:--   335

We can see that download from origin takes longer, and I have observed that the last 30 seconds of download against origin is a bit stalled (average speed kept dropping due to no packet received, current speed became 0). However, the download finished after the 20~30seconds stall.

Then check the downloaded file size:

 % ls -l test*.ms
-rw-r--r--  1 howardw  GNS\g_howardw  10785280 31 Jul 10:18 test.ms
-rw-r--r--  1 howardw  GNS\g_howardw  10783950 31 Jul 10:19 test2.ms

The file downloaded via Fastly is 1330 bytes short.

I'm guessing this query returned many files, and many of the files don't contain the data during the time period user requested. Thus the last packet took quite long to scan thru many files, which caused a timeout for Fastly.

Next step:

bpeng commented 1 month ago

Tested on dev with RDS restored from prod snapshot and miniseed files copied from prod bucket:

curl -o test.ms "http://tf-dev-fdsn-web-287969587.ap-southeast-2.elb.amazonaws.com/fdsnws/dataselect/1/query?station=WRAS&channel=HN?&endtime=2024-03-16T02:10:56&starttime=2024-03-16T01:56:56&format=miniseed&nodata=204"

The query is successful with a total time of 1:31.

The requested station has 96 files and each with a size of ~20 MB, the connection timeout on the server is 10 minutes, while that on Fastly is 1 minutes, which resulted in a timeout for the query on the prod service.

Recommendation

Increase the timeout for FDSN Fastly service.

bpeng commented 1 month ago

Applied terraform and tested working