dandi / dandidav

WebDAV view to DANDI Archive
MIT License

can cause 500 while wget'ing #130

Status: closed (yarikoptic closed this issue 4 months ago)

yarikoptic commented 4 months ago

@TheChymera wanted a "simpler" way to download Dandisets, so I tried to provide the ultimate wget invocation against the WebDAV instance but got a 500:

❯ wget -r -nH --cut-dirs=3 --no-parent --reject "index.html*" https://webdav.dandiarchive.org/dandisets/000027/releases/0.210831.2033/
--2024-05-13 10:55:28--  https://webdav.dandiarchive.org/dandisets/000027/releases/0.210831.2033/
Resolving webdav.dandiarchive.org (webdav.dandiarchive.org)... 18.205.36.100, 54.162.128.250, 54.157.58.70, ...
Connecting to webdav.dandiarchive.org (webdav.dandiarchive.org)|18.205.36.100|:443... connected.
HTTP request sent, awaiting response... 500 Internal Server Error
2024-05-13 10:55:28 ERROR 500: Internal Server Error.

In the server-side logs, it seems a query to the DANDI API failed:

(base) dandi@drogon:/mnt/backup/dandi/heroku-logs/dandidav$ grep -B10 '500 Internal Server' 20240513-1001.log
2024-05-13T14:55:28.791415+00:00 app[web.1]: 2024-05-13T14:55:28.786845899Z DEBUG request{method=GET uri=/dandisets/000027/releases/0.210831.2033/ version=HTTP/1.1}:outgoing-request{url=https://api.dandiarchive.org/api/dandisets/000027/versions/0.210831.2033/info/ method=GET}: dandidav::httputil: Failed to receive response error=Reqwest(reqwest::Error { kind: Request, url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("api.dandiarchive.org")), port: None, path: "/api/dandisets/000027/versions/0.210831.2033/info/", query: None, fragment: None }, source: hyper::Error(IncompleteMessage) })
2024-05-13T14:55:28.792608+00:00 app[web.1]: 2024-05-13T14:55:28.791435205Z ERROR request{method=GET uri=/dandisets/000027/releases/0.210831.2033/ version=HTTP/1.1}: dandidav::dav: Internal server error error=failed to fetch data from Archive
2024-05-13T14:55:28.792609+00:00 app[web.1]:
2024-05-13T14:55:28.792610+00:00 app[web.1]: Caused by:
2024-05-13T14:55:28.792610+00:00 app[web.1]: 0: failed to make request to https://api.dandiarchive.org/api/dandisets/000027/versions/0.210831.2033/info/
2024-05-13T14:55:28.792611+00:00 app[web.1]: 1: Request error: error sending request for url (https://api.dandiarchive.org/api/dandisets/000027/versions/0.210831.2033/info/): connection closed before message completed
2024-05-13T14:55:28.792612+00:00 app[web.1]: 2: error sending request for url (https://api.dandiarchive.org/api/dandisets/000027/versions/0.210831.2033/info/): connection closed before message completed
2024-05-13T14:55:28.792612+00:00 app[web.1]: 3: connection closed before message completed
2024-05-13T14:55:28.799348+00:00 app[web.1]: 2024-05-13T14:55:28.799314862Z  INFO request{method=GET uri=/dandisets/000027/releases/0.210831.2033/ version=HTTP/1.1}: dandidav: Current memory usage: 441126912 physical, 984641536 virtual
2024-05-13T14:55:28.799369+00:00 app[web.1]: 2024-05-13T14:55:28.799350419Z DEBUG request{method=GET uri=/dandisets/000027/releases/0.210831.2033/ version=HTTP/1.1}: tower_http::trace::on_response: finished processing request latency=20 ms status=500
2024-05-13T14:55:28.799371+00:00 app[web.1]: 2024-05-13T14:55:28.79936593Z ERROR request{method=GET uri=/dandisets/000027/releases/0.210831.2033/ version=HTTP/1.1}: tower_http::trace::on_failure: response failed classification=Status code: 500 Internal Server Error latency=20 ms

And indeed, rerunning that command worked just fine. So we might want to robustify the interaction with the DANDI API.

jwodder commented 4 months ago

@yarikoptic FYI, there's no need to include --reject "index.html*" in the wget command, as dandidav does not serve any index.html files (aside from any that might exist inside a Dandiset).

So we might want to robustify interaction with DANDI API

Robustify how? If you mean retrying until the API request succeeds, I disagree, as that would slow down responses; if the end-user is OK with such slowdown, they should be the ones doing the retrying when making requests to dandidav.

yarikoptic commented 4 months ago

So we might want to robustify interaction with DANDI API

Robustify how? If you mean retrying until the API request succeeds, I disagree, as that would slow down responses; if the end-user is OK with such slowdown, they should be the ones doing the retrying when making requests to dandidav.

We should retry a few times, at least upon 5xx. IMHO it is better to incur a slowdown that will likely work around an unstable connection/operation: dandidav here is a "client" of the DANDI API. By just propagating 500s, we increase the chance of failures occurring along a long(er) chain of services, should something else be built atop dandidav, unless dandidav makes an effort to mitigate them.
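A minimal sketch of the "retry a few times" policy, in Rust since dandidav is a Rust service (the logs show reqwest/hyper). It is synchronous and std-only for illustration; the real code would be async, and the names `retry`, `is_retryable`, `delay_before`, and `sleep` are hypothetical, not dandidav's API. The sleep function is injected so the loop can be exercised without actually waiting.

```rust
use std::time::Duration;

/// Hypothetical helper: runs `op` up to `1 + max_retries` times.
/// A failed attempt is retried only while retries remain AND
/// `is_retryable` says the error is transient; any other error is
/// returned immediately. `delay_before` maps the 1-based retry number
/// to a wait, and `sleep` performs that wait (e.g. `std::thread::sleep`).
pub fn retry<T, E>(
    max_retries: u32,
    mut op: impl FnMut() -> Result<T, E>,
    is_retryable: impl Fn(&E) -> bool,
    delay_before: impl Fn(u32) -> Duration,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut retry = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Transient failure with retries left: wait, then try again.
            Err(err) if retry < max_retries && is_retryable(&err) => {
                retry += 1;
                sleep(delay_before(retry));
            }
            // Non-retryable error, or retries exhausted: propagate it.
            Err(err) => return Err(err),
        }
    }
}
```

In real use one would pass `std::thread::sleep` (or an async equivalent) as `sleep`. The explicit `max_retries` cap bounds the added latency, rather than "retrying until the API request succeeds".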

Interestingly, wget apparently also claims to retry in such (?) cases, since its man page says:

       -t number
       --tries=number
           Set number of tries to number. Specify 0 or inf for infinite retrying.  The default is to retry 20
           times, with the exception of fatal errors like "connection refused" or "not  found"  (404),  which
           are not retried.

if I understand "default is to retry" correctly, but I don't remember whether it actually retried for us here.

jwodder commented 4 months ago

@yarikoptic

yarikoptic commented 4 months ago

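I would expect it to be intermittent, so it could be 4 retries with exponential backoff in between (0, 2, 4, 8 seconds, i.e. no wait before the first retry and 2**(retry-1) seconds before each subsequent one).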
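The schedule from the comment above as a small Rust function. It reads "0, 2, 4, 8" as no wait before the first retry and 2**(retry-1) seconds before each later one; `backoff_delay` and `MAX_RETRIES` are hypothetical names, not part of dandidav.

```rust
use std::time::Duration;

/// Hypothetical cap on retries, per the "4 retries" suggestion.
pub const MAX_RETRIES: u32 = 4;

/// Delay before the 1-based retry number `retry`:
/// 0 s for the first retry, then 2**(retry-1) s (2, 4, 8, ...).
/// `saturating_pow` avoids overflow if called with a huge retry number.
pub fn backoff_delay(retry: u32) -> Duration {
    match retry {
        0 | 1 => Duration::ZERO,
        n => Duration::from_secs(2u64.saturating_pow(n - 1)),
    }
}
```

With four retries this adds at most 0 + 2 + 4 + 8 = 14 seconds of waiting to a single dandidav response, on top of the time the attempts themselves take.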

yarikoptic commented 4 months ago

For 5xx: let's retry on any of them.
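A sketch of the "any 5xx" classification. The `ApiError` enum and `is_retryable` are hypothetical; also treating transport failures as retryable is an assumption, not something decided in this thread, though the failure in the logs above was exactly such a case ("connection closed before message completed"), not a 5xx from the API.

```rust
/// Hypothetical error type for a call to the Archive API.
#[derive(Debug)]
pub enum ApiError {
    /// The API answered with this HTTP status code.
    Status(u16),
    /// No complete response arrived (e.g. connection closed early).
    Transport(String),
}

/// Retry on any 5xx status, and (assumption) on transport failures.
pub fn is_retryable(err: &ApiError) -> bool {
    match err {
        ApiError::Status(code) => (500..=599).contains(code),
        ApiError::Transport(_) => true,
    }
}
```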