dandi / redirector

Apache License 2.0

Allow for HEAD requests #13

Closed yarikoptic closed 4 years ago

yarikoptic commented 4 years ago

ATM this redirector returns a 405 HTTP error on any HEAD request, so it is not possible to check for redirection without downloading the full content of an item.

A little script to test possible candidates, checking for redirection with HEAD using httplib2 and requests:

```python
#!/usr/bin/python3

import httplib2
import requests

h = httplib2.Http()

for url in (
    'https://gui.dandiarchive.org/#/file-browser/folder/5e72b6ac3da50caa9adb0498',
    'https://identifiers.org/DANDI:000009',
    'https://dandiarchive.org/dandiset/000009',  # does not allow HEAD
    'https://bit.ly/2X65u0P',
    'http://datasets.datalad.org/labs/gobbini/famface/data/.git/annex/objects/0P/02/SHA256E-s5676255--d368873b04ef288ed70be8b7df633a556f82c320176ac6af17e0efcf5db31c1b.nii.gz/SHA256E-s5676255--d368873b04ef288ed70be8b7df633a556f82c320176ac6af17e0efcf5db31c1b.nii.gz',
):
    print('\n' + url[:50])
    # httplib2 follows redirects itself; content-location holds the final URL
    resp, content = h.request(url, 'HEAD')
    print('httplib2:', len(content), resp['status'], resp.get('content-location', None))
    # requests.head does not follow redirects unless asked to
    resp = requests.head(url, allow_redirects=True)
    print('requests:', len(resp.content), resp.status_code, resp.history, resp.url)
```

which produces

```
$> python3 /tmp/test-redir

https://gui.dandiarchive.org/#/file-browser/folder
httplib2: 0 200 https://gui.dandiarchive.org/#/file-browser/folder/5e72b6ac3da50caa9adb0498
requests: 0 200 [] https://gui.dandiarchive.org/#/file-browser/folder/5e72b6ac3da50caa9adb0498

https://identifiers.org/DANDI:000009
httplib2: 1063 200 https://gui.dandiarchive.org/#/dandiset/5e72840a3da50caa9adb0489
requests: 0 405 [<Response [302]>] https://dandiarchive.org/dandiset/000009

https://dandiarchive.org/dandiset/000009
httplib2: 0 405 None
requests: 0 405 [] https://dandiarchive.org/dandiset/000009

https://bit.ly/2X65u0P
httplib2: 0 200 https://stackoverflow.com/questions/16778435/python-check-if-website-exists
requests: 0 200 [<Response [301]>] https://stackoverflow.com/questions/16778435/python-check-if-website-exists

http://datasets.datalad.org/labs/gobbini/famface/d
httplib2: 0 200 http://datasets.datalad.org/labs/gobbini/famface/data/.git/annex/objects/0P/02/SHA256E-s5676255--d368873b04ef288ed70be8b7df633a556f82c320176ac6af17e0efcf5db31c1b.nii.gz/SHA256E-s5676255--d368873b04ef288ed70be8b7df633a556f82c320176ac6af17e0efcf5db31c1b.nii.gz
requests: 0 200 [] http://datasets.datalad.org/labs/gobbini/famface/data/.git/annex/objects/0P/02/SHA256E-s5676255--d368873b04ef288ed70be8b7df633a556f82c320176ac6af17e0efcf5db31c1b.nii.gz/SHA256E-s5676255--d368873b04ef288ed70be8b7df633a556f82c320176ac6af17e0efcf5db31c1b.nii.gz
```

HEAD support is necessary for dandi-cli to be able to issue a .get request on an arbitrary URL (see https://github.com/dandi/dandi-cli/issues/99), and IMHO it would be the correct thing to support.

satra commented 4 years ago

what would you want the HEAD response to be?

yarikoptic commented 4 years ago

I have no specific knowledge beyond wanting it to conform to established "standards" for redirection responses to HEAD requests ;) Please check what other redirectors return (e.g. using the script above). I am only checking the final URL (after possibly multiple redirections) and the status code.
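For illustration, that kind of check can be sketched with only the Python standard library. This is not what dandi-cli does; `head_once`, `resolve_with_head`, `max_redirects` and the injectable `head` parameter are names made up for this sketch. It follows redirects using HEAD only and reports the final URL, the final status code, and the redirect codes seen along the way, so a 405 from a server that rejects HEAD shows up unchanged.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# Status codes treated as redirects in this sketch.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def head_once(url, timeout=10):
    """Send a single HEAD request; return (status, Location header or None)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        resp.read()  # a HEAD response carries no body; this just finishes it
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def resolve_with_head(url, max_redirects=10, head=head_once):
    """Follow redirects with HEAD only; return (final_url, status, hops).

    `head` is injectable (hypothetical parameter) so the logic can be
    exercised without network access.
    """
    hops = []
    for _ in range(max_redirects + 1):
        status, location = head(url)
        if status in REDIRECT_CODES and location:
            hops.append(status)
            url = urljoin(url, location)  # Location may be relative
            continue
        return url, status, hops
    raise RuntimeError("too many redirects")
```

Calling `resolve_with_head('https://identifiers.org/DANDI:000009')` would perform real network requests, so the result depends on how the servers involved respond at that moment.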

satra commented 4 years ago

it's up to us (no general standard for response) and many sites don't support HEAD. i'm assuming you are specifically looking at head support for our permalinks, not for all the redirector endpoints.

as you can see above identifiers.org does some magic behind the scenes to the httplib2 head request. but seeing the content returned i think it's just sending you to the gui site.

perhaps the better question is what do you want to use HEAD for? because if we enable HEAD for the permalink we are making the return part of the API. so decide now and keep "forever"

yarikoptic commented 4 years ago

I guess mimicking what bit.ly or identifiers.org do is a good bet... actually bit.ly responds with 301 (moved permanently) and identifiers.org with 302 (temporary). I think since things could change, 302 would be a safer bet:

```
$> curl -i -X HEAD  https://bit.ly/2X65u0P
Warning: Setting custom HTTP method to HEAD with -X/--request may not work the
Warning: way you want. Consider using -I/--head instead.
HTTP/2 301
server: nginx
date: Wed, 27 May 2020 22:11:01 GMT
content-type: text/html; charset=utf-8
content-length: 162
cache-control: private, max-age=90
content-security-policy: referrer always;
location: https://stackoverflow.com/questions/16778435/python-check-if-website-exists
referrer-policy: unsafe-url
via: 1.1 google
alt-svc: clear

curl: (18) transfer closed with 162 bytes remaining to read

$> curl -i -X HEAD https://identifiers.org/DANDI:000009
Warning: Setting custom HTTP method to HEAD with -X/--request may not work the
Warning: way you want. Consider using -I/--head instead.
HTTP/2 302
location: https://dandiarchive.org/dandiset/000009
content-type: text/plain;charset=UTF-8
content-length: 0
date: Wed, 27 May 2020 22:11:32 GMT
via: 1.1 google
alt-svc: clear
```

So I would just mimic identifiers.org in its entirety: a 302 with a Location header and zero-length content.
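A minimal sketch of that behaviour, written against Python's standard `http.server` rather than whatever framework the redirector actually uses. `TARGET`, `RedirectHandler`, `probe` and `demo` are placeholders invented here; a real redirector would compute the target from the request path. The same method serves both GET and HEAD, so the two responses are identical: a 302, a `Location` header, and an empty body.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder target for the sketch; not derived from the request path.
TARGET = "https://gui.dandiarchive.org/#/dandiset/5e72840a3da50caa9adb0489"


class RedirectHandler(BaseHTTPRequestHandler):
    """Answer GET and HEAD with the same 302 and a zero-length body."""

    def _redirect(self):
        self.send_response(302)
        self.send_header("Location", TARGET)
        self.send_header("Content-Length", "0")
        self.end_headers()

    # One implementation for both methods.
    do_GET = _redirect
    do_HEAD = _redirect

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def probe(method, host, port, path="/dandiset/000009"):
    """Send one request without following redirects."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request(method, path)
        resp = conn.getresponse()
        body = resp.read()
        return resp.status, resp.getheader("Location"), len(body)
    finally:
        conn.close()


def demo():
    """Start the handler on a free local port and probe it with GET and HEAD."""
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address
        return {m: probe(m, host, port) for m in ("GET", "HEAD")}
    finally:
        server.shutdown()
        server.server_close()
```

`demo()` returns a dict keyed by method, each value being `(status, location, body_length)`; with this handler the GET and HEAD entries are the same.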

yarikoptic commented 4 years ago

> perhaps the better question is what do you want to use HEAD for?

to properly address your issue in dandi-cli I referenced above ;)

> because if we enable HEAD for the permalink we are making the return part of the API

Sorry, I am not following how issuing a redirect for HEAD is critically different from issuing one for GET.

satra commented 4 years ago

redirector is live, try it