dandi / dandiarchive-legacy

Code for the DANDI Web app
https://dandiarchive.org
Apache License 2.0

API (via girder ATM) to obtain URL to a file #54

Closed yarikoptic closed 4 years ago

yarikoptic commented 5 years ago

Our main asset store is on S3. A local asset store could be on a file system. To pass a file to an external tool (a notebook or datalad), we need to discover its location. For private files (not the current target, but worth keeping in mind), I guess the dandi archive should mint a short-lived URL, which in the case of S3 could be done for us by S3 itself. For public files, it could be a direct URL into the bucket (ideally with a versionId), or a URI to the asset store + the path to the file within the asset store + ideally a versionId from a versioned bucket (but that is S3-specific). So we probably just need a simple API, something like "get_uri_for_file".
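A minimal sketch of what such a "get_uri_for_file" could return for the cases described above. Everything here is hypothetical: `FileRecord`, its fields, the bucket name, and the `presign` callback are illustrations, not an existing dandi or Girder API. Public S3 files get a virtual-hosted-style bucket URL with an optional `versionId` query parameter, files on a local file system get a `file://` URI, and private S3 files are delegated to a caller-supplied callback (which could, for example, wrap boto3's `generate_presigned_url("get_object", ...)`).

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable, Optional
from urllib.parse import quote, urlencode


@dataclass
class FileRecord:
    """Hypothetical description of where a file lives in an asset store."""
    store: str                        # "s3" or "filesystem"
    path: str                         # S3 key, or absolute path on the local file system
    public: bool = True
    bucket: Optional[str] = None      # S3 bucket name (S3 only)
    version_id: Optional[str] = None  # only meaningful for versioned S3 buckets


def get_uri_for_file(
    rec: FileRecord,
    presign: Optional[Callable[[FileRecord], str]] = None,
) -> str:
    """Return a URI an external tool could use to fetch the file (sketch)."""
    if rec.store == "s3":
        if not rec.public:
            # Private files: defer to a short-lived URL minted elsewhere (e.g. by S3)
            if presign is None:
                raise ValueError("private S3 files need a presign callback")
            return presign(rec)
        # Public files: direct URL into the bucket, keeping "/" in the key unescaped
        url = f"https://{rec.bucket}.s3.amazonaws.com/{quote(rec.path, safe='/')}"
        if rec.version_id:
            # Pin the exact object version in a versioned bucket
            url += "?" + urlencode({"versionId": rec.version_id})
        return url
    if rec.store == "filesystem":
        # as_uri() requires an absolute path and percent-encodes special characters
        return PurePosixPath(rec.path).as_uri()
    raise ValueError(f"unknown asset store type: {rec.store!r}")
```

For example, `get_uri_for_file(FileRecord("filesystem", "/data/assetstore/ab/cd/f"))` gives `file:///data/assetstore/ab/cd/f`. Keeping presigning behind a callback leaves the choice of signer (and its expiry) to the archive rather than hard-coding it here.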

Could that be done easily, or is it perhaps already available within girder-client, @mgrauer?

satra commented 5 years ago

For any file ID, this API provides the URL:

https://girder.dandiarchive.org/api/v1/file/{id}/download

https://girder.dandiarchive.org/api/v1/file/5dab084bf377535c7d96c2c4/download?contentDisposition=attachment
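A small sketch of building that download URL from a file ID, using only the endpoint pattern and the `contentDisposition=attachment` parameter shown above. The function name and the `API_ROOT` constant are illustrative; the host is the one quoted in this thread and may not reflect the current deployment.

```python
from urllib.parse import quote, urlencode

# API root as quoted in this thread (illustrative; may have changed since)
API_ROOT = "https://girder.dandiarchive.org/api/v1"


def girder_download_url(file_id: str, attachment: bool = False, api_root: str = API_ROOT) -> str:
    """Build the Girder file download URL for a given file ID."""
    url = f"{api_root.rstrip('/')}/file/{quote(file_id, safe='')}/download"
    if attachment:
        # Same query parameter as in the example URL above
        url += "?" + urlencode({"contentDisposition": "attachment"})
    return url
```

Calling `girder_download_url("5dab084bf377535c7d96c2c4", attachment=True)` reproduces the second URL above exactly.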

satra commented 5 years ago

However, this does not provide the direct URL to the S3 bucket.

@mgrauer - does this mean we are paying for egress from the dandi archive if we call this API?

In that case, could we add the URL as metadata to the item that contains the file?

yarikoptic commented 5 years ago

That is also a decision to make: either we act as this middle service (which is great for telemetry, possibly resilience, etc., but also makes us a potential point of failure), or, for public files, we provide the endpoint URL (directly to S3), which would remove us as a middleman. I wondered whether it might be a redirect, but the file seems not to be public (the request requires authentication):

$> wget -S 'https://girder.dandiarchive.org/api/v1/file/5dab084bf377535c7d96c2c4/download?contentDisposition=attachment'
--2019-11-02 10:13:53--  https://girder.dandiarchive.org/api/v1/file/5dab084bf377535c7d96c2c4/download?contentDisposition=attachment
Resolving girder.dandiarchive.org (girder.dandiarchive.org)... 3.19.164.171
Connecting to girder.dandiarchive.org (girder.dandiarchive.org)|3.19.164.171|:443... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 401 Unauthorized
  Server: nginx/1.14.0 (Ubuntu)
  Date: Sat, 02 Nov 2019 14:13:54 GMT
  Content-Type: application/json
  Content-Length: 100
  Connection: keep-alive
  Allow: DELETE, GET, HEAD, OPTIONS, PATCH, POST, PUT
  Girder-Request-Uid: 99cf33a4-5574-43d3-b457-6d23d1da0995

Username/Password Authentication Failed.
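The same check as the `wget -S` probe above can be scripted so that the redirect is reported rather than followed. This is a sketch using only the Python standard library; `resolve_redirect_target` is a hypothetical helper name. It disables urllib's automatic redirect following, returns the `Location` header for a 3xx response, returns `None` when the endpoint serves the content itself, and lets other errors, such as the 401 shown above for a private file, propagate as `urllib.error.HTTPError`.

```python
import urllib.error
import urllib.request
from typing import Optional


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow redirects."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib surface the 3xx response as an HTTPError
        return None


def resolve_redirect_target(download_url: str, timeout: float = 10.0) -> Optional[str]:
    """Return where a download endpoint redirects to, or None if it serves the file directly."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(download_url, timeout=timeout):
            # 2xx: the endpoint streams the file itself; the body is not read
            return None
    except urllib.error.HTTPError as err:
        if err.code in (301, 302, 303, 307, 308):
            # e.g. a direct (possibly presigned) S3 URL
            return err.headers.get("Location")
        # 401 Unauthorized etc. means the file is not publicly readable
        raise
```

If the endpoint redirects, the returned URL is the direct S3 location the thread is asking about; if it raises with code 401, the file is private, as in the `wget` output above.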
satra commented 5 years ago

Username/Password Authentication Failed.

yarik: all uploaded data are private at the moment. I don't know if that was a conscious decision or just the default. You could make yours public to test whether it is a redirect.

But yes, it would be good for it to be a redirect, and to provide the URL (if available) via an API.

yarikoptic commented 5 years ago

Hm, I will check on the dandi client side, but I thought I had stated that it should be public... I checked and see no "public" anywhere in https://github.com/dandi/dandi-cli/blob/master/dandi/cli/command.py, so I guess it was left to the default, which is private. I will fix that: https://github.com/dandi/dandi-cli/issues/31

mgrauer commented 4 years ago

This is resolved. The Girder and publish APIs provide redirects to S3 from the File and Asset download endpoints.