galaxyproject / galaxy-helm

Minimal setup required to run Galaxy under Kubernetes
MIT License
41 stars 38 forks source link

Added initial support for tusd #341

Closed nuwang closed 2 years ago

nuwang commented 2 years ago

This builds on @mvdbeek's fantastic work on integrating tus with Galaxy: https://github.com/galaxyproject/galaxy/pull/12656, by addingthe tusd daemon as an out-of-the-box component in the helm chart.

Things are at a point where tusd is receiving the requests, but something is slightly amiss with auth handling:

[tusd] 2021/10/12 14:06:23 Using '/galaxy/server/database/tmp' as directory storage.
[tusd] 2021/10/12 14:06:23 Using 0.00MB as maximum size.
[tusd] 2021/10/12 14:06:23 Using 'http://gxyarchivetest6-galaxy-nginx:8000/galaxy/api/upload/hooks' as the endpoint for hooks
[tusd] 2021/10/12 14:06:23 Enabled hook events: pre-create, post-create, post-receive, post-terminate, post-finish
[tusd] 2021/10/12 14:06:23 Using 0.0.0.0:1080 as address to listen.
[tusd] 2021/10/12 14:06:23 Using /files/ as the base path.
[tusd] 2021/10/12 14:06:23 Using /metrics as the metrics path.
[tusd] 2021/10/12 14:06:23 Supported tus extensions: creation,creation-with-upload,termination,concatenation,creation-defer-length
[tusd] 2021/10/12 14:06:23 You can now upload files to: http://0.0.0.0:1080/files/
[tusd] 2021/10/12 14:11:34 event="RequestIncoming" method="POST" path="" requestId="76a08832e52cef184ed906db1d5c4510"
[tusd] 2021/10/12 14:11:34 event="HookInvocationStart" type="pre-create" id=""
[tusd] 2021/10/12 14:11:34 event="ResponseOutgoing" status="403" method="POST" path="" error="pre-create hook failed: endpoint returned: 403 Forbidden" requestId="76a08832e52cef184ed906db1d5c4510"

session cookie not getting forwarded perhaps? Ping @mvdbeek

mvdbeek commented 2 years ago

Ah! I have been using -hooks-http-forward-headers=X-Api-Key,Cookie ... I will update the docs!

nuwang commented 2 years ago

@mvdbeek The Cookie header solved the auth issue, thanks. But there's a small issue remaining, with Galaxy complaining that:

image

Uploaded temporary file (/galaxy/server/database/tmp/resumable_upload7b8816eaa9268e559a64702b63173feb) does not exist.

The content of the upload dir does not have resumable_upload prefixed in front.


/srv/tusd-data $ ls -alh /galaxy/server/database/tmp
total 140
drwxr-xr--    2 101      101         4.0K Oct 12 14:41 .
drwxrwxrwx   13 root     root        4.0K Oct 12 14:41 ..
-rw-r--r--    1 101      101       124.2K Oct 12 14:40 7b8816eaa9268e559a64702b63173feb
-rw-r--r--    1 101      101          570 Oct 12 14:40 7b8816eaa9268e559a64702b63173feb.info
``
mvdbeek commented 2 years ago

Another doc problem ... sorry!

nuwang commented 2 years ago

@mvdbeek Works great now, thanks! One thing I noticed was that uploads that had failed prior to these changes can still not be uploaded. But files that are successful can be. image

On the k8s side, we are still double proxied, so I may take a crack at exposing tusd directly over k8s ingress, instead of making it double proxied through the internal galaxy nginx.

Also, this only works for authenticated users right? Anonymous users still go through the previous mechanism?

mvdbeek commented 2 years ago

Works great now, thanks! One thing I noticed was that uploads that had failed prior to these changes can still not be uploaded.

Did you click on reset in the upload box ? It did work for me locally, but you do have to reset the upload box

Also, this only works for authenticated users right? Anonymous users still go through the previous mechanism?

No, it also works for anonymous (i.e not logged in) users via the session cookie. You just can't send upload requests when you haven't gotten a session cookie (or an API key) first. If you want to open uploads up completely without any checks you could skip the -hooks-http flag, whose only purpose is making sure we know that we're actually dealing with a legitimate Galaxy user (logged in or not)

nuwang commented 2 years ago

For whatever reason, it doesn't work if I log out, it's using the legacy upload: image

If I log back in, I still can't upload that failed file. This is after a hard refresh.

nuwang commented 2 years ago

@mvdbeek This is the log after logging in: image

mvdbeek commented 2 years ago

Hmm, in both cases you have a sessioncookie, so that's all the formal requirements. But I see we do some additional logic that might disable chunked uploads. https://github.com/galaxyproject/galaxy/pull/12656/commits/5f926d966e1a84f9bca9cae3c962db3c42b90238 might fix this (I haven't tested it yet)

nuwang commented 2 years ago

@mvdbeek Some additional data about the failure of previously failed files.

nuwang commented 2 years ago

Another question, if anonymous access is disabled, but a malicious user nevertheless directly accesses the tusd resumable_upload endpoint, does the http hook get triggered at that point and auth validated? Is the hook endpoint https://github.com/galaxyproject/galaxy/pull/12656/files#diff-90f10fef62ca598fe3d9ff6f19cac742cb3a9f2e4cd418584e2c3d28ed5993aeR29 still to be fleshed out?

mvdbeek commented 2 years ago

Another question, if anonymous access is disabled, but a malicious user nevertheless directly accesses the tusd resumable_upload endpoint, does the http hook get triggered at that point and auth validated?

Yes, it's tusd that controls the hook. So even if you say break out from an IT and manage to get access to the internal tusd url the hook will run (and fail if you don't provide a valid API key / session key).

Is the hook endpoint https://github.com/galaxyproject/galaxy/pull/12656/files#diff-90f10fef62ca598fe3d9ff6f19cac742cb3a9f2e4cd418584e2c3d28ed5993aeR29 still to be fleshed out?

Not for the current functionality. The only purpose of the endpoint is checking that either a valid api key or session id is in use. This is more of a protection against DOS attacks / someone spamming a Galaxy server with many small uploads than anything else.

If the hook endpoint returns 403, as it would for an unauthorized user, tusd will not accept the requests. We could even use something like the existing /api/users/current_user, but I think it's better to reserve this endpoint for future enhancements. You can check the behavior by doing curl <galaxy_url>/api/upload/hooks and see it returns 403. Or you can use tusc (https://github.com/jackhftang/tusc) and compare upload with and without a header (so tusc client <galaxy_url>/api/upload/resumable_upload some_file-H "x-api-key: $API_KEY" or tusc client <galaxy_url>/api/upload/resumable_upload some_file)

mvdbeek commented 2 years ago
  • As shown in the last screen show above, all of the HEAD requests return 200, before finally logging this: Failed because: Error: tus: invalid or missing offset value, originated from request (method: HEAD, url: http://localhost/galaxy/api/upload/resumable_uploadf64e9c47874b96195d41b1f4d983b663, response code: 200, response text: , request id: n/a) It also seems to be issuing the same HEAD request over and over again.

Yeah, I also noticed this just now. It seems to be quite selective about the file in question, which is weird. That is probably a client or nginx problem.

mvdbeek commented 2 years ago

Yeah, I also noticed this just now. It seems to be quite selective about the file in question, which is weird. That is probably a client or nginx problem.

Yep, a missing slash in the client 😆 ... and this was only a problem for the HEAD requests that check if an upload had already been created.