IUSCA / bioloop

Scientific data management portal and pipeline application template

Implement a download API within Bioloop core API, make usage of secure_download configurable #323

Open ri-pandey opened 1 month ago

ri-pandey commented 1 month ago

Currently, secure_download API routes are deployed in a separate docker container, which is not part of the application's docker containers (ui, api, postgres, etc.). Hence, to make API requests to secure_download, the Slate-scratch filesystem has to be mounted to the secure_download docker container, and we have to add an additional Bearer token before the API request to secure_download can be made.
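For illustration, the extra Bearer token step could look roughly like the sketch below. This is not Bioloop's actual client code (it is Python, used here only as a neutral sketch language); the function name, the base URL, and the `/download/<id>` path shape are assumptions.

```python
import urllib.request


def build_secure_download_request(base_url: str, file_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that carries a Bearer token.

    base_url, file_id and token are hypothetical inputs; how the token is
    obtained is out of scope for this sketch.
    """
    url = f"{base_url.rstrip('/')}/download/{file_id}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )
```

Serving downloads from the core API would remove the need for this separate token, since the request would go through the API's existing authentication.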

Now that Slate-scratch is available on the Bioloop service host via Samba mounts, this architecture can be simplified. The core Bioloop API can access the mounted filesystem itself, so upload/download requests can be made to it directly.

We will still want to retain the download API within secure_download so open source consumers of Bioloop can leverage that federated architecture.

ri-pandey commented 4 weeks ago

There are two potential directions this can go in:

  1. making the /download/:id HTTP endpoint part of the core Bioloop API.
  2. retaining the /download/:id HTTP endpoint in the secure_download container, but hosting the container on a machine that we have sudo access to, instead of colo25, where most developers don't have it.
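Whichever direction is picked, the "configurable" part of this issue could be a single switch that decides where download requests go. A minimal sketch, assuming hypothetical setting names (`USE_SECURE_DOWNLOAD`, `SECURE_DOWNLOAD_URL`, `CORE_API_URL`) that do not exist in the repo today:

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DownloadTarget:
    base_url: str
    needs_token: bool  # True when the extra Bearer token step is required


def resolve_download_target(env: Optional[Mapping[str, str]] = None) -> DownloadTarget:
    """Pick the download backend from configuration (all keys are hypothetical)."""
    env = os.environ if env is None else env
    flag = env.get("USE_SECURE_DOWNLOAD", "false").strip().lower()
    if flag in {"1", "true", "yes"}:
        # Federated setup: a separate secure_download service with its own token.
        return DownloadTarget(env["SECURE_DOWNLOAD_URL"], needs_token=True)
    # Simplified setup: the core API serves the file itself.
    return DownloadTarget(env["CORE_API_URL"], needs_token=False)


def download_url(target: DownloadTarget, dataset_id: int) -> str:
    """Build the /download/:id URL for the selected backend."""
    return f"{target.base_url.rstrip('/')}/download/{dataset_id}"
```

Keeping the switch in one place would let open source deployments stay on the federated path while our own deployment uses the core API.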

With either option, scalability may become a problem. The machine that the Bioloop service APIs are currently hosted on offers only 2 threads for processing.

In either case, we do want to retain secure_download code in the repo, so that the federated download architecture can be leveraged by open source consumers of Bioloop.

ri-pandey commented 4 weeks ago

There is some work-in-progress for this on branch feature-323. I copied the /download/:id endpoint to Bioloop's /datasets route, and mounted scadev's slate-scratch space to the Bioloop API container, so it can read and serve files from slate-scratch. Currently, this config downloads a static asset instead of the expected file.
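One thing that may help while debugging is to resolve the requested file under the mount root explicitly and fail loudly when it is missing, rather than letting the request fall through to some other handler. The sketch below is only an illustration in Python (not the code on feature-323), it does not diagnose the static-asset behaviour, and the function name and arguments are hypothetical.

```python
from pathlib import Path


def resolve_under_root(mount_root: str, relative_path: str) -> Path:
    """Return the absolute path of a file under mount_root.

    Raises ValueError if the path escapes the root, and FileNotFoundError
    if nothing exists at the resolved location.
    """
    root = Path(mount_root).resolve()
    candidate = (root / relative_path).resolve()
    # Reject "../" tricks and absolute paths that point outside the mount.
    if not candidate.is_relative_to(root):  # requires Python 3.9+
        raise ValueError(f"path escapes mount root: {relative_path!r}")
    if not candidate.is_file():
        raise FileNotFoundError(str(candidate))
    return candidate
```

Logging the resolved path from the endpoint would at least show whether the container sees the expected file on slate-scratch.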

ri-pandey commented 4 weeks ago

Another consideration to keep in mind would be the upload API, which is also hosted within the secure_download container.

From my efforts so far, it seems unlikely that we will be able to move the upload API out of the secure_download container and into the core Bioloop API. The reason is that the slate-scratch filesystem is mounted onto the Bioloop service host via an SMB mount. To enable uploading files via the Bioloop API instead of the secure_download API, this SMB-mounted filesystem would need to be further mounted into the Bioloop API docker container via a bind mount. While this works for reading data from slate-scratch, writing data to slate-scratch through this chain of mounts is not performant enough, so uploads will be awfully slow if we take this route.
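To put a number on "not performant enough", a small write benchmark can be run from inside the API container against the bind-mounted directory and compared with a local directory. This is a rough sketch, not a measurement I have taken; the function name and default sizes are arbitrary.

```python
import os
import tempfile
import time


def measure_write_throughput(directory: str, size_mb: int = 64, chunk_mb: int = 4) -> float:
    """Write at least size_mb of random data into directory and return MB/s.

    The timing includes flush + fsync, so data is pushed to the underlying
    (possibly network) filesystem and not just the page cache.
    """
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    fd, path = tempfile.mkstemp(dir=directory, prefix="write-bench-")
    try:
        written = 0
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            while written < size_mb:
                f.write(chunk)
                written += chunk_mb
            f.flush()
            os.fsync(f.fileno())
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.remove(path)  # always clean up the benchmark file
    return written / elapsed
```

Running it once against the slate-scratch bind mount and once against a container-local path such as `/tmp` should show how much of the slowdown comes from the SMB plus bind mount chain.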

In either case, both the download and upload APIs should continue to be a part of secure_download for the benefit of open source consumers of Bioloop.

The integration between Bioloop API and secure_download API is explained here: https://github.com/IUSCA/bioloop/blob/99-dataset-upload-2/docs/upload.md?plain=1#L64

Note that the upload API is not a part of the Production secure_download at this point.