Open ri-pandey opened 1 month ago
There are two potential directions this can go in:
In either option, scalability may become a problem. The machine that currently hosts the Bioloop service APIs offers 2 threads for processing.
In either case, we do want to retain secure_download code in the repo, so that the federated download architecture can be leveraged by open source consumers of Bioloop.
There is some work in progress for this on branch feature-323. I copied the /download/:id endpoint to Bioloop's /datasets route and mounted scadev's slate-scratch space into the Bioloop API container, so that it can read and serve files from slate-scratch. Currently, this configuration downloads a static asset instead of the expected file.
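As a rough illustration of serving files from a mounted directory, the sketch below maps a requested relative path to a location under the mount and rejects paths that would escape it. The function name `resolveUnderRoot` and the mount point `/mnt/slate-scratch` are hypothetical and not taken from the Bioloop code. It assumes POSIX-style paths and does not account for symlinks.

```typescript
// Hypothetical helper: resolve a requested path under a mount root.
// Throws if the path is empty or tries to climb above the root via "..".
function resolveUnderRoot(root: string, relativePath: string): string {
  const segments: string[] = [];
  for (const segment of relativePath.split("/")) {
    // Skip empty segments (duplicate slashes) and "." (current directory).
    if (segment === "" || segment === ".") {
      continue;
    }
    if (segment === "..") {
      if (segments.length === 0) {
        throw new Error(`Path escapes mount root: ${relativePath}`);
      }
      segments.pop();
      continue;
    }
    segments.push(segment);
  }
  if (segments.length === 0) {
    throw new Error("Path does not name a file");
  }
  // Drop trailing slashes from the root so the join yields a single "/".
  const base = root.replace(/\/+$/, "");
  return `${base}/${segments.join("/")}`;
}

// Example with an assumed mount point inside the API container.
const resolved = resolveUnderRoot("/mnt/slate-scratch/", "datasets/42/./reads.fastq");
console.log(resolved);
```

A route handler could call a helper like this before streaming the file, so that a malformed request cannot read outside the mounted space.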
Another consideration is the upload API, which is also hosted within the secure_download container.
From my efforts so far, it seems unlikely that we will be able to move the upload API out of the secure_download container and into the Bioloop API. The reason is that the slate-scratch filesystem is mounted onto the Bioloop service host via an SMB mount. To enable uploading files via the Bioloop API instead of the secure_download API, this SMB-mounted filesystem would need to be further mounted into the Bioloop API Docker container via a bind mount. While this works for reading data from slate-scratch, writing data to slate-scratch is not performant enough with this strategy, so uploads will be awfully slow if we take this route.
In either case, both the download and upload APIs should continue to be a part of secure_download for the benefit of open source consumers of Bioloop.
The integration between Bioloop API and secure_download API is explained here: https://github.com/IUSCA/bioloop/blob/99-dataset-upload-2/docs/upload.md?plain=1#L64
Note that the upload API is not a part of the Production secure_download at this point.
Currently, the secure_download API routes are deployed in a separate Docker container, which is not one of the application's Docker containers (ui, api, postgres, etc.). Hence, to make API requests to secure_download, the Slate-Scratch filesystem has to be mounted into the secure_download Docker container, and we have to supply an additional Bearer token before the API request to secure_download can be made.
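To make the extra step concrete, here is a minimal sketch of what assembling such a request might look like. The function name, the base URL, and the assumption that the token travels in a standard `Authorization: Bearer` header are illustrative and not confirmed by the Bioloop or secure_download code. Only the `/download/:id` route shape comes from this thread.

```typescript
// Hypothetical description of a request to secure_download.
interface SecureDownloadRequest {
  url: string;
  headers: Record<string, string>;
}

// Build the URL and headers for a download request.
// baseUrl: assumed base URL of the secure_download service
// id:      identifier used in the /download/:id route
// token:   Bearer token that must be obtained before calling the service
function buildSecureDownloadRequest(
  baseUrl: string,
  id: string,
  token: string,
): SecureDownloadRequest {
  if (token.trim() === "") {
    throw new Error("A Bearer token is required to call secure_download");
  }
  // Drop trailing slashes so the joined URL has exactly one separator.
  const base = baseUrl.replace(/\/+$/, "");
  return {
    url: `${base}/download/${encodeURIComponent(id)}`,
    headers: { Authorization: `Bearer ${token}` },
  };
}

// Example with placeholder values.
const request = buildSecureDownloadRequest("https://secure-download.example.org/", "abc 123", "example-token");
console.log(request.url, request.headers);
```

Serving files from the core Bioloop API would remove this token step for requests that stay within the application's own containers.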
Now that Slate-Scratch is available on the Bioloop service host via Samba mounts, this architecture can be simplified. With the filesystem mounted there, upload/download requests can be made to the core Bioloop API directly.
We will still want to retain the download API within secure_download so open source consumers of Bioloop can leverage that federated architecture.