Ingest from Slate-scratch

ri-pandey commented 8 months ago

This is a follow-up to ticket #99. The intention is to enable dataset ingestion from the Slate-Scratch filesystem.

charlesbrandt commented 8 months ago

This feature involves ingesting data from a user-specified path that is already available server-side. Ideally, the specified path can be validated as existing and accessible by the API service.
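The validation described above could look roughly like the following. This is a minimal sketch, not Bioloop code: the function name `validate_import_path`, the `ImportPathError` exception, and the idea of a single configured `allowed_root` (for example, the Slate-Scratch mount point) are all assumptions made for illustration.

```python
import os
from pathlib import Path


class ImportPathError(ValueError):
    """Raised when a requested import path is not acceptable (hypothetical)."""


def validate_import_path(requested: str, allowed_root: str) -> Path:
    """Return the resolved directory if it exists, is readable, and lies under allowed_root."""
    root = Path(allowed_root).resolve()
    # Resolve symlinks and ".." segments before comparing against the root,
    # so a path cannot escape the allowed area.
    candidate = (root / requested).resolve()
    if candidate != root and root not in candidate.parents:
        raise ImportPathError(f"{requested!r} is outside {root}")
    if not candidate.is_dir():
        raise ImportPathError(f"{candidate} does not exist or is not a directory")
    # Listing a directory requires both read and execute permission.
    if not os.access(candidate, os.R_OK | os.X_OK):
        raise ImportPathError(f"{candidate} is not accessible by the service user")
    return candidate
```

Resolving the path first means the containment check is applied to the real location on disk, not to the string the user typed.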

For IU and HPC systems, only Slate-Project spaces could previously be shared via Samba. We recently learned that HPFS is expanding availability of the Samba gateway to include Slate-Scratch.

Once access to the data has been confirmed, the data can be processed using the standard ingestion workflow (inspect-bundle-archive-stage-validate). Any metadata the user supplies at the time of the upload should be associated with the newly created dataset in the Bioloop system.

deepakduggirala commented 7 months ago

Added a couple of APIs to the secure download server that return the directory listing and the directory size for a given path. The code is in the operator-data-import branch.

Added a service in the UI code to call these APIs, and a sample page (/datasets/import) to test the functionality.

The size API is not a typical request-response API but a Server-Sent Events (SSE) style API. This design was chosen because computing the size of a directory can take a long time, especially on mounted file systems such as Slate-Project and Slate-Scratch. With SSE, the HTTP connection is kept alive until the size has been computed.
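To illustrate the SSE approach, here is a minimal Python sketch of a generator that walks a directory and emits SSE-formatted messages. It is not the code from the operator-data-import branch: the function names, the `progress` and `done` event names, and the idea of sending periodic progress events to keep the connection active are assumptions for illustration. A web framework would stream the yielded strings with the `text/event-stream` content type.

```python
import json
import os
from typing import Iterator


def sse_event(event: str, data: dict) -> str:
    """Format one Server-Sent Event: an event name, a JSON payload, and a blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def stream_directory_size(path: str, report_every: int = 1000) -> Iterator[str]:
    """Yield SSE messages while summing file sizes under path (hypothetical helper)."""
    total_bytes = 0
    file_count = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                # lstat does not follow symlinks, so link targets are not counted.
                total_bytes += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                # Skip files that vanish or cannot be read during the walk.
                continue
            file_count += 1
            if file_count % report_every == 0:
                # Periodic progress messages keep data flowing on slow mounts.
                yield sse_event("progress", {"files": file_count, "bytes": total_bytes})
    # The final event carries the completed total.
    yield sse_event("done", {"files": file_count, "bytes": total_bytes})
```

On the client side, a browser `EventSource` could listen for the final event and close the connection once it arrives.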

TODO: No auth is set up yet.

Download routes have been moved under /download, but the client code has not been updated yet, so downloading files will not work for now.

IUSCA / bioloop

Ingest from Slate-scratch #142