Open charlesbrandt opened 1 year ago
Here in NGVB is the UI where we start the upload:
https://github.com/IUSCA/NGVB/blob/main/ui/src/components/forms/UploadStorageData.vue
That component handles chunking the file and sending each piece to the backend, which stores the file chunks in a folder. The chunks go to this route:
https://github.com/IUSCA/NGVB/blob/main/api/routes/data.js#L23
Finally, it calls the merge endpoint, which now hands the merge off to a worker (we needed that for larger files):
https://github.com/IUSCA/NGVB/blob/main/api/routes/data.js#L43
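For anyone skimming, here is a rough sketch of the client-side flow described above (split the file, send each piece, then ask for a merge). It is written in Python only for illustration; the real client is the Vue component linked above. The chunk size, the function names, and the `send_chunk` / `request_merge` callables are all assumptions standing in for the actual HTTP calls, not NGVB's code.

```python
from pathlib import Path
from typing import Callable, Iterator, Tuple

# Hypothetical chunk size; the value used by NGVB is not stated in this thread.
CHUNK_SIZE = 2 * 1024 * 1024


def iter_chunks(path: Path, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (index, data) pairs, reading the file one chunk at a time."""
    with path.open("rb") as f:
        index = 0
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            yield index, data
            index += 1


def upload_in_chunks(
    path: Path,
    send_chunk: Callable[[str, int, bytes], None],
    request_merge: Callable[[str, int], None],
    chunk_size: int = CHUNK_SIZE,
) -> int:
    """Send every chunk, then request the merge. Returns the chunk count.

    send_chunk and request_merge are placeholders for the real requests
    to the chunk-upload route and the merge endpoint.
    """
    total = 0
    for index, data in iter_chunks(path, chunk_size):
        send_chunk(path.name, index, data)
        total += 1
    request_merge(path.name, total)
    return total
```

Passing the chunk count along with the merge request lets the server confirm that every piece arrived before it starts merging.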
The actual file merge is handled in a python worker script you can find here:
After talking to Ryan and Deepak it seems reasonable to go with the overall strategy that Ryan implemented in NGVB, with some tweaks.
We could have an endpoint created on the Colo node to upload the file chunks, from where a worker could merge the chunks into a file, create the data product, and run subsequent workflow steps.
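To make the merge step concrete, here is a minimal sketch of what the worker could do once all chunks are on disk. It is not the NGVB worker script; the `<index>.part` naming scheme and the `merge_chunks` function are assumptions made for this example.

```python
import shutil
from pathlib import Path


def merge_chunks(chunk_dir: Path, dest: Path, expected_count: int) -> int:
    """Concatenate chunk files 0..expected_count-1 into dest.

    Assumes chunks are stored as "<index>.part" (hypothetical naming).
    Returns the size of the merged file in bytes.
    """
    # Build paths from the numeric index so chunk 10 never sorts before chunk 2.
    chunk_paths = [chunk_dir / f"{i}.part" for i in range(expected_count)]

    # Refuse to merge if any chunk is missing, so no truncated file is produced.
    missing = [p.name for p in chunk_paths if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"missing chunks: {missing}")

    with dest.open("wb") as out:
        for chunk_path in chunk_paths:
            with chunk_path.open("rb") as src:
                # Stream the copy so large chunks are not loaded into memory.
                shutil.copyfileobj(src, out)
    return dest.stat().st_size
```

Creating the data product and kicking off the subsequent workflow steps would happen after this function returns successfully.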
@ri-pandey You may also need to consider
For authentication, the browser should send the JWT issued by the app API (when the user logs in) with each upload request. The upload API needs to verify the signature using the app API's public key. For this to work, a copy of the public key has to be placed in the upload API's repo.
Another approach is to generate a new token using the Signet OAuth service, similar to how secure download works. The code for validating the token is in the secure download API. The user role needs to be in this token for authorization.
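With either token source, the upload API ends up doing the same checks: verify the signature, then confirm the role is allowed to upload. A rough sketch of that order of operations follows. The `role` claim name, the `authorize_upload` function, and the injected `verify_signature` callable are assumptions; the real signature check would be done with a JWT library and the app API's public key, and this is not the secure download API's code.

```python
import base64
import json
import time
from typing import Callable, Iterable, Optional


def _b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding JWTs strip."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def authorize_upload(
    token: str,
    verify_signature: Callable[[bytes, bytes], bool],
    allowed_roles: Iterable[str],
    now: Optional[float] = None,
) -> dict:
    """Return the token claims if the upload is allowed, else raise PermissionError.

    verify_signature(signing_input, signature) is a placeholder for the real
    public-key verification performed by a JWT library.
    """
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise PermissionError("malformed token") from None

    # The signature covers "<header>.<payload>" exactly as transmitted.
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    if not verify_signature(signing_input, _b64url_decode(signature_b64)):
        raise PermissionError("invalid signature")

    # Only read the claims after the signature has been verified.
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and claims["exp"] <= current:
        raise PermissionError("token expired")
    if claims.get("role") not in set(allowed_roles):  # "role" is a hypothetical claim name
        raise PermissionError("role not allowed to upload")
    return claims
```

The point of the ordering is that nothing in the payload is trusted until the signature check passes.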
Needs testing in a remote test environment.
Occasionally data products are generated outside of a given instance of Bioloop. In this situation, operators need a way to manually import or ingest the data product.
In CMG, we call this process uploading; however, it is more similar to ingesting from a custom path:
If the operator derived the data product from an existing dataset (raw data source) that already exists in the data portal, there should be a way to associate the data product being uploaded with the source dataset.
@ryanlong89 has also implemented related functionality in NGVB that successfully handles transferring large files from the operator's local machine to the server. @ryanlong89, could you add a pointer to that functionality here?
@ri-pandey, once you've had a chance to review the existing solutions, please document the approach you plan to take to implement the feature here in this issue (before starting actual development).
FYI @deepakduggirala