Open charlesbrandt opened 1 year ago
Finally, Globus offers a lot of flexibility in transfers, like checking completion percentage, status, overriding checksum verification, resuming failed transfers, etc. These features are currently available in the Globus UI, so it doesn't make sense to replicate these features in Bioloop.
Upon further discussion and review, we would like to minimize the amount of duplication of effort to enable data delivery from Bioloop through the Globus network.
Bioloop operators should have a way to specify which Globus users are allowed to read data from a specific Bioloop project. We expect this will be a one-way operation: data can be read from Bioloop and delivered via Globus. We do not expect to receive data via Globus and write/ingest it into Bioloop.
In Bioloop, once a project has been configured for sharing via Globus, all other operations should be handled by Globus. To that end, we anticipate needing to configure a Globus Connect Server to handle data delivery to other Globus endpoints:
https://www.globus.org/globus-connect-server
I believe this is what other services like SDA or Slate are running to facilitate working with those resources through Globus. Running a Globus Connect Server will likely require a subscription:
https://www.globus.org/subscriptions
We would like to confirm that this is a viable path for delivering data managed in an instance of Bioloop via the Globus network.
We would also like to understand what needs to happen for the Globus Connect Server to know which Bioloop projects are available for sharing, and which users should be granted read access.
We will need to learn which subscription level is appropriate for this use case.
@charlesbrandt There are APIs which should make the above possible.
'Sharing' in Globus (i.e. sharing with a user, as opposed to transferring data to an endpoint) takes place via Guest Collections. A Guest Collection can be created by the user who wants to share their data, or an existing Guest Collection may be used. Once a user or group of users has been granted read access to the Guest Collection, they should be able to access the data in their Globus web instance.
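As a rough illustration of granting read access on a Guest Collection, here is a minimal sketch. It assumes the Python `globus_sdk` package and an already-authenticated `globus_sdk.TransferClient`; the helper names `build_read_only_acl_rule` and `share_project`, and the idea that each Bioloop project maps to one directory inside the collection, are hypothetical and not something Bioloop does today.

```python
from typing import Dict, Iterable, List


def build_read_only_acl_rule(identity_id: str, project_path: str) -> Dict[str, str]:
    """Build a Globus Transfer access-rule document granting read-only access.

    identity_id is assumed to be a Globus Auth identity UUID;
    project_path is the (hypothetical) directory of a Bioloop project
    inside the Guest Collection.
    """
    # Access-rule paths refer to directories, so they start and end with "/".
    trimmed = project_path.strip("/")
    path = f"/{trimmed}/" if trimmed else "/"
    return {
        "DATA_TYPE": "access",
        "principal_type": "identity",
        "principal": identity_id,
        "path": path,
        "permissions": "r",  # read-only: delivery is one-way, out of Bioloop
    }


def share_project(transfer_client, collection_id: str,
                  identity_ids: Iterable[str], project_path: str) -> List:
    """Add one read-only rule per identity on the given Guest Collection.

    transfer_client is assumed to be a globus_sdk.TransferClient (or any
    object exposing add_endpoint_acl_rule(endpoint_id, rule_data)).
    """
    responses = []
    for identity_id in identity_ids:
        rule = build_read_only_acl_rule(identity_id, project_path)
        responses.append(transfer_client.add_endpoint_acl_rule(collection_id, rule))
    return responses
```

Keeping the rule construction separate from the API call means the permission logic can be checked without Globus credentials; only `share_project` needs a live client.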
Here's the flow I am envisioning within Bioloop:
The steps I listed above are achievable through the Globus API.
This seems to be a viable path.
Note - Currently, the option to create a Guest Collection is disabled for IU's Globus instance. This may be an admin-level setting that would need to be changed by IU's Globus admins. I also believe we may already be on the subscription needed to take advantage of the Share feature, but the option to create Guest Collections would need to be enabled before we can make use of it.
Globus allows data transfer between local storage targets and targets outside of the university. Instead of downloading data directly to a desktop client via a browser, or providing a path on local storage where the data has already been staged, this feature would allow data transfer using Globus. Some questions to explore first:
https://www.globus.org/
Research data management simplified. | globus
https://www.globus.org/platform/services/flows
Globus Flows | globus
https://kb.iu.edu/d/bdqp
Use the IU Globus Web App to transfer data between your accounts on IU's research computing and storage systems