Open JesseFarebro opened 2 years ago
Globus Connect Personal (GCP) does quite a lot more than a simple upload feature would offer -- in particular, it handles firewall hole punching and sends transfers over GridFTP (not HTTPS). In the past, more of the components were open source, but one of the challenges we faced was keeping Globus services interoperable with forked community versions. All that said, we fully recognize that installing GCP for simple upload/download use cases is a major point of friction for Globus users, so we do intend to provide a solution for this.
This is a feature we have on our roadmap to put together and get working as a part of the globus-cli. We want to provide it through a command like globus upload or similar -- exact name TBD. It will be, as you suggested, built on the exact same APIs which the Globus webapp uses, but there are some new challenges faced by the globus-cli in terms of the types of usages we want to support. (For a concrete example, the webapp always operates in the context of a known collection ID, but we'd like the CLI to be able to consume an HTTPS upload URL using one of the registered DNS names for a collection. We then need to resolve several pieces of information from that URL, not all of which is trivial.)
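To make the URL example a bit more concrete, here is a minimal sketch of only the easy part: splitting an HTTPS upload URL into a DNS name and a destination path using the Python standard library. The names `UploadTarget` and `parse_upload_url` and the host `collection.example.org` are hypothetical and are not part of the globus-cli or the SDK. Mapping the DNS name back to a collection ID, which is the non-trivial piece, is deliberately not shown.

```python
from dataclasses import dataclass
from urllib.parse import unquote, urlsplit


@dataclass(frozen=True)
class UploadTarget:
    """Hypothetical container for what can be read directly off the URL."""

    host: str  # DNS name that would still need to be mapped to a collection
    path: str  # percent-decoded destination path on that collection


def parse_upload_url(url: str) -> UploadTarget:
    """Split an HTTPS upload URL into its host and decoded path.

    This does NOT resolve the host to a collection ID; that lookup is the
    hard part and is out of scope for this sketch.
    """
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError(f"expected an https URL, got scheme {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("URL has no hostname")
    # An empty path is treated as the root of the collection.
    return UploadTarget(host=parts.hostname, path=unquote(parts.path) or "/")


# Example with a made-up hostname (assumption, not a real collection).
target = parse_upload_url("https://collection.example.org/results/run%201/output.csv")
print(target.host)
print(target.path)
```

Everything after this step (finding the collection behind the DNS name, working out which scopes are needed, and so on) would depend on Globus service lookups that this sketch makes no claims about.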
We haven't discussed it as an SDK feature. I think you should expect that if it does happen, it will still happen in the CLI first. This is the kind of feature which we'd like to build in a Python application at least once before trying to provide a "library-ized" version in the SDK.
I'm going to mark this as a feature request for upload as a capability specifically of the SDK. But if you're equally well satisfied by the idea of using the CLI to do this, let me know and I'll probably close this.
Thanks, @sirosen! I was looking to take advantage of the Flow automations so I'm not sure if a local upload will work. Ideally, I want to "upload" a file to multiple endpoints in the most efficient way possible. I'm not sure if doing this with a Flow would differ from a naive "upload" to each endpoint.
I haven't looked in depth, but I thought the CLI built heavily off the Python SDK. If this is the case, I would love to see this feature come to the SDK sooner rather than later.
Thanks for your quick response / update and I look forward to any progress on this issue.
To add to this (and forgive me if I'm missing something - I'm completely new to Globus), I'm hoping to build a pipeline to run in ephemeral Docker container instances which will retrieve data from our Globus endpoint, process it, then upload the output. Is my understanding correct that I'd currently need to start up a Globus Connect Personal instance in the container somehow? If so, I think that this feature would add a lot of value to workflows like this.
It's unfortunate that you can't upload files from the local filesystem without installing Globus Connect Personal. Are there any plans to implement this functionality in the Python SDK? It does seem possible to upload files via the web interface; could that API be a path forward to having this functionality? Otherwise, if Globus Connect Personal were open source, this could be something the community could take on, but it doesn't seem to be public at this time.