ThouCheese / cloud-storage-rs

A crate for uploading files to Google cloud storage, and for generating download urls.
MIT License
123 stars 88 forks source link

resumable upload support #54

Open jhgg opened 3 years ago

jhgg commented 3 years ago

for uploading large files to GCS, it's recommended to use resumable uploads.

If there's interest I'd be down to contribute this upstream to this crate.

More details about how to do it here: https://cloud.google.com/storage/docs/performing-resumable-uploads

ThouCheese commented 3 years ago

@jhgg hey, this would definitely be a welcome addition! Have you got any ideas on what a good API might look like?

jhgg commented 3 years ago

So, essentially what is needed is, we need to be able to say "upload this thing to this bucket with this object name", I think for the purposes of resumable upload, we'll just need a T: std::io::Read + std::io::Seek, this will allow us to support file objects right from disk, but also Vec<u8>'s as wrapped by std::io::Cursor.

Alternatively, if we want to go the "full async" route, which may be preferable, we can use tokio's provided AsyncRead and AsyncSeek traits instead (probably the better idea.)

So some API may look like:

let file = tokio::fs::File::open("foo.txt").await?;
Object::create_with_resumable_upload("mybucket", file, "folder/foo.txt", "application/text").await?;

Of course, it'd be also cool if we could provide some options to this. However, I'm not sure what options yet to expose. Ideally the default implementation create_with_resumable_upload has sensible defaults (probably following what the official gcs libraries do). And a future version as informed by users of the library might pave way for a more verbose API with knobs to tweak if necessary.

jhgg commented 3 years ago

Some things we might be able to expose with a more verbose config API

ThouCheese commented 3 years ago

About the choice between AsyncRead and io::Read: Currently the library is usable without an async runtime by using the ***_sync methods, which are enabled by the sync feature flag. I think that the preference should therefore go to AsyncRead which create_resumable, but there should be some sore of glue method like create_resumable_sync that spins up its own small executor.

I think postponing the config API is sensible since it can be added without breaking changes, it makes the initial design simpler and the complicated design more informed.

jhgg commented 3 years ago

Hmm. I think it'd make sense to start with the async API first. The sync api requires a bit more consideration, as the safe conversion of an io::Read to an tokio::AsyncRead involves some additional adapters to make them safe to use in an async context.

For example, for all intents and purposes, io::Read on a Cursor that wraps a Vec<u8> is safe to naively convert to an AsyncRead (and actually is in tokio: https://docs.rs/tokio/1.0.2/tokio/io/trait.AsyncRead.html#impl-AsyncRead-for-Cursor%3CT%3E). However, io::Read on an std::fs::File would not be safe, because the file system operation could block the executor. So, we'd have to write an adapter that takes a thing that implements Read/Seek and then exposes AsyncRead/AsyncSeek, by basically executing all of the above operations in a thread pool (tokio::spawn_blocking, etc...).

ThouCheese commented 3 years ago

On a side note: why do you think it is required to take both Read and Seek, wouldn't Read suffice?