Open jhgg opened 3 years ago
@jhgg hey, this would definitely be a welcome addition! Have you got any ideas on what a good API might look like?
So, essentially what is needed is, we need to be able to say "upload this thing to this bucket with this object name", I think for the purposes of resumable upload, we'll just need a T: std::io::Read + std::io::Seek
, this will allow us to support file objects right from disk, but also Vec<u8>'
s as wrapped by std::io::Cursor
.
Alternatively, if we want to go the "full async" route, which may be preferable, we can use tokio's provided AsyncRead
and AsyncSeek
traits instead (probably the better idea.)
So some API may look like:
let file = tokio::fs::File::open("foo.txt").await?;
Object::create_with_resumable_upload("mybucket", file, "folder/foo.txt", "application/text").await?;
Of course, it'd be also cool if we could provide some options to this. However, I'm not sure what options yet to expose. Ideally the default implementation create_with_resumable_upload
has sensible defaults (probably following what the official gcs libraries do). And a future version as informed by users of the library might pave way for a more verbose API with knobs to tweak if necessary.
Some things we might be able to expose with a more verbose config API
About the choice between AsyncRead
and io::Read
: Currently the library is usable without an async runtime by using the ***_sync
methods, which are enabled by the sync
feature flag. I think that the preference should therefore go to AsyncRead
which create_resumable
, but there should be some sore of glue method like create_resumable_sync
that spins up its own small executor.
I think postponing the config API is sensible since it can be added without breaking changes, it makes the initial design simpler and the complicated design more informed.
Hmm. I think it'd make sense to start with the async API first. The sync api requires a bit more consideration, as the safe conversion of an io::Read
to an tokio::AsyncRead
involves some additional adapters to make them safe to use in an async context.
For example, for all intents and purposes, io::Read
on a Cursor
that wraps a Vec<u8>
is safe to naively convert to an AsyncRead
(and actually is in tokio: https://docs.rs/tokio/1.0.2/tokio/io/trait.AsyncRead.html#impl-AsyncRead-for-Cursor%3CT%3E). However, io::Read
on an std::fs::File
would not be safe, because the file system operation could block the executor. So, we'd have to write an adapter that takes a thing that implements Read/Seek and then exposes AsyncRead/AsyncSeek, by basically executing all of the above operations in a thread pool (tokio::spawn_blocking, etc...).
On a side note: why do you think it is required to take both Read
and Seek
, wouldn't Read
suffice?
for uploading large files to GCS, it's recommended to use resumable uploads.
If there's interest I'd be down to contribute this upstream to this crate.
More details about how to do it here: https://cloud.google.com/storage/docs/performing-resumable-uploads