google-apis-rs / google-cloud-rs

Asynchronous Rust bindings for Google Cloud Platform APIs.

Streaming object insert and get, please #48

Open nlfiedler opened 3 years ago

nlfiedler commented 3 years ago

I can use memmap to effectively stream a large file when calling create_object(), but get() always returns the entire object as a Vec<u8>. I see there are commented-out "writer" and "reader" functions, so I'm filing this request to track the need for this feature. For my use case, I'm always dealing with files that are 64 MB or larger, so streaming would be good.

P.S. The google_storage1 crate defines a ReadSeek trait that is used for uploading files. For downloads, I think they rely on hyper, which makes it possible to std::io::copy() directly into a file.
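
For context, the memmap workaround mentioned above would look roughly like this. This is a sketch under assumptions: the exact create_object() signature and its error type are guessed at, not verified against the crate.

use memmap::Mmap;
use std::fs::File;

// NOTE: create_object's exact signature is assumed here, not verified.
async fn upload_mapped(
    bucket: &mut google_cloud::storage::Bucket,
    path: &str,
) -> Result<(), Box<dyn std::error::Error>> {
    let file = File::open(path)?;
    // Safety: the file must not be modified or truncated while mapped.
    let mmap = unsafe { Mmap::map(&file)? };
    // The mapped bytes are handed over as a slice, so the OS pages them in
    // lazily instead of the program reading the whole file onto the heap.
    bucket
        .create_object("my-object", &mmap[..], "application/octet-stream")
        .await?;
    Ok(())
}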

Hirevo commented 3 years ago

Hello!

I agree that the ability to read and write GCS objects in a streaming fashion is definitely valuable.
The main thing that has blocked the implementation so far is that it was unclear what the API should look like.

I am currently considering the following API:

impl Object {
    // `ObjectReader` would implement `futures_io::AsyncRead`
    pub async fn reader(&mut self) -> Result<ObjectReader, Error> {
        // ...
    }

    // `ObjectWriter` would implement `futures_io::AsyncWrite`
    pub async fn writer(&mut self, mime_type: impl AsRef<str>) -> Result<ObjectWriter, Error> {
        // ...
    }
}
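
For illustration, a consumer of this first design might look roughly like this (reader() is the proposal above, not a shipped function, and the error plumbing is hypothetical):

use futures::io::{self, AsyncWriteExt};

async fn download_to_file(
    object: &mut Object,
    path: &str,
) -> Result<(), Box<dyn std::error::Error>> {
    let reader = object.reader().await?;
    let mut file = async_std::fs::File::create(path).await?;
    // futures::io::copy moves fixed-size chunks from reader to writer,
    // so memory stays bounded regardless of the object's size.
    io::copy(reader, &mut file).await?;
    file.flush().await?;
    Ok(())
}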

But other crates sometimes go for an API that resembles the following:

impl Object {
    // Asynchronously streams the bytes from the GCS object into the provided writer.
    pub async fn streaming_get<W: AsyncWrite>(&mut self, writer: W) -> Result<(), Error> {
        // ...
    }

    // Asynchronously streams the bytes from the provided reader into the GCS object.
    pub async fn streaming_put<R: AsyncRead>(&mut self, mime_type: impl AsRef<str>, reader: R) -> Result<(), Error> {
        // ...
    }
}
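
For comparison, the second design inverts control: the caller only supplies the sink and the library drives the copy loop internally. A hypothetical call site:

async fn download(object: &mut Object, path: &str) -> Result<(), Box<dyn std::error::Error>> {
    let file = async_std::fs::File::create(path).await?;
    // The library pulls chunks from GCS and pushes them into `file`;
    // the caller never sees the individual chunks.
    object.streaming_get(file).await?;
    Ok(())
}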

I was more inclined to implement the first design rather than the second, because the second moves the iteration process out of your control: it becomes harder to simply iterate over the bytes manually without some kind of in-memory AsyncRead/AsyncWrite pipe, like the one from sluice.

But I suspect that even the first design might require this kind of in-memory IO pipe to implement the writer method.
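
To make the pipe idea concrete, here is a minimal sketch assuming sluice's pipe() constructor, which returns a connected AsyncRead/AsyncWrite pair; one side could back the ObjectWriter while the other side feeds the upload request body:

use futures::io::{AsyncReadExt, AsyncWriteExt};
use sluice::pipe::pipe;

async fn pipe_demo() -> std::io::Result<()> {
    let (mut reader, mut writer) = pipe();

    // One task writes the object's bytes into the pipe...
    let producer = async move {
        writer.write_all(b"object bytes").await?;
        writer.close().await
    };

    // ...while the other end drains it, e.g. as an upload request body.
    let consumer = async move {
        let mut buf = Vec::new();
        reader.read_to_end(&mut buf).await?;
        Ok::<_, std::io::Error>(buf)
    };

    let (_, bytes) = futures::try_join!(producer, consumer)?;
    assert_eq!(bytes, b"object bytes");
    Ok(())
}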

Roba1993 commented 3 years ago

Hi!

I am currently implementing the storage API in the following project of mine: https://github.com/Roba1993/stow. Maybe you can get some ideas on how to solve it there. I went with an AsyncRead for both file get and put, which works quite nicely.
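
The rough shape of such an AsyncRead-based interface might look like the trait below; the names here are illustrative, not stow's actual API:

use futures::io::AsyncRead;

#[async_trait::async_trait]
trait ObjectStore {
    type Reader: AsyncRead + Unpin + Send;

    // get() hands back a reader that the caller drains at its own pace.
    async fn get(&self, key: &str) -> std::io::Result<Self::Reader>;

    // put() accepts any reader, so uploads can stream from files,
    // sockets, or an in-memory pipe without buffering the whole body.
    async fn put<R>(&self, key: &str, body: R) -> std::io::Result<()>
    where
        R: AsyncRead + Unpin + Send + 'static;
}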

abonander commented 2 years ago

We use rusoto_s3 for pushing to Cloud Storage using the S3-compatible API. It works pretty well and supports streaming bodies. Here's a decent example from their integration tests: https://github.com/rusoto/rusoto/blob/master/integration_tests/tests/s3.rs#L865
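
A condensed version of that pattern, pointed at GCS's S3-compatible XML API, might look like the sketch below. The bucket/key names are placeholders, and HMAC interoperability credentials are assumed to be configured in the environment:

use futures::TryStreamExt;
use rusoto_core::{ByteStream, Region};
use rusoto_s3::{PutObjectRequest, S3, S3Client};
use tokio_util::codec::{BytesCodec, FramedRead};

async fn streaming_put(path: &str) -> Result<(), Box<dyn std::error::Error>> {
    // GCS exposes an S3-compatible endpoint for interoperability.
    let region = Region::Custom {
        name: "auto".to_owned(),
        endpoint: "https://storage.googleapis.com".to_owned(),
    };
    let client = S3Client::new(region);

    let file = tokio::fs::File::open(path).await?;
    let size = file.metadata().await?.len();
    // FramedRead turns the file into a chunked byte stream, so the body
    // is never held in memory all at once.
    let stream = FramedRead::new(file, BytesCodec::new()).map_ok(|chunk| chunk.freeze());

    client
        .put_object(PutObjectRequest {
            bucket: "my-bucket".to_owned(),
            key: "my-object".to_owned(),
            content_length: Some(size as i64),
            body: Some(ByteStream::new(stream)),
            ..Default::default()
        })
        .await?;
    Ok(())
}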