datafusion-contrib / datafusion-objectstore-s3

S3 as an ObjectStore for DataFusion
Apache License 2.0

Implement write and delete for S3FileSystem #54

Closed wjones127 closed 2 years ago

wjones127 commented 2 years ago

As part of prototyping apache/arrow-datafusion#2246, I am implementing the write and delete API for S3 to see what it's like to implement these operations for object stores. (So far, I've learned that AsyncWrite is quite difficult to implement 😢.)

matthewmturner commented 2 years ago

Very happy to see this - I had it on my to-do list as well. What issues have you run into with AsyncWrite?

wjones127 commented 2 years ago

Basically, it took me a bit to figure out how to create this state enum:

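enum S3MultipartUploadState {
    // No request in flight.
    Ready,
    // A part upload is in flight; the future is kept here to be polled again.
    UploadInProgress(Pin<Box<dyn Future<Output = Result<()>>>>),
    // The request that completes the multipart upload is in flight.
    CompleteInProgress(Pin<Box<dyn Future<Output = Result<()>>>>),
    // The upload has finished.
    Completed,
}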

The writer needs to hold onto the in-flight futures between the different poll calls, which is why each in-progress variant stores a pinned, boxed future. Hopefully, with an example for S3, the implementations for other object stores can just follow the same general pattern.
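For illustration, here is a minimal std-only sketch of that pattern. It is not the S3FileSystem code: `Uploader`, `FakeRequest`, `NoopWaker`, and `demo` are hypothetical names, `poll_write` and `poll_shutdown` are simplified inherent methods rather than an implementation of the real AsyncWrite trait, and `io::Result` is assumed as the result type. Each write is treated as a single fake request, with no part buffering.

```rust
use std::future::Future;
use std::io;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Boxed future that the state keeps alive between poll calls.
type BoxedIo = Pin<Box<dyn Future<Output = io::Result<()>>>>;

enum UploadState {
    Ready,
    UploadInProgress(BoxedIo),
    CompleteInProgress(BoxedIo),
    Completed,
}

// Hypothetical stand-in for an S3 request: pending once, then Ok(()).
struct FakeRequest {
    polled_once: bool,
}

impl Future for FakeRequest {
    type Output = io::Result<()>;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.polled_once {
            return Poll::Ready(Ok(()));
        }
        self.polled_once = true;
        cx.waker().wake_by_ref(); // ask to be polled again
        Poll::Pending
    }
}

struct Uploader {
    state: UploadState,
}

impl Uploader {
    // Simplified analogue of poll_write: one fake request per write.
    fn poll_write(&mut self, cx: &mut Context<'_>, buf: &[u8]) -> Poll<io::Result<usize>> {
        if let UploadState::Ready = self.state {
            // Start the request and store its future for later polls.
            let fut = Box::pin(FakeRequest { polled_once: false });
            self.state = UploadState::UploadInProgress(fut);
        }
        match &mut self.state {
            UploadState::UploadInProgress(fut) => {
                let polled = fut.as_mut().poll(cx);
                if polled.is_ready() {
                    self.state = UploadState::Ready;
                }
                polled.map(|res| res.map(|()| buf.len()))
            }
            _ => Poll::Ready(Err(io::Error::new(io::ErrorKind::Other, "shutting down"))),
        }
    }

    // Simplified analogue of poll_shutdown: finish the upload.
    fn poll_shutdown(&mut self, cx: &mut Context<'_>) -> Poll<io::Result<()>> {
        if let UploadState::Ready = self.state {
            let fut = Box::pin(FakeRequest { polled_once: false });
            self.state = UploadState::CompleteInProgress(fut);
        }
        match &mut self.state {
            UploadState::CompleteInProgress(fut) => {
                let polled = fut.as_mut().poll(cx);
                if polled.is_ready() {
                    self.state = UploadState::Completed;
                }
                polled
            }
            UploadState::Completed => Poll::Ready(Ok(())),
            _ => Poll::Ready(Err(io::Error::new(io::ErrorKind::Other, "upload in flight"))),
        }
    }
}

// Waker that does nothing; enough for polling by hand.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Drives one write and one shutdown by hand; returns the polls each needed.
fn demo() -> (usize, usize) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut up = Uploader { state: UploadState::Ready };
    let (mut writes, mut shutdowns) = (1, 1);
    while up.poll_write(&mut cx, b"hello").is_pending() {
        writes += 1;
    }
    while up.poll_shutdown(&mut cx).is_pending() {
        shutdowns += 1;
    }
    (writes, shutdowns)
}
```

In this sketch `demo()` returns `(2, 2)`: each operation is pending on its first poll and completes on the second, which only works because the enum kept the future alive in between.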