MeltanoLabs / Singer-Most-Wanted


target-s3 (all) #37

Open pnadolny13 opened 2 years ago

pnadolny13 commented 2 years ago

I commonly hear people say they want to write to S3 as Parquet or Avro, which doesn't have good support. Having a target that's specific to one file format means that changing between file formats requires a brand-new install of a different target.

Create a target S3 that can abstract the file format:
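As a rough illustration of what "abstracting the file format" could look like, here is a minimal sketch of a serializer registry keyed by a format name. It is not code from target-s3 or the SDK: the names `SERIALIZERS`, `serialize`, `to_jsonl` and `to_csv` are hypothetical, and only stdlib formats are shown. Parquet or Avro would presumably need a third-party library and are left out.

```python
import csv
import io
import json
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Serializer = Callable[[List[Record]], bytes]


def to_jsonl(records: List[Record]) -> bytes:
    """Serialize records as newline-delimited JSON."""
    text = "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)
    return text.encode("utf-8")


def to_csv(records: List[Record]) -> bytes:
    """Serialize records as CSV, taking the header from the first record."""
    if not records:
        return b""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=sorted(records[0]), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue().encode("utf-8")


# Hypothetical registry: switching formats becomes a config change
# rather than installing a different target.
SERIALIZERS: Dict[str, Serializer] = {
    "jsonl": to_jsonl,
    "csv": to_csv,
    # "parquet": ...  # assumption: would be backed by a library such as pyarrow
}


def serialize(records: Iterable[Record], file_format: str) -> bytes:
    """Look up the serializer for file_format and apply it to the records."""
    try:
        serializer = SERIALIZERS[file_format]
    except KeyError:
        raise ValueError(f"Unsupported file format: {file_format!r}") from None
    return serializer(list(records))
```

With something like this, the same batch of records could be written with `serialize(batch, "csv")` or `serialize(batch, "jsonl")`, and the upload step would not need to know which format was chosen.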

pnadolny13 commented 2 years ago

I'm going to start working on this one during Love-tap-fest!

aaronsteers commented 2 years ago

@pnadolny13 - For what it's worth, I do think at least CSV, JSON, and Parquet would all be good to have as built-in serializers/deserializers in the SDK. So, you can write here or into the SDK directly, according to what makes the most sense to you in this iteration. As you are building, though, I just wanted to call out that a long-term plan to move this upstream would be in line with the SDK direction already.

Would be a step in the direction of: https://gitlab.com/meltano/sdk/-/issues/9

pnadolny13 commented 2 years ago

@aaronsteers nice, thanks for sharing that context! I think I'll probably see how far I get with it here and if the code ends up making sense in the SDK I'll migrate it later.

aaronsteers commented 2 years ago

Returning after some time to think on this... I think there's a good chance we can do this in a robust and generic manner, targeting any cloud that supports smart_open and/or pyfilesystem:
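One way to picture the "generic" part, purely as an illustrative sketch and not something proposed in this thread: keep the writer independent of the storage backend by passing in the function that opens the destination. The helper name `write_lines` and its `opener` parameter are hypothetical. The default is Python's built-in `open`; the assumption is that `smart_open.open`, which accepts a URI and a mode in the same way, could be passed instead to reach S3 or other cloud storage.

```python
from typing import IO, Callable, Iterable


def write_lines(
    uri: str,
    lines: Iterable[str],
    opener: Callable[..., IO] = open,
) -> int:
    """Write each line to uri using the supplied opener; return the line count.

    opener defaults to the built-in open (local files only). Passing
    smart_open.open here is an assumption about how cloud URIs such as
    s3://bucket/key could be supported without changing this function.
    """
    count = 0
    with opener(uri, "w") as handle:
        for line in lines:
            handle.write(line + "\n")
            count += 1
    return count
```

For a local run this is just `write_lines("out.jsonl", rows)`; the cloud variant would differ only in the `uri` and `opener` arguments.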