Open mustafa-rmd opened 2 years ago
Hi @mustafa-rmd , could you please edit your request to follow our feature request template? This will ensure all details are understood clearly. I've copied it below. Thank you!
What are you trying to do, and why is it hard? A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
A clear and concise description of what you want to see happen, or the change you would like to see
A clear and concise description of any alternative solutions or features you've considered or are using today.
Add any other context or screenshots about the feature request here.
Remove this with your answer :-)
Delta Lake (Delta table) is an essential format for many pipeline architectures, especially those that use Apache Spark.
I would like the Delta format to be added alongside Apache Avro, JSON, etc.
No alternatives.
When choosing a destination format, I would like to see Delta as one of the options.
Yes
@dennyglee Noted in your discussion that you're adding this to your roadmap. Just wanted to confirm that you're planning to contribute here?
@misteryeo Yes, we are planning to contribute here - it may or may not be me personally, but feel free to ping me on this until we figure this out :)
Hey @dennyglee is there any update on that?
@dennyglee @mustafa-rmd Any updates on this by any chance?
Hi @dennyglee @mustafa-rmd, any updates on this feature request? I am using Airbyte and Delta Lake in production, so I would love to see this destination connector become available as soon as possible. I'm willing to lend a hand if needed.
Just want to chime in that I'm also interested in this!
Edited to add that I'm interested in writing a delta table to S3. I'm not sure I'll end up making a PR for this, but for anyone else who wants the same thing it looks like a PR would have to be made here: https://github.com/airbytehq/airbyte/tree/0e9fdba1181b2d302b81a057f6fa16a198925eaa/airbyte-integrations/bases/base-java-s3/src/main/java/io/airbyte/integrations/destination/s3
You'd also have to make a PR here: https://github.com/airbytehq/airbyte/blob/0e9fdba1181b2d302b81a057f6fa16a198925eaa/airbyte-integrations/connectors/destination-s3/src/main/resources/spec.json
Do we have any update on this feature request?
My current requirement is the following data pipeline: PostgreSQL (source) -> Airbyte -> MinIO S3 storage (destination) -> Apache Spark, configured for MinIO and Delta Lake. Delta Lake is needed because Spark on its own doesn't support ACID transactions.
The goal is to have Airbyte move data from PostgreSQL (source) to MinIO storage (destination), saved in Delta format. Spark will then read the data from S3 and expects it to be in Delta format.
My main issue is with the output format of the Airbyte S3 connector. It currently supports only three output formats: CSV, Avro, and JSON Lines (JSONL).
What is the recommended way to solve this problem? I think many companies are trying to build this kind of data pipeline. Is there a plan to release this feature in an upcoming release? Should we implement it ourselves? If so, is there good documentation on how to get started? Or is there another way to go about it?
Thanks,
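For anyone setting up the Spark side of the pipeline described above, here is a minimal sketch of the session settings involved in reading Delta tables from MinIO. It only covers the read path and does not solve the missing Delta output in the S3 connector. The endpoint, credentials, bucket, and table path are placeholders, and the helper names `delta_on_minio_conf` and `build_session` are made up for illustration. It assumes the Delta Lake and hadoop-aws jars are already on the Spark classpath.

```python
def delta_on_minio_conf(endpoint, access_key, secret_key):
    """Return Spark settings for reading Delta tables from an S3-compatible store."""
    return {
        # Enable Delta Lake's SQL extension and catalog.
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
        # Point the Hadoop S3A client at MinIO instead of AWS.
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # MinIO is usually addressed path-style (http://host/bucket/key).
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }


def build_session(conf):
    """Create a SparkSession from the settings above (requires pyspark)."""
    # Imported lazily so the config helper itself has no dependencies.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("read-delta-from-minio")
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Placeholder endpoint and credentials; replace with your own.
conf = delta_on_minio_conf("http://minio:9000", "ACCESS_KEY", "SECRET_KEY")

# With pyspark installed, the table could then be read like this
# (hypothetical bucket and path):
# spark = build_session(conf)
# df = spark.read.format("delta").load("s3a://my-bucket/path/to/table")
```

Until the connector can write Delta directly, one possible stopgap (an assumption on my part, not something confirmed in this thread) is to let Airbyte write JSONL or Avro to MinIO and have a Spark job rewrite those files with `df.write.format("delta")`.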