airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com
Other
15.28k stars 3.95k forks

Destination GCS: partition buckets based on different query result from Clickhouse #15403

Open gopiather opened 2 years ago

gopiather commented 2 years ago

Hi,

Source: Clickhouse. Destination: GCS.

I have a table in Clickhouse, and I want to store subsets of the table's data under different paths of a bucket in GCS. Right now, all the data in that table goes to a single bucket.

Is this possible with the GCS destination?

marcosmarxm commented 2 years ago

Hello @gopiather, general questions about Airbyte are better addressed in the Airbyte forums: https://discuss.airbyte.io/

I don't quite understand what you are trying to achieve. Do you want to send information from your destination (the path of a GCS bucket) to your source?

gopiather commented 2 years ago

@marcosmarxm Thanks for your response.

So the use case is: my destination is GCS and my source is Clickhouse.

I have a table called user_details (with columns timestamp, user_id, user_transaction_data) in Clickhouse. I set up GCS as the destination and it is able to push to a bucket, but I want to store the data in GCS as ///user_id_1 ///user_id_2 ///user_id_3 ///user_id_1 ///user_id_2 ///user_id_3 and so on.

This is the use case I want to achieve: the table is the same, but the data in the table has to be pushed to different buckets based on user_id.
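
To make the layout concrete, here is a minimal Python sketch of the grouping I have in mind. It is not something Airbyte does; the function name `partition_by_column`, the `user_details` path prefix, and the sample rows are all made up for illustration, and nothing is actually uploaded to GCS.

```python
from collections import defaultdict


def partition_by_column(rows, column, prefix):
    """Group rows by the value of `column`, keyed by the object path they would land in."""
    partitions = defaultdict(list)
    for row in rows:
        # One path per distinct column value, e.g. "<prefix>/user_id_1".
        partitions[f"{prefix}/{row[column]}"].append(row)
    return dict(partitions)


# Hypothetical rows shaped like the user_details table described above.
rows = [
    {"timestamp": "2022-08-01T10:00:00", "user_id": "user_id_1", "user_transaction_data": "a"},
    {"timestamp": "2022-08-01T10:05:00", "user_id": "user_id_2", "user_transaction_data": "b"},
    {"timestamp": "2022-08-01T10:10:00", "user_id": "user_id_1", "user_transaction_data": "c"},
]

# "user_details" is an assumed prefix, standing in for the real bucket path.
partitions = partition_by_column(rows, "user_id", "user_details")
for path, group in sorted(partitions.items()):
    print(path, len(group))
```

Each key of `partitions` is the path one user's rows would be written to, so every user_id ends up with its own location instead of everything going to a single place.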

I hope the use case is clear now. If you need more clarification, let me know; I am happy to explain more.

marcosmarxm commented 2 years ago

You want to partition based on a column field in Clickhouse. This is not possible today.

marcosmarxm commented 2 years ago

I added the feature request to the team backlog.

misteryeo commented 2 years ago

Issue was linked to Harvestr Discovery: New Source : Clickhouse