bhaskar-pv opened this issue 6 months ago.
Same.
This is happening to us as well: the data is not being copied to the main database.
I have faced the same issue. Airbyte doesn't even report the sync as failed. Is it a connector bug?
I believe this is due to their rollout of Destinations V2. They seem to be pushing people to external orchestration systems. So I don't think this is a bug.
Here are some discussions I dug up that seem relevant.
https://github.com/airbytehq/airbyte/discussions/35339 https://github.com/airbytehq/airbyte/discussions/34860
From what I can see, they are focusing on the E and L and pushing people to other platforms for the T.
Maybe @jbfbell, @rileybrook or @cgardens could shed some light on this?
@anthonator However, I tested with the Postgres destination, and the tables were created correctly both in the airbyte_internal database and in the main database the sync was supposed to write to. With ClickHouse, only the tables in the airbyte_internal database were filled with data; no tables or data were present in the main database specified in the ClickHouse destination.
@abhishekgahlot2 From my understanding, each destination needs to implement normalization, and the ClickHouse destination currently does not.
@anthonator Sorry for the delayed reply here, but yes: as of 1.0.0 we removed what we referred to as "normalization", the creation of typed tables, from the ClickHouse destination. As you pointed out, this was a result of the Dv2 work. Normalization in its previous state was unmaintainable for us as a team, and we are removing that previous implementation from the platform completely. Rolling out Dv2 to individual destinations proved to be a time-consuming process, so we made the decision to pivot towards improving the underlying shared libraries. To put it another way, we would love to enable ourselves or the community to easily add a new v2 destination, but we are not there yet. However, we are actively working on getting there. Unfortunately, ClickHouse fell on the other side of the cut line here.
Our hope was that by still moving the raw data rather than removing the Clickhouse connector completely, you could still build dbt models or other solutions on top of these tables.
While I understand this is likely not the response you're hoping for, thank you for bringing this up and contributing to that linked GitHub discussion. It definitely helps with the prioritization of this work.
Are there any tools I can use to convert the raw data into final tables in the meantime, until ClickHouse support arrives?
Perhaps there is a way to take the models generated in ClickHouse and transform them into the final data.
@abhishekgahlot2 they mention Airflow, Prefect and Dagster in https://github.com/airbytehq/airbyte/discussions/34860.
Also see https://airbyte.com/blog/integrating-airbyte-with-data-orchestrators-airflow-dagster-and-prefect
Thanks @anthonator, I'm going to give it a try.
@abhishekgahlot2 ClickHouse comes with excellent JSONExtract functions for parsing the data in the _airbyte_data column. You can use these functions when you query the data, or use them in dbt transformations.
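As a rough illustration of the JSONExtract approach, the sketch below generates a ClickHouse SELECT that unpacks _airbyte_data into typed, nullable columns. JSONExtract(json, key, 'Type') is a real ClickHouse function; the helper name build_typed_select, the raw table name, and the column mapping are hypothetical, and the sketch assumes field names are plain identifiers containing no quotes. The resulting query could be pasted into a view or a dbt model.

```python
def build_typed_select(raw_table: str, columns: dict[str, str]) -> str:
    """Build a ClickHouse SELECT that unpacks _airbyte_data into typed columns.

    Hypothetical helper: `columns` maps a top-level JSON key to a ClickHouse
    type name. Names are assumed to be simple identifiers without quotes.
    """
    # One JSONExtract call per field; Nullable(...) keeps missing keys as NULL.
    exprs = [
        f"JSONExtract(_airbyte_data, '{name}', 'Nullable({ch_type})') AS `{name}`"
        for name, ch_type in columns.items()
    ]
    select_list = ",\n    ".join(exprs)
    return f"SELECT\n    {select_list}\nFROM {raw_table}"


if __name__ == "__main__":
    # Hypothetical raw table and field types, for illustration only.
    print(build_typed_select(
        "airbyte_internal.my_raw_table",
        {"id": "String", "status": "String", "amount": "Float64"},
    ))
```

Generating the SQL from a mapping keeps the per-stream boilerplate small, though each stream still needs its own column list.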
@jbfbell Is there some kind of timeline when we can expect the ClickHouse connector to work as expected again?
@jesperbagge JSONExtract sounds like a good idea, though I believe it would require copying all the data again, since I don't think it supports incremental append or deduplication.
@jbfbell Considering ClickHouse is virtually your only supported modern on-prem DB, I am surprised to see this connector isn't getting more attention. ClickHouse has seen broad adoption in the last couple of months everywhere we look.
Oh, same issue. ClickHouse is such a beast; it's disappointing to learn that normalization is not possible.
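Deduplication can at least be approximated at query time. The sketch below keeps only the newest row per primary key using ClickHouse's row_number() window function. It assumes the raw table has an _airbyte_extracted_at column (as Destinations V2 raw tables generally do) and that the primary key is a top-level field in _airbyte_data; the helper name, table name, and fields are hypothetical. It still scans the whole raw table on every run, so it does not solve the incremental-append concern.

```python
def build_dedup_select(
    raw_table: str,
    primary_key: str,
    columns: dict[str, str],
    cursor: str = "_airbyte_extracted_at",  # assumed raw-table column
) -> str:
    """Build a ClickHouse query returning the latest row per primary key.

    Hypothetical helper: `columns` maps top-level JSON keys to ClickHouse
    types; names are assumed to be simple identifiers without quotes.
    """
    # Typed projection of each requested JSON field.
    exprs = [
        f"JSONExtract(_airbyte_data, '{name}', 'Nullable({ch_type})') AS `{name}`"
        for name, ch_type in columns.items()
    ]
    # Rank rows per key, newest extraction first. JSONExtractRaw returns the
    # raw JSON text of the key, so it works for string and numeric ids alike.
    exprs.append(
        f"row_number() OVER (PARTITION BY JSONExtractRaw(_airbyte_data, '{primary_key}') "
        f"ORDER BY {cursor} DESC) AS _rn"
    )
    inner = ",\n        ".join(exprs)
    outer = ", ".join(f"`{name}`" for name in columns)
    # Keep only the top-ranked (latest) row for each key.
    return (
        f"SELECT {outer}\nFROM (\n    SELECT\n        {inner}\n"
        f"    FROM {raw_table}\n)\nWHERE _rn = 1"
    )


if __name__ == "__main__":
    print(build_dedup_select(
        "airbyte_internal.my_raw_table", "id", {"id": "String", "amount": "Float64"}
    ))
```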
Our normalisation was straightforward and works seamlessly with other DBs. JSONExtract, as mentioned, defeats the purpose, as we have quite a lot of tables and various sources too.
cc: Airbyte team @jbfbell
@o1lab Yeah, I came to that conclusion myself in the end for the same reasons. I downgraded to version 0.2.5 to at least have structured data.
Also, I'm a big fan of NocoDB!
They should at least have updated the Airbyte Cloud documentation for ClickHouse. As it stands, it is in a broken state.
I am in no position to contribute currently, but I'll share another insight here for when ClickHouse gets some attention: currently even the internal tables do not append properly. For several days I have had a test connection set up between Stripe and ClickHouse, along with the same connection, using the same schema, between Stripe and Redshift. After syncing every 5 minutes for 5 days, the ClickHouse internal raw tables are plainly missing some updates, whereas Redshift matches the records in the Stripe dashboard perfectly. So just using JSONExtract functions on the internal tables Airbyte generates in ClickHouse is not going to be accurate.
Connector Name: destination-clickhouse
Connector Version: v1.0.0
Step where the error happened: During the sync
Relevant information: I am trying to sync data from Jira to ClickHouse. A database named airbyte_internal was created in ClickHouse, but after that no tables were created in the database I provided in the configuration. There is also no error in the logs.