Open anglinb opened 2 years ago
Yeah! Support for https://clickhouse.com/ would be a great feature.
Thanks, @Slach and @anglinb! Today we leverage dbt to connect to the data warehouse, and a dbt package we developed for the implementation of the data monitoring. As far as I know, dbt doesn't have an official adapter for Clickhouse, but there is a community-supported one.
Are you dbt users? Do you know if the adapter supports all the features, and if not, which are missing?
@Maayan-s https://github.com/ClickHouse/dbt-clickhouse is now officially supported by ClickHouse Inc.
Hi @Maayan-s is there a guide for integrating Elementary with community dbt adapters such as Clickhouse's? I'd be happy to try to contribute if so.
Hi @slygent , First of all, thank you for wanting to contribute to the project! We don't have a guide for it but I would be happy to provide guidance.
Generally speaking, we implemented all platform-specific functionality using adapter.dispatch, as dbt recommends. You can see an example in this macro.
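To illustrate the pattern, here is a hedged sketch of how `adapter.dispatch` routes a macro call to an adapter-specific implementation; the macro names below are illustrative assumptions, not Elementary's actual code:

```sql
{# Dispatcher: resolves to <adapter>__example_current_timestamp if defined,
   otherwise falls back to default__example_current_timestamp #}
{% macro example_current_timestamp() %}
    {{ return(adapter.dispatch('example_current_timestamp', 'elementary')()) }}
{% endmacro %}

{# Default implementation used on most warehouses #}
{% macro default__example_current_timestamp() %}
    current_timestamp
{% endmacro %}

{# Adapter-specific override, picked up automatically on ClickHouse #}
{% macro clickhouse__example_current_timestamp() %}
    now()
{% endmacro %}
```

With this layout, adding ClickHouse support is mostly a matter of supplying the `clickhouse__`-prefixed implementations where the default SQL doesn't work.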
However, where there was a dbt_utils macro that we could use, we did. This might cause a problem with Clickhouse, as dbt_utils does not support it. Since dbt 1.2.0 many of the macros were migrated from dbt_utils to the adapter code, but I'm not sure if these are all implemented in Clickhouse, although the release notes say explicitly: "Support the cross database dbt-core macros (migrated from dbt-utils)". Anyway, you can see here a workaround we did for such a gap with Databricks.
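As a sketch of what closing such a gap could look like: if a cross-database macro (for example `dbt.datediff`) turned out to be missing or broken on ClickHouse, a package could ship its own adapter-specific override. This is an assumption about the shape of the fix, not a confirmed gap:

```sql
{# Hypothetical ClickHouse override of the cross-database datediff macro,
   mapping it onto ClickHouse's dateDiff function #}
{% macro clickhouse__datediff(first_date, second_date, datepart) %}
    dateDiff('{{ datepart }}', {{ first_date }}, {{ second_date }})
{% endmacro %}
```

The dispatch mechanism would then pick this up automatically whenever the package runs against a ClickHouse target.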
I'm not familiar with the Clickhouse adapter, so it's hard to assess how many changes such integration will require. We recently decided (due to demand from the community) to add a Databricks integration, and approached it gradually -
Step 1 - Add support for uploading dbt artifacts and run results (in the dbt package).
Step 2 - Add support in the CLI for Slack alerts and UI generation.
Step 3 - Add support for data anomaly detection tests (the most complex and platform-specific part of the code right now).
You can check out this PR implementing step 1 for Databricks. As you can see, it actually required pretty minor changes. If you want to give it a shot with Clickhouse, I would be happy to support you!
Hi @Maayan-s and @elongl
I hope you don't mind that I picked this up and started working on it.
Current status of clickhouse - elementary integration
Step 1 - Add support for uploading dbt artifacts and run results (in the dbt package). -- COMPLETED (able to create dbt artifacts)
Step 2 - Add support in the CLI for Slack alerts and UI generation. -- In Progress
Step 3 - Add support for data anomaly detection tests (the most complex and platform-specific part of the code right now). -- Pending
I would love to hear your thoughts and suggestions. I could not test my updates as described in the documentation; it seems the integration_tests are deprecated as of now.
I would appreciate any guidance or help on the testing part. Also, how do you suggest we proceed from here?
Thanks @Arun-kc, really cool that you started working on this! @haritamar is working on updating the contribution guide with the new integration tests.
Even basic support for clickhouse will go a long way. :)
(Feel free to close if this isn't helpful :) )
We (Superwall S21) have been looking for a tool like this to help us monitor our data pipelines. We help customers understand the performance of changes to monetization campaigns in apps, so it is super important we know when something is broken. Right now we have dashboards in Grafana that help us see overall counts, but we have literally been caught by one of the examples you called out in your docs, an increased null rate. This would have saved us soooo much time.
Our stack looks like SDK -> NodeJS API -> Kafka -> Clickhouse right now and we're looking for better monitoring tooling to let us know when something is broken.
ELE-33