elementary-data / elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
https://www.elementary-data.com/
Apache License 2.0

[ELE-33] [Feature] Support Clickhouse #52

Open anglinb opened 2 years ago

anglinb commented 2 years ago

(Feel free to close if this isn't helpful :) )

We (Superwall S21) have been looking for a tool like this to help us monitor our data pipelines. We help customers understand the performance of changes to monetization campaigns in apps, so it is super important that we know when something is broken. Right now we have dashboards in Grafana that help us see overall counts, but we have literally been caught by one of the examples you called out in your docs: an increased null rate. This would have saved us so much time.

Our stack looks like SDK -> NodeJS API -> Kafka -> Clickhouse right now and we're looking for better monitoring tooling to let us know when something is broken.


Slach commented 2 years ago

Yeah! Support for https://clickhouse.com/ would be a great feature.

Maayan-s commented 2 years ago

Thanks, @Slach and @anglinb! Today we leverage dbt to connect to the data warehouse, and a dbt package we developed to implement the data monitoring. As far as I know, dbt doesn't have an official adapter for Clickhouse, but there is a community-supported one.

Are you dbt users? Do you know if the adapter supports all the features, and if not, which are missing?

Slach commented 2 years ago

@Maayan-s https://github.com/ClickHouse/dbt-clickhouse is now officially supported by ClickHouse Inc.

gfrmin commented 1 year ago

Hi @Maayan-s is there a guide for integrating Elementary with community dbt adapters such as Clickhouse's? I'd be happy to try to contribute if so.

Maayan-s commented 1 year ago

Hi @slygent, first of all, thank you for wanting to contribute to the project! We don't have a guide for it, but I would be happy to provide guidance.

Generally speaking, we implemented every platform-specific functionality using the adapter.dispatch functionality, as dbt recommends. You can see an example in this macro.
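To make the dispatch idea concrete, here is a minimal Python sketch of the lookup that dbt's `adapter.dispatch` performs: it looks for a macro named `<adapter>__<macro_name>` and falls back to `default__<macro_name>`. This is an illustration, not dbt's or Elementary's code. The `null_count` macro and the `MACROS` registry are hypothetical, and real dbt also walks parent adapters and a package search order, which this sketch omits.

```python
from typing import Callable, Dict

# Hypothetical registry standing in for a dbt project's macro namespace.
MACROS: Dict[str, Callable[[str], str]] = {}


def macro(fn: Callable[[str], str]) -> Callable[[str], str]:
    """Register a function under its own name, like a macro definition."""
    MACROS[fn.__name__] = fn
    return fn


@macro
def default__null_count(column: str) -> str:
    # Portable SQL used when no adapter-specific version exists.
    return f"sum(case when {column} is null then 1 else 0 end)"


@macro
def clickhouse__null_count(column: str) -> str:
    # Adapter-specific version using ClickHouse's countIf aggregate.
    return f"countIf({column} IS NULL)"


def dispatch(macro_name: str, adapter_type: str) -> Callable[[str], str]:
    """Return the adapter-specific macro if present, else the default one."""
    for prefix in (adapter_type, "default"):
        candidate = MACROS.get(f"{prefix}__{macro_name}")
        if candidate is not None:
            return candidate
    raise LookupError(f"no implementation found for macro '{macro_name}'")


if __name__ == "__main__":
    print(dispatch("null_count", "clickhouse")("user_id"))
    print(dispatch("null_count", "snowflake")("user_id"))
```

Under this pattern, supporting a new platform mostly means adding `clickhouse__`-prefixed implementations where the default SQL does not work, and leaving every other macro untouched.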

However, where there was a dbt_utils macro that we could use, we did. This might cause a problem with Clickhouse, as dbt_utils does not support it. Since dbt 1.2.0 many of the macros were migrated from dbt_utils to the adapter code, but I'm not sure whether all of them are implemented in Clickhouse, although the release notes say explicitly: "Support the cross database dbt-core macros (migrated from dbt-utils)". In any case, you can see here a workaround we did for such a gap with Databricks.

I'm not familiar with the Clickhouse adapter, so it's hard to assess how many changes such an integration will require. We recently decided (due to demand from the community) to add a Databricks integration, and approached it gradually:

Step 1 - Add support for uploading dbt artifacts and run results (in the dbt package).
Step 2 - Add support in the CLI for Slack alerts and UI generation.
Step 3 - Add support for data anomaly detection test (the most complex and platform-specific part of the code right now).

You can check out this PR implementing step 1 for Databricks. As you can see, it actually required pretty minor changes. If you want to give it a shot with Clickhouse, I would be happy to support you!

Arun-kc commented 1 year ago

Hi @Maayan-s and @elongl

I hope you don't mind that I picked this up and started working on it.

Current status of clickhouse - elementary integration

Step 1 - Add support for uploading dbt artifacts and run results (in the dbt package). -- COMPLETED (able to create dbt artifacts)
Step 2 - Add support in the CLI for Slack alerts and UI generation. -- In Progress
Step 3 - Add support for data anomaly detection test (the most complex and platform-specific part of the code right now). -- Pending

I would love to hear your thoughts and suggestions. I could not test my updates the way the documentation describes, because the integration_tests seem to be deprecated as of now.

I would appreciate any guidance or help on the testing part, also how do you suggest we proceed from here?

Maayan-s commented 1 year ago

Thanks @Arun-kc, really cool that you started working on this! @haritamar is working on updating the contribution guide with the new integration tests.

timeto-in commented 6 months ago

Even basic support for clickhouse will go a long way. :)