elementary-data / elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
https://www.elementary-data.com/
Apache License 2.0

[ELE-36] Athena integration #77

Closed oravi closed 1 year ago

oravi commented 2 years ago

This is a new type of integration that was requested in the Slack community.

From a quick look, dbt already supports Athena, and most of the features seem to be supported. The monitoring is implemented as dbt tests, so we will need to run the package and its tests in an environment with Athena to see whether they work as expected on this platform.

From SyncLinear.com | ELE-36

privatedumbo commented 2 years ago

Hey! I'm currently using dbt with Athena, and I would like to be able to use elementary on top of it. When I tried to set it up, it failed: some elementary models could not be created because of things like unique constraints, which are not supported on Athena. Shall we work on it? I might be able to help, depending on the complexity of the task :)

bruno-ribeirodasilva commented 1 year ago

Is anyone working on this?

oravi commented 1 year ago

Happy to provide guidance @bruno-ribeirodasilva if you are willing to contribute it!

nicor88 commented 1 year ago

Worth mentioning: the dbt-athena community forked the original repo and built a community-based adapter that is closely aligned with the latest dbt releases and, thanks to Iceberg, can offer full lakehouse capability (with more DML supported, such as MERGE/UPDATE/DELETE). It would indeed be awesome to support Athena in elementary.

Here you can see usage converging from the old adapter to the new one.

As one of the main maintainers of the new community-based adapter, I would like to help or support the feature implementation.

Maayan-s commented 1 year ago

Hi @nicor88, great to hear that additional adapters are maturing like this! We would be happy to try and support an effort to make Elementary compatible with Athena. To be honest, I'm not familiar with it, so I don't know how hard it would be.

Generally speaking, we implemented every platform-specific piece of functionality using adapter.dispatch, as dbt recommends. You can see an example in this macro. However, where there was a dbt_utils macro that we could use, we did. I do see utils in the dbt-athena adapter you shared, so it looks promising in that sense. Anyway, you can see here a workaround we did for a missing util in Spark.
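
As a rough illustration of the dispatch pattern mentioned above, here is a minimal Python model of the lookup order dbt uses for dispatched macros: an adapter-prefixed implementation (`<adapter>__<macro_name>`) is preferred, and `default__<macro_name>` is the fallback. This is only a sketch of the idea, not dbt or elementary code; the macro name `example_current_timestamp` and the `MACROS` registry are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical registry standing in for the macros defined in a dbt package.
# Keys follow dbt's naming convention for dispatched macros.
MACROS: Dict[str, Callable[[], str]] = {
    "default__example_current_timestamp": lambda: "current_timestamp",
    "spark__example_current_timestamp": lambda: "current_timestamp()",
}


def dispatch(macro_name: str, adapter_type: str) -> Callable[[], str]:
    """Return the adapter-specific implementation, else the default one."""
    for prefix in (adapter_type, "default"):
        candidate = MACROS.get(f"{prefix}__{macro_name}")
        if candidate is not None:
            return candidate
    raise LookupError(
        f"No implementation of {macro_name!r} for adapter {adapter_type!r}"
    )


# Spark has its own implementation; Athena falls back to the default
# until an athena__ variant is added to the registry.
spark_sql = dispatch("example_current_timestamp", "spark")()
athena_sql = dispatch("example_current_timestamp", "athena")()
```

Under this model, supporting a new platform mostly means adding `athena__...` entries wherever the default implementation does not produce valid SQL for that platform.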

I think you should approach it gradually:

  1. Add support for uploading dbt artifacts and run results (in the dbt package).
  2. Add support in the CLI for Slack alerts and UI generation.
  3. Add support for the data anomaly detection tests (the most complex and platform-specific part of the code right now).

You can see here a guide for testing: https://docs.elementary-data.com/general/contributions#contributing-to-the-dbt-package

artem-garmash commented 1 year ago

Hi all, I made a POC (limited to Athena 3/Iceberg) to get familiar with elementary and dbt-athena. It's kind of working, at least the main features (e.g. updating dbt artifacts tables, generating reports, dbt tests, elementary anomaly detection tests and alerts), against a toy project. There are several issues/hacks, and I'd like some input on the best way to address them.

I'm really interested in moving this forward, and it would be great if someone checked those PRs and provided some guidance on preparing proper PRs for this integration. And in case someone is already trying the same, I'm happy to collaborate and help.

Maayan-s commented 1 year ago

Thank you so much for this @artem-garmash! 👏 Adding a task for the next sprint to review + support you with completing this 🤝

haritamar commented 1 year ago

Hi @artem-garmash! Thank you so much for your contribution. Sorry it took us a bit of time to get to it.

I have just started to review your PRs, but it would be great if you could also run the E2E tests of the dbt package, as they are one of the main ways to ensure that (most of) the functionality is working properly. They are located under the integration_tests folder and can be run by invoking this script:

./run_e2e_tests.py -t athena

I'd be happy to help with this, we can also go over it in a call if you'd like.

nicor88 commented 1 year ago

@artem-garmash, any plans to continue this great work? @haritamar, are the integration tests currently run manually? I'm wondering if there is a smooth way to run them in the CI process at some point, though that would require some extra AWS resources.

artem-garmash commented 1 year ago

@nicor88, thanks for checking out the PRs. I was just thinking of having another look at them to have something usable in a few weeks. I need to rebase and re-test with the latest versions of everything, including dbt-athena, and add integration tests as @haritamar suggested. The only two big open issues for me:

  1. elementary should have a macro for timestamp literals. They are always explicit in Athena, but elementary and the other adapters handle them as strings. Plus, in some cases (reading JSON?) timestamp literals are in the ISO 8601 format and should be converted via from_iso8601_timestamp().
  2. How to set adapter-specific table properties, e.g. table_type="iceberg". Either such properties should somehow be passed from the adapter to the table config (but how?), or dbt-athena could make the default table type configurable in the dbt profile, e.g. to use Iceberg tables by default.
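
To make the first issue concrete, here is a small Python sketch of what an adapter-aware timestamp literal helper could render. It is an illustration only, not elementary's implementation: the function names and the `adapter_type` argument are hypothetical. The Athena branches use the `timestamp '...'` literal syntax and the `from_iso8601_timestamp()` function mentioned above, and the fallback keeps the plain-string behavior described for the other adapters.

```python
from datetime import datetime


def timestamp_literal(value: datetime, adapter_type: str) -> str:
    """Render a SQL timestamp literal for the given adapter (hypothetical helper)."""
    text = value.strftime("%Y-%m-%d %H:%M:%S")
    if adapter_type == "athena":
        # Athena needs an explicitly typed literal.
        return f"timestamp '{text}'"
    # Assumption: other adapters accept a plain string and cast implicitly.
    return f"'{text}'"


def iso8601_timestamp(value: str, adapter_type: str) -> str:
    """Render an ISO 8601 string (e.g. one read from JSON) as a timestamp expression."""
    if adapter_type == "athena":
        return f"from_iso8601_timestamp('{value}')"
    return f"'{value}'"


literal = timestamp_literal(datetime(2023, 1, 2, 3, 4, 5), "athena")
converted = iso8601_timestamp("2023-01-02T03:04:05+00:00", "athena")
```

The sketch does no quote escaping, so it assumes the inputs are trusted timestamp values rather than arbitrary strings.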

nicor88 commented 1 year ago

Regarding 2: since elementary shares the same setup as dbt, it may be possible to use dbt_project.yml to specify table_type="iceberg" globally. Could you check whether that works? I won't make Iceberg the default table type in dbt-athena (at least not yet), and I believe it's something that must be configurable by the package that we use in combination with dbt.

haritamar commented 1 year ago

Hi @artem-garmash @nicor88! Sorry for disappearing here, and thanks a lot @nicor88 for reviewing the PRs.

I wanted to note that we have changed our tests infrastructure to one that I think is much easier to run. More details on running it can be found here: https://github.com/elementary-data/dbt-data-reliability/blob/master/CONTRIBUTING.md#running-integration-tests

nicor88 commented 1 year ago

@artem-garmash thanks 💯

@artem-garmash, are you planning to continue the great work and get the integration working with Athena? I know that there is some interest from folks using dbt-athena-community.

mrshu commented 1 year ago

I can confirm that the lack of Athena support is the only thing currently preventing us from using Elementary at Slido.

artem-garmash commented 1 year ago

Hi all, I've submitted PRs based on the early POC work, addressing the comments from @nicor88 and @haritamar. I have been running it in production for a month on an Athena dbt project and am happy with the result: https://github.com/elementary-data/dbt-data-reliability/pull/597 https://github.com/elementary-data/elementary/pull/1251