davidgasquez / gitcoin-grants-data-portal

🌲 Open source, serverless, and local-first data hub for Gitcoin Grants data!
https://grantsdataportal.xyz/
MIT License

Create dbt tests to ensure data quality #20

Closed DistributedDoge closed 8 months ago

DistributedDoge commented 8 months ago

dbt allows for schema testing, where we can declare what we want to see, and run dbt test to see if reality conforms to expectations.

models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
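To make the "declare, then check reality" idea concrete, here is a rough sketch of what the two tests above boil down to: each is a query that selects offending rows, and the test passes when that query returns nothing. The SQL approximates what dbt generates for `unique` and `not_null`. The in-memory SQLite table and the `failing_rows` helper are assumptions for illustration only, not part of this project.

```python
import sqlite3

# Hypothetical stand-in for the `orders` model: one duplicate id and one NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,), (2,), (None,)])


def failing_rows(connection, sql):
    """Return the rows selected by a test query; an empty list means the test passes."""
    return connection.execute(sql).fetchall()


# Roughly what the `unique` test checks: non-null values that appear more than once.
unique_failures = failing_rows(
    conn,
    """
    SELECT order_id
    FROM orders
    WHERE order_id IS NOT NULL
    GROUP BY order_id
    HAVING COUNT(*) > 1
    """,
)

# Roughly what the `not_null` test checks: rows where the column is NULL.
not_null_failures = failing_rows(
    conn,
    "SELECT order_id FROM orders WHERE order_id IS NULL",
)

print("unique failures:", unique_failures)
print("not_null failures:", not_null_failures)
```

With this sample data both tests would fail, since each query returns one offending row.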

I think it is worthwhile to create those for some important models. Since we are already describing schema it would be nice to also:

Ultimately, we could have dbt test added as a CI step to make sure that runs with malformed or missing assets do not proceed to the IPFS upload stage.
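A minimal sketch of such a gate, assuming the pipeline can be driven from a small Python script. It relies on dbt test exiting with a non-zero status when any test fails. The `gate_upload` function and the `upload_to_ipfs` callable mentioned in the comment are hypothetical names, not something that exists in this repository.

```python
import subprocess
import sys
from typing import Callable, Sequence


def gate_upload(test_cmd: Sequence[str], upload: Callable[[], None]) -> bool:
    """Run the test command and call `upload` only if it exits with status 0."""
    result = subprocess.run(list(test_cmd), check=False)
    if result.returncode != 0:
        # Failing tests: report and skip the upload entirely.
        print(
            f"tests failed (exit code {result.returncode}); skipping upload",
            file=sys.stderr,
        )
        return False
    upload()
    return True


# In CI this might be called as follows (upload_to_ipfs is hypothetical):
#   ok = gate_upload(["dbt", "test"], upload_to_ipfs)
#   sys.exit(0 if ok else 1)
```

The same effect can be had by making the upload a separate CI step that only runs when the dbt test step succeeds, which is probably the simpler option if the CI system already chains steps that way.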

davidgasquez commented 8 months ago

Great idea! This project used to have dbt tests at the beginning, but they were hard to integrate with Dagster.

I'm waiting for https://github.com/dagster-io/dagster/discussions/16527 to learn more about ways to integrate both projects together.

That said, a simple dbt test won't hurt anyone!

DistributedDoge commented 8 months ago

image

With a few config tweaks, interpreting dbt tests as Dagster asset checks seems to be working as intended.

I can share the recipe once I deal with #22 to place the tests where they belong.

davidgasquez commented 8 months ago

Nice! I was digging into this and also realized that Dagster is integrating with dbt in a different way these days. Let's ship the tests with the current approach and then see if migrating is worth it.