OHDSI / dbt-synthea

[Under development] A dbt ETL project to convert a Synthea synthetic data set into the OMOP CDM
https://ohdsi.github.io/dbt-synthea/
Apache License 2.0

feat: Add workflow to generate DBT documentation and deploy to GitHub Pages #73

Closed · lawrenceadams closed this 1 month ago

lawrenceadams commented 1 month ago

This PR adds automatic dbt docs generation for the dbt-synthea project.

Why bother?

The docs are a nice place to start for those new to dbt and/or OMOP who want to explore the models, the DAG, etc. (IMHO - feel free to disagree!), and they make it easier for us to point people to our project!

This approach requires little to no effort from maintainers once set up.

How does it work?
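
A minimal sketch of a workflow along these lines, assuming GitHub Pages is set to deploy from Actions; the file path, tool versions, and profile setup are illustrative assumptions, not the actual contents of this PR:

```yaml
# .github/workflows/docs.yml (hypothetical path)
name: Deploy dbt docs to GitHub Pages

on:
  push:
    branches: [main]

permissions:
  contents: read
  pages: write
  id-token: write

jobs:
  docs:
    runs-on: ubuntu-latest
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      # Assumes a profiles.yml in the repo targeting a local DuckDB file
      - run: pip install dbt-core dbt-duckdb
      - run: dbt deps
      # Writes index.html, manifest.json, and catalog.json to target/
      - run: dbt docs generate
      - uses: actions/upload-pages-artifact@v3
        with:
          path: target
      - id: deployment
        uses: actions/deploy-pages@v4
```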

Next steps

katy-sadowski commented 1 month ago

I do agree with adding this! Thanks for the initiative! I think I enabled Pages:

[Screenshot: GitHub Pages settings, 2024-10-04]

If this looks right, go ahead and merge to test it out!

katy-sadowski commented 1 month ago

It works!!! https://ohdsi.github.io/dbt-synthea

For some reason, the data types are not showing up in the GH Pages site, though they do show up when I build it locally. Quick Google says this might be a fix? https://github.com/dbt-labs/dbt-spark/issues/988

lawrenceadams commented 1 month ago

Excellent!! 🥳

> For some reason, the data types are not showing up in the GH Pages site, though they do show up when I build it locally. Quick Google says this might be a fix? dbt-labs/dbt-spark#988

Interesting... I can reproduce the same thing. The documentation on the docs feature is ironically thin on how you should do this.

I can confirm that if you declare the type explicitly then the docs work. But the reason we don't have this issue locally is that if you have run `dbt run` / `dbt build` on your project and have your warehouse available, dbt will grab the data type metadata from the database. I.e. if we update the workflow to do a `dbt build` first, it will work fine.
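
If declaring the type means dbt's `data_type` column property, a sketch of what that looks like in a model's schema YAML (model and column names are illustrative):

```yaml
# e.g. in a models/.../_models.yml schema file
models:
  - name: person
    columns:
      - name: person_id
        data_type: integer  # surfaces in dbt docs even without a warehouse-backed catalog
        description: Unique identifier for each person.
```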


Going down the rabbit hole, I am torn as to what to do.

Interestingly, this should be handled by the adapter (unless this is for something else...):

https://github.com/duckdb/dbt-duckdb/blob/41794e3edc2dfb4237bee0167e4df07f581e5ba6/dbt/adapters/duckdb/column.py#L18-L24

But it does not seem to work!

What do you think @katy-sadowski? I think doing the `dbt run` step before building the docs would be a way around this, and it would allow us to enforce schema checking in the future - but just declaring the type explicitly is another valid solution.
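
Concretely, the workaround is just an extra step before generating the docs, e.g. (a sketch, assuming a docs job like the one outlined above):

```yaml
      # Materialize the models in DuckDB first so dbt docs generate
      # can read real column types back from the warehouse
      - run: dbt build
      - run: dbt docs generate
```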

lawrenceadams commented 1 month ago

I've gone too deep into the weeds, but essentially, if we want contract enforcement to work then the dbt-duckdb adapter needs to be updated to override the DuckDBColumn class here with an updated TYPE_LABELS dict that includes a 'BIGINT': 'INTEGER' mapping.

I have tested it as below and can confirm it builds and the contract enforcer code is happy when this is present:

[Screenshot: the tested TYPE_LABELS change]

I do think it would be cool to have contractually bound type checking using the CDM docs as the source, but this would require upstream dbt-duckdb changes; BIGINT seems to be treated as a distinctly different type from INT (although I am not sure why, since the contract feature allows type aliasing to be enabled or disabled at will).
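
For reference, a sketch of what contract-bound type checking looks like in model YAML (illustrative names); this is the setup that currently trips over DuckDB reporting BIGINT where the contract declares INTEGER:

```yaml
models:
  - name: person
    config:
      contract:
        enforced: true  # dbt compares each declared data_type against the warehouse type
    columns:
      - name: person_id
        data_type: integer  # DuckDB returns BIGINT; fails unless BIGINT is aliased to INTEGER
```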


@katy-sadowski I will pull this out into a new issue and we can discuss there! 🧠