OHDSI / dbt-synthea

[Under development] A dbt ETL project to convert a Synthea synthetic data set into the OMOP CDM
https://ohdsi.github.io/dbt-synthea/
Apache License 2.0

Add stg model for Allergies table #28

Closed katy-sadowski closed 6 months ago

katy-sadowski commented 7 months ago

Addresses #10

katy-sadowski commented 7 months ago

Thanks for your comment @vvcb! Most of the things you're asking about come from my decision to follow dbt's recommended staging model format, which is documented here: https://docs.getdbt.com/best-practices/how-we-structure/2-staging. I think we need to choose a convention and stick with it, and dbt's own conventions feel like the best place for us to start: they're well-documented, specific, and battle-tested (but also generalizable to a plethora of use cases).

Regarding the lowercasing - @adambouras found this macro and I think it's a neat way to address the fact that different database systems handle column case differently. Synthea outputs CSVs with all-caps column names, so this is something we'll need to handle in this project. Open to suggestions for alternative approaches here if you can think of something simpler!
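
To make the lowercasing idea concrete, here is a minimal Python sketch of the kind of SQL such a step could produce. It is not the macro referred to above; the function name `lowercase_select`, the table name `synthea.allergies`, and the column list are assumptions chosen for illustration, and identifier quoting varies by database (double quotes are assumed here).

```python
def lowercase_select(source_table: str, columns: list[str]) -> str:
    """Build a SELECT that aliases each source column to its lowercase name.

    Hypothetical helper for illustration only, not the project's macro.
    """
    # Quote both sides so the upper-case source name is matched exactly and
    # the lowercase alias is preserved on databases that fold unquoted names.
    select_list = ",\n".join(
        f'    "{col}" AS "{col.lower()}"' for col in columns
    )
    return f"SELECT\n{select_list}\nFROM {source_table}"


# Illustrative all-caps column names, as they might appear in a Synthea CSV.
sql = lowercase_select("synthea.allergies", ["START", "STOP", "PATIENT", "CODE"])
print(sql)
```

Generating the aliases from a column list keeps the staging model free of hand-written casing fixes, which is the appeal of doing this in a macro rather than per column.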

Regarding renaming - this is another best practice recommended by dbt, namely to rename ambiguous staging column names to be more meaningful (especially columns like "id" which might appear as foreign keys in other tables). I'll admit, I may have gone overboard here with adding allergy_ to everything and am willing to scale back on that. But having meaningful, consistent naming in our staging tables is important for the interpretability of the project as a whole.
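
As a rough illustration of the renaming convention, the Python sketch below builds SELECT-list entries from a rename map, aliasing only the columns that are ambiguous and leaving the rest untouched. The map contents (for example `patient` to `patient_id`, `code` to `allergy_code`) and the helper name `staging_select_list` are hypothetical and are not taken from the actual staging model.

```python
# Hypothetical rename map: source column -> staging column.
RENAMES = {
    "patient": "patient_id",
    "encounter": "encounter_id",
    "code": "allergy_code",
    "description": "allergy_description",
}


def staging_select_list(columns: list[str], renames: dict[str, str]) -> list[str]:
    """Return SELECT-list entries, aliasing only the columns that get renamed."""
    entries = []
    for col in columns:
        new_name = renames.get(col, col)
        # Columns without an entry in the map pass through unchanged.
        entries.append(col if new_name == col else f"{col} AS {new_name}")
    return entries


entries = staging_select_list(["start", "patient", "code", "severity1"], RENAMES)
print(",\n".join(entries))
```

Keeping the renames in one explicit map makes it easy to scale the prefixing back later: removing an entry restores the original column name without touching the rest of the model.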