Closed lawrenceadams closed 1 month ago
Failing tests
Completed with 5 errors and 0 warnings:
Failure in test relationships_cdm_source_cdm_version_concept_id__concept_id__ref_concept_ (models/omop/_models/cdm_source.yml)
Got 1 result, configured to fail if != 0
compiled code at target/compiled/synthea_omop_etl/models/omop/_models/cdm_source.yml/relationships_cdm_source_df5013262b164e5d6a81286c64573d16.sql
Failure in test relationships_concept_class_concept_class_concept_id__concept_id__ref_concept_ (models/omop/_models/concept_class.yml)
Got 423 results, configured to fail if != 0
compiled code at target/compiled/synthea_omop_etl/models/omop/_models/concept_class.yml/relationships_concept_class_8721149b79fd66d2ede1bec89de7b4d1.sql
Failure in test relationships_domain_domain_concept_id__concept_id__ref_concept_ (models/omop/_models/domain.yml)
Got 50 results, configured to fail if != 0
compiled code at target/compiled/synthea_omop_etl/models/omop/_models/domain.yml/relationships_domain_62d8024ca3201ee04cfb36bc99795f33.sql
Failure in test relationships_relationship_relationship_concept_id__concept_id__ref_concept_ (models/omop/_models/relationship.yml)
Got 696 results, configured to fail if != 0
compiled code at target/compiled/synthea_omop_etl/models/omop/_models/relationship.yml/relationships_relationship_40aacd30ee50ea8e7e831ae83e697549.sql
Failure in test relationships_vocabulary_vocabulary_concept_id__concept_id__ref_concept_ (models/omop/_models/vocabulary.yml)
Got 59 results, configured to fail if != 0
compiled code at target/compiled/synthea_omop_etl/models/omop/_models/vocabulary.yml/relationships_vocabulary_46c2fc3cd57281d5dda05ddb6e718920.sql
Looking at these - they're likely as a result of incomplete vocabulary
Additionally, I am having trouble getting some of the docs to show up in the
dbt docs
site. I think it's because of the hacks I did to (1) allow users to either seed the vocab OR bring their own, and (2) to create empty tables for the OMOP tables not populated by the ETL. (The impacted tables all seem to fall into one of these 2 categories.)
I had the same issue, seems to not bother generating the docs if the model doesn't generate any rows - which surprised me.
I need to spend a bit more time pondering these. I don't think they should block this PR because there are actually some fundamental underlying questions at play here for the vocab tables. (i.e., Should they even be "models"? Should the ETL touch these tables or do we make a source to concept mapper table? etc.). But there might be a simple short-term cleanup I can do just to get the docs site fully functional; I'll let you know what I come up with :)
I agree, and I have had the same problem in the past when handling the vocab tables. I ended up making the vocab a model as when you scale up if and are doing lots of joins to map you need indexes or it'll take an age... (with postgres and friends anyway)
I'll crack on with the other issues as I think it's good to get this right first time! If we start using dbt
tests for the DQD tests then we don't want to be regenerating and nuking the git history (not that it matters)
@katy-sadowski those were both easy fixes it turns out! I have refreshed with new output.
@lawrenceadams I opened a PR against your branch here, which fixes the dbt docs issues: https://github.com/OHDSI/dbt-synthea/pull/64
One more question on the vocab model thing - do you create a model for the vocab tables, even though that means duplicating the vocab data tables? Right now, I have this line on the vocab models, which only builds them if the vocab seeds (which are small) are being used: https://github.com/OHDSI/dbt-synthea/blob/5561586f6d78f71ac009f8969df0c56c5ccb45e9/models/omop/seed_vocab/concept.sql#L1. However, I also don't love not having these as proper models, just from a consistency perspective.
Excellent catch! I didn't think of the colliding names being an issue - dbt does annoy me the way it'll silently fail / do these things... but a nice fix
One more question on the vocab model thing - do you create a model for the vocab tables, even though that means duplicating the vocab data tables? Right now, I have this line on the vocab models, which only builds them if the vocab seeds (which are small) are being used:
Good question, and I am unsure to be honest... I have created models for them (usually just concept
) in the past myself in other projects - I'd agree it doesn't make a huge amount of sense, although allows you to use them as references neatly.
Having thought about it - I think we will always need the vocab to be expressed as these models - or the tests this PR adds won't work when someone uses their own data to upload. e.g.:
will not work if in 'BYOD' mode, as otherwise ref('concept')
won't exist
I think either way we should probably have a model that is always on for any model, but either:
Have I gone too far off on a tangent? Hope my morning brain makes sense 😆
I think either way we should probably have a model that is always on for any model, but either: accept it will be physically materialised as a table (storage is cheap etc; I suppose doesn't scale well with larger vocabs) have these vocab models as views and then space isn't an issue (if it ever was) Have I gone too far off on a tangent? Hope my morning brain makes sense 😆
This makes total sense. I say, let's make them materialized table models for now (for both seed and BYOD). It's tidier, and performance/storage/etc are a whole other thing to tackle. Do you agree?
Yep - 100%
Sounds fab! I'm happy to refactor those tables if you've not started!
I had it started locally, here's the PR! Feel free to merge it right in if you approve :)
Ahhh yes - this annoying GitHub state - don't worry - please merge away!
Uses tests generated from #59
This PR brings in all tests outlined by the OMOP documentation in a repeatable manner! We are now at 425 tests 🥳 🍰 (previously 51... do we have too many now? 😆 )
Notes
episode
,dose_era
,specimen
where no rows are returned from the models - the docs show up as empty (and show no tests by extension)~ now fixed by #63