Closed lawrenceadams closed 1 month ago
Thanks @lawrenceadams for this investigation! Very interesting. I think that having type checking would be beneficial, and that we should pursue this via data_type
and contracts. Regarding the BIGINT pickle, I think we could potentially just specify the datatypes as bigint in the schema.yml where relevant. Many institutions use bigint for ID columns in the OMOP CDM (mine included)
Is there any downside to building the project when building docs? I'd guess maybe runtime (which for the seed DB is quite fast, so prob not a big concern), and/or size limits for what we can store in GH Actions (but I just checked and it seems for public repos there's no storage limit for Actions artifacts - https://github.com/orgs/community/discussions/26438#discussioncomment-3251931). Anything else?
Agree @katy-sadowski!
Not really - it runs within about 30 seconds so there's little added cost. Potentially if the OHDSI github org has burnt through all their free minutes (but having a quick look it doesnt look like many repos use them at length, so I think we should be fine...)
Potentially if the OHDSI github org has burnt through all their free minutes (but having a quick look it doesnt look like many repos use them at length, so I think we should be fine...)
Our 30 sec run is a drop in the bucket compared to many repos which run extensive tests against multiple cloud DBs. So we should be good here.
Sweet! Will make the change later
We have added schema and type descriptions (
yml
) of the final OMOP tables from #61. This works as desired and is complaint to the dbt schema. When building the docs locally (usually after the project has been run) there are no issues, and the docs show the column types - however at present the github workflow only runsdbt docs
without building the models - and so types are present in the documentation.data_type
attribute - however, this attribute is for enforcing type checking when enabling the dbt contract featurerow_number()
returns a BIGINTdbt docs
feature to add types to the documentation by default it checks the type returned by the database it has been run against. If the models have not yet been built, then it will fall back to thetype
attribute.This issue is to track discussion about what we wish to do regarding this, we can either:
type
instead ofdata_type
so we don't have to build the project in CI.I am a fan of 1, as it opens up the possibility of using contracts in the future - and using the
type
attribute is not documented anywhere (and violates the official yml definitions).See this thread for origin of issue