RADar-AZDelta / Rabbit-in-a-Blender

An ETL pipeline to transform your EMP data to OMOP.
https://radar-azdelta.github.io/Rabbit-in-a-Blender/
GNU General Public License v3.0

concept_id 0 instead of null #105

Closed. lore-edencehealth closed this issue 1 month ago.

lore-edencehealth commented 1 month ago

RiaB seems to always set concept_id columns to 0 when no concept_id is found, even if the original value is null. For required columns like condition_concept_id this is logical (although letting the pipeline fail on the non-nullable constraint would arguably be logical as well), but for nullable columns like ..._source_concept_id, unit_concept_id, modifier_concept_id, value_as_concept_id, ... it is unexpected. 0 should only be used when there is information but no valid concept_id is found; otherwise I think the value should stay null.
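The rule proposed above can be sketched as a small decision function. This is a hypothetical helper written for illustration, not RiaB's actual code or API; it assumes the caller already knows the raw source value, the result of the concept lookup (None when nothing matched), and whether the target column is nullable.

```python
from typing import Optional


def resolve_concept_id(
    source_value: Optional[str],
    mapped_concept_id: Optional[int],
    nullable: bool,
) -> Optional[int]:
    """Hypothetical helper: choose the value to write into a *_concept_id column."""
    if mapped_concept_id is not None:
        # A valid concept was found for the source value: use it.
        return mapped_concept_id
    if source_value is None and nullable:
        # The source carries no information and the column allows NULL: keep NULL.
        return None
    # Either there is information that could not be mapped,
    # or the column is required: fall back to concept 0.
    return 0


# Usage: a mapped value, an absent value in a nullable column,
# an unmapped value, and an absent value in a required column.
print(resolve_concept_id("M", 8507, nullable=False))
print(resolve_concept_id(None, None, nullable=True))
print(resolve_concept_id("xyz", None, nullable=True))
print(resolve_concept_id(None, None, nullable=False))
```

Only a missing source value yields NULL, and only when the column permits it; an unmappable value still becomes 0.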

pjlammertyn commented 1 month ago

According to the OMOP CDM documentation:

Each Standard CONCEPT_ID field has a set of allowable CONCEPT_ID values. The allowable values are defined by the domain of the concepts. For example, there is a domain concept of ‘Gender’, for which there are only two allowable standard concepts of practical use (8507 - ‘Male’, 8532 - ‘Female’) and one allowable generic concept to represent a standard notion of ‘no information’ (concept_id = 0). This ‘no information’ concept should be used when there is no mapping to a standard concept available or if there is no information available for that field. The exceptions are MEASUREMENT.VALUE_AS_CONCEPT_ID, OBSERVATION.VALUE_AS_CONCEPT_ID, MEASUREMENT.UNIT_CONCEPT_ID, OBSERVATION.UNIT_CONCEPT_ID, MEASUREMENT.OPERATOR_CONCEPT_ID, and OBSERVATION.MODIFIER_CONCEPT_ID, which can be NULL if the data do not contain the information (THEMIS issue #11).

There is no constraint on allowed CONCEPT_IDs within the SOURCE_CONCEPT_ID fields.
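The convention quoted above boils down to an explicit allowlist: concept 0 is the default for every standard CONCEPT_ID field, except for six named columns that may stay NULL when the source has no information. A minimal sketch of that rule follows; the function and set names are hypothetical and not part of RiaB or the CDM tooling. It covers standard fields only, since the quote places no constraint on the SOURCE_CONCEPT_ID fields.

```python
from typing import Optional

# Columns that, per the quoted convention (THEMIS issue #11), may be NULL
# when the source data do not contain the information. Hypothetical name.
NULLABLE_WHEN_ABSENT = {
    ("measurement", "value_as_concept_id"),
    ("observation", "value_as_concept_id"),
    ("measurement", "unit_concept_id"),
    ("observation", "unit_concept_id"),
    ("measurement", "operator_concept_id"),
    ("observation", "modifier_concept_id"),
}


def default_concept_id(table: str, column: str, has_source_info: bool) -> Optional[int]:
    """Hypothetical helper: fallback for a standard concept_id column with no mapping."""
    key = (table.lower(), column.lower())
    if not has_source_info and key in NULLABLE_WHEN_ABSENT:
        # One of the listed exceptions and nothing in the source: NULL is allowed.
        return None
    # Everywhere else the generic 'no information' concept 0 applies,
    # whether the value was absent or present but unmappable.
    return 0


print(default_concept_id("MEASUREMENT", "UNIT_CONCEPT_ID", has_source_info=False))
print(default_concept_id("condition_occurrence", "condition_concept_id", has_source_info=False))
```

Note that the allowlist is table-specific: OPERATOR_CONCEPT_ID is listed only for MEASUREMENT and MODIFIER_CONCEPT_ID only for OBSERVATION.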