INCEPTdk / omop_etl

3 stars 2 forks source link

Add missing value source value #115

Closed daplaci closed 3 months ago

daplaci commented 4 months ago

In this I am adding the value source value and unit_source_concept_id as I think they may help understanding the difference betweeem the source and the cdm.

ALso I am drafting the ABG, but I am having quite some problem with the source data. WIll open specific issues on the way

daplaci commented 3 months ago

@epiben This PR is ready for review. I have included in the concept lookup most of the concepts common across two hospitals, Rigs and Odense, assuming this can give us a good estimate from the dqd.

Main points:

epiben commented 3 months ago

Now that I'm back, I will look at this. I'll address the main points in separate comments so each thread has one theme.

epiben commented 3 months ago

Adding unit_source_concept_id and unit_source_value

Columns with the _source_concept_id usually mean that the source data uses a vocab that exists in OMOP. ICD-10 and ATC are examples. E.g. condition_source_concept_id: "If the CONDITION_SOURCE_VALUE is coded in the source data using an OMOP supported vocabulary put the concept id representing the source value here." (see this)

Therefore, I'm not sure unit_source_concept_id is the best term, and I'm not really convinced we need it. unit_source_value makes sense indeed.

epiben commented 3 months ago

Adding "text" as concept_lookup_stem

This makes perfect sense, and I agree with the implementation except maybe the term "text". I think it might be better to use something like "freetext" to set it apart from the text data type.

I might even think that the way this new text type behaves is actually how the categorical one should behave, and the current categorical should be binary or boolean instead. Are we willing to go down that path?

daplaci commented 3 months ago

agree I just thought it would be a bit easier to query all the source having the same unit concept id in the source data, without accounting for kPa or kpa or these kind of differences. we can easily leave it out and stop filling the column if we think it does not make sense

daplaci commented 3 months ago

Adding "text" as concept_lookup_stem

This makes perfect sense, and I agree with the implementation except maybe the term "text". I think it might be better to use something like "freetext" to set it apart from the text data type.

I might even think that the way this new text type behaves is actually how the categorical one should behave, and the current categorical should be binary or boolean instead. Are we willing to go down that path?

I think you are right but I would include it as a future enanchment and pospone the implementation of the bool later on

epiben commented 3 months ago

agree I just thought it would be a bit easier to query all the source having the same unit concept id in the source data, without accounting for kPa or kpa or these kind of differences. we can easily leave it out and stop filling the column if we think it does not make sense

I get your point, but the current implementation actually violates the CDM documentation: "If the UNIT_SOURCE_VALUE is coded in the source data using an OMOP supported vocabulary put the concept id representing the source value here [UNIT_SOURCE_CONCEPT_ID].". So, we probably shouldn't do it.

epiben commented 3 months ago

The rest looks good to me. We need validation of the mappings, but except for the ones I've "tagged" with comments, I didn't find anything suspicious.

daplaci commented 3 months ago

agree I just thought it would be a bit easier to query all the source having the same unit concept id in the source data, without accounting for kPa or kpa or these kind of differences. we can easily leave it out and stop filling the column if we think it does not make sense

I get your point, but the current implementation actually violates the CDM documentation: "If the UNIT_SOURCE_VALUE is coded in the source data using an OMOP supported vocabulary put the concept id representing the source value here [UNIT_SOURCE_CONCEPT_ID].". So, we probably shouldn't do it.

Fixed in 57d3efe