Closed daplaci closed 3 months ago
@epiben This PR is ready for review. I have included in the concept lookup most of the concepts common across two hospitals, Rigs and Odense, assuming this can give us a good estimate from the dqd.
Main points:
Now that I'm back, I will look at this. I'll address the main points in separate comments so each thread has one theme.
unit_source_concept_id
and unit_source_value
Columns with the _source_concept_id
usually mean that the source data uses a vocab that exists in OMOP. ICD-10 and ATC are examples. E.g. condition_source_concept_id
: "If the CONDITION_SOURCE_VALUE is coded in the source data using an OMOP supported vocabulary put the concept id representing the source value here." (see this)
Therefore, I'm not sure unit_source_concept_id
is the best term, and I'm not really convinced we need it. unit_source_value
makes sense indeed.
This makes perfect sense, and I agree with the implementation except maybe the term "text". I think it might be better to use something like "freetext" to set it apart from the text data type.
I might even think that the way this new text type behaves is actually how the categorical one should behave, and the current categorical should be binary or boolean instead. Are we willing to go down that path?
agree I just thought it would be a bit easier to query all the source having the same unit concept id in the source data, without accounting for kPa or kpa or these kind of differences. we can easily leave it out and stop filling the column if we think it does not make sense
Adding "text" as concept_lookup_stem
This makes perfect sense, and I agree with the implementation except maybe the term "text". I think it might be better to use something like "freetext" to set it apart from the text data type.
I might even think that the way this new text type behaves is actually how the categorical one should behave, and the current categorical should be binary or boolean instead. Are we willing to go down that path?
I think you are right but I would include it as a future enanchment and pospone the implementation of the bool later on
agree I just thought it would be a bit easier to query all the source having the same unit concept id in the source data, without accounting for kPa or kpa or these kind of differences. we can easily leave it out and stop filling the column if we think it does not make sense
I get your point, but the current implementation actually violates the CDM documentation: "If the UNIT_SOURCE_VALUE is coded in the source data using an OMOP supported vocabulary put the concept id representing the source value here [UNIT_SOURCE_CONCEPT_ID].". So, we probably shouldn't do it.
The rest looks good to me. We need validation of the mappings, but except for the ones I've "tagged" with comments, I didn't find anything suspicious.
agree I just thought it would be a bit easier to query all the source having the same unit concept id in the source data, without accounting for kPa or kpa or these kind of differences. we can easily leave it out and stop filling the column if we think it does not make sense
I get your point, but the current implementation actually violates the CDM documentation: "If the UNIT_SOURCE_VALUE is coded in the source data using an OMOP supported vocabulary put the concept id representing the source value here [UNIT_SOURCE_CONCEPT_ID].". So, we probably shouldn't do it.
Fixed in 57d3efe
In this I am adding the value source value and unit_source_concept_id as I think they may help understanding the difference betweeem the source and the cdm.
ALso I am drafting the ABG, but I am having quite some problem with the source data. WIll open specific issues on the way