HDRUK / CaRROT-CDM

MIT License

Preserve unmapped values #163

Open ALightNHS opened 1 year ago

ALightNHS commented 1 year ago

Is it possible to preserve the unmapped and missing/invalid source values when converting from source tables to the CDM tables?

PhilAppleby commented 1 year ago

I would need to know more about what you mean by "preserve".

Rejected data does not generate CDM output, as that would be meaningless.

ALightNHS commented 1 year ago

Thank you for your response. I would like to see unmapped (potentially error-prone) source values in the CDM output, as this would help to identify data quality issues/inconsistencies where "similar" fields are captured in different systems.

For example, if patient height is stored in two different datasets, it would be useful to map these data sources, and then identify inconsistent source values per patient in the CDM.
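To make the height example concrete, here is a minimal sketch in plain Python of the kind of check I have in mind. It is independent of CaRROT-CDM; the row layout, the dataset names, the sample values, and the `find_inconsistent_heights` helper are all hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical rows: (person_id, source dataset name, height in cm).
# All identifiers and values here are made up for illustration.
measurements = [
    ("p1", "dataset_a", 172.0),
    ("p1", "dataset_b", 172.0),
    ("p2", "dataset_a", 160.0),
    ("p2", "dataset_b", 165.5),
    ("p3", "dataset_a", 181.0),
]


def find_inconsistent_heights(rows, tolerance_cm=1.0):
    """Return {person_id: {dataset: height}} for people whose recorded
    heights differ by more than tolerance_cm across datasets."""
    by_person = defaultdict(dict)
    for person_id, dataset, height_cm in rows:
        # Assumes one height per person per dataset; a later row for the
        # same person and dataset overwrites an earlier one.
        by_person[person_id][dataset] = height_cm

    inconsistent = {}
    for person_id, heights in by_person.items():
        values = list(heights.values())
        if len(values) > 1 and max(values) - min(values) > tolerance_cm:
            inconsistent[person_id] = dict(heights)
    return inconsistent


if __name__ == "__main__":
    print(find_inconsistent_heights(measurements))
```

In this made-up sample only `p2` would be flagged, because the two datasets disagree by more than the tolerance.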

Another side to this question is: how would we map source fields containing unstructured text data to the CDM? It wouldn't be feasible to apply the same mapping logic in a JSON config to clinical notes.

PhilAppleby commented 1 year ago

Hello again, could you let me have more information on the particular use-case you have in mind?

The software was designed, in collaboration with data partners, to map from input values to output OMOP concepts. Placing an input value in the OMOP output, unless explicitly mapped as a "source_value", would be a violation of this principle.

Also, with reference to your height example, if a person's information is captured as part of two different datasets, this tool will not be aware of this, as it works on each dataset individually. We could not use information from one dataset for the other unless we had explicit approvals to do so, and therefore this tool has been designed to work on each dataset in complete isolation.

Additionally, a file "summary.tsv" is produced which contains no detailed data but gives an indication of rejected input numbers as percentages.
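As an illustration only, the mapping principle and the rejection percentage described above can be sketched roughly as follows. This is not taken from the CaRROT-CDM codebase: the `GENDER_RULES` lookup, the `map_values` and `rejection_percentage` helpers, and the output layout are hypothetical, and the actual rules format and the real columns of "summary.tsv" are not reproduced here.

```python
# Hypothetical lookup from a source value to an OMOP concept ID.
# This is NOT CaRROT-CDM's rules format; it only illustrates the principle.
GENDER_RULES = {"M": 8507, "F": 8532}


def map_values(values, rules, keep_source_value=False):
    """Map each input value to a concept; unmapped values are rejected.

    The input value appears in the output only when keep_source_value is
    set, mirroring the idea of an explicit "source_value" mapping.
    """
    mapped, rejected = [], []
    for value in values:
        concept_id = rules.get(value)
        if concept_id is None:
            # No rule for this value: it produces no output record.
            rejected.append(value)
            continue
        record = {"concept_id": concept_id}
        if keep_source_value:
            record["source_value"] = value
        mapped.append(record)
    return mapped, rejected


def rejection_percentage(total, rejected_count):
    """Share of input values that were rejected, as a percentage."""
    return 100.0 * rejected_count / total if total else 0.0


if __name__ == "__main__":
    inputs = ["M", "F", "X", ""]
    mapped, rejected = map_values(inputs, GENDER_RULES)
    print(mapped)
    print(rejection_percentage(len(inputs), len(rejected)))
```

Only the aggregate percentage leaves the function in this sketch; the rejected values themselves are not written to any output.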

Finally, we do have manual methods for mapping clinical notes to OMOP concepts; you would need to contact our data team for guidance on that.

ALightNHS commented 1 year ago

Hi Phil, Firstly, I would like to say thank you for your responses and patience. I am very excited about CaRROT and believe that it will have a significant impact.

With your permission, I would appreciate the opportunity to exchange emails to discuss this further.

Otherwise, I will try to explain further. I realise that my particular use-case for CaRROT (and the CDM in general) goes against their intended designs. I am trying to take advantage of the CDM's relational schema to integrate multiple data sources, identify inconsistencies, and then to diagnose and resolve these at source.

PhilAppleby commented 1 year ago

Hi there, as the development of CaRROT-CDM is funded by health data research projects, we can discuss further if you contact me using my University of Dundee email account - p.d.appleby@dundee.ac.uk. Could you also identify yourself so I know with whom I'm talking?

ALightNHS commented 1 year ago

Hi Phil, my name is Anthony Lighterness - I'm a data scientist at The Christie NHS FT. I'll send you an email, thank you for that!