hcadavid closed this issue 2 months ago
@KasiaSmietanka In another comment I mentioned that I'm re-running the pipeline with updated rules that should lead to fewer skipped participants. However, the issue described here is the cause of most of the skipped ones.
Hi Hector,
Since we cannot be sure about the type of diabetes for these subjects, could we assign the parent code for Diabetes Mellitus? It could serve as a kind of "Diabetes, unspecified".
https://bioportal.bioontology.org/ontologies/SNOMEDCT?p=classes&conceptid=73211009
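To make the suggestion concrete, here is a minimal sketch of such a fallback. It is not the actual logic in `Diabetes.ts`: the `DiabetesTypeAnswers` shape and the `diabetesCoding` function are hypothetical names used only for illustration. The parent concept ID comes from the link above, and the two child codes are the standard SNOMED CT concepts for type 1 and type 2 diabetes.

```typescript
// Hypothetical sketch; names and input shape are assumptions, not the tool's API.
type Coding = { system: string; code: string; display: string };

const SNOMED = "http://snomed.info/sct";

const DIABETES_CODES: { type1: Coding; type2: Coding; unspecified: Coding } = {
  type1: { system: SNOMED, code: "46635009", display: "Diabetes mellitus type 1" },
  type2: { system: SNOMED, code: "44054006", display: "Diabetes mellitus type 2" },
  // Parent concept, used when the type cannot be determined.
  unspecified: { system: SNOMED, code: "73211009", display: "Diabetes mellitus" },
};

// Assumed input: one optional yes/no answer per type field.
interface DiabetesTypeAnswers {
  t1d?: boolean;
  t2d?: boolean;
}

// Returns the specific code only when exactly one type is affirmed;
// empty or conflicting answers fall back to the parent concept.
function diabetesCoding(answers: DiabetesTypeAnswers): Coding {
  if (answers.t1d === true && answers.t2d !== true) return DIABETES_CODES.type1;
  if (answers.t2d === true && answers.t1d !== true) return DIABETES_CODES.type2;
  return DIABETES_CODES.unspecified;
}
```

With this approach a participant with an empty type field would still yield a condition resource, just coded at the less specific parent level.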
Thanks @baukearends. I already made the change and will run the pipeline again.
The harmonization process was designed to avoid producing FHIR resources with inconsistencies. This means that if the input of the pairing rules cannot be interpreted consistently, the whole participant is excluded. This is done through 'assertions', included in all the modules with pairing rules, which check for conditions in the input data that would make it impossible to interpret when generating the target FHIR resource.
Only about 0.9% of the participants were excluded from the process by this mechanism. However, I recently found that most of these cases had the same cause: the diabetes type couldn't be determined (see the preconditions):
https://github.com/MyDigiTwinNL/CDF2Medmij-Mapping-tool/blob/071cb66c2fa3a625d813e082556860a74b6b63f4/src/lifelines/Diabetes.ts#L210-L231
That is to say, in these cases, diabetes presence/follow-up was reported, but the type field (t1d_followup_adu_q_1 / t2d_followup_adu_q_1) was left empty.
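As a rough illustration of the mechanism described above (not the code in the linked module), the sketch below shows how a failed assertion on the diabetes type can exclude a whole participant. All names here (`UnexpectedInputError`, `assertInput`, `Participant`, `diabetesType`, `harmonize`) are hypothetical, and the participant shape is a simplified assumption.

```typescript
// Hypothetical sketch of assertion-based exclusion; not the tool's real API.
class UnexpectedInputError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "UnexpectedInputError";
    // Keeps instanceof working when compiling to older JS targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

function assertInput(condition: boolean, message: string): void {
  if (!condition) throw new UnexpectedInputError(message);
}

// Simplified, assumed input shape for one participant.
type Participant = {
  id: string;
  diabetesReported: boolean;
  type1?: boolean; // stands in for t1d_followup_adu_q_1
  type2?: boolean; // stands in for t2d_followup_adu_q_1
};

function diabetesType(p: Participant): "T1D" | "T2D" | undefined {
  if (!p.diabetesReported) return undefined;
  // Precondition: if diabetes is reported, a type must be affirmed.
  assertInput(
    p.type1 === true || p.type2 === true,
    `participant ${p.id}: diabetes reported but type is empty`
  );
  return p.type1 === true ? "T1D" : "T2D";
}

function harmonize(participants: Participant[]) {
  const included: { id: string; type: "T1D" | "T2D" | undefined }[] = [];
  const skipped: string[] = [];
  for (const p of participants) {
    try {
      included.push({ id: p.id, type: diabetesType(p) });
    } catch (e) {
      // A failed assertion excludes the whole participant.
      if (e instanceof UnexpectedInputError) skipped.push(p.id);
      else throw e;
    }
  }
  return { included, skipped };
}
```

In this sketch, a participant with `diabetesReported: true` and both type fields empty ends up in `skipped`, which mirrors the cases reported here.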
@baukearends, @squareb do you see a way not to miss these data points?