MyDigiTwinNL / CDF2Medmij-Mapping-tool

Tool for transforming Cohort-study Data (CDF) into FHIR/MedMij compliant resource bundles
Apache License 2.0

TobaccoUse - occasional and daily smoker mapping #6

Closed hcadavid closed 1 year ago

hcadavid commented 1 year ago

@baukearends @squareb In Lifelines, the SNOMED codes for non-smokers and ex-smokers each have a variable that maps to them one-to-one. However, the variable 'current-smoker' could be mapped to two SNOMED codes: daily and occasional. Lifelines reports 'cigarettes per day' for all participants (so I assume it is an average), and hence, as far as I can tell, there is no way to distinguish between daily and occasional smokers. If so, which of these two codes should be used?

https://github.com/MyDigiTwinNL/Lifelines2Medmij-Mapping-tool/blob/0f232b02b5dd631ac2b347965d62537bdcc29d39/src/lifelines/TobaccoUse.ts#L115-L146
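For readers following along, the ambiguity can be sketched as below. This is a minimal illustration, not the code in `TobaccoUse.ts`: the `SmokingStatus` type, the `CANDIDATES` table, and `isAmbiguous` are hypothetical names, and the SNOMED codes and display names are assumptions that should be verified against the ZIB value set.

```typescript
// Hypothetical, simplified view of the derived Lifelines smoking status.
type SmokingStatus = "never" | "ex" | "current";

interface SnomedCoding {
  system: "http://snomed.info/sct";
  code: string;
  display: string;
}

// Small helper to build a SNOMED coding.
const sct = (code: string, display: string): SnomedCoding => ({
  system: "http://snomed.info/sct",
  code,
  display,
});

// Candidate codes per status.
// ASSUMPTION: codes and display names are taken from memory; verify them
// against the TobaccoUse ZIB value set before relying on them.
const CANDIDATES: Record<SmokingStatus, SnomedCoding[]> = {
  never: [sct("266919005", "Never smoked tobacco")],
  ex: [sct("8517006", "Ex-smoker")],
  current: [
    sct("449868002", "Smokes tobacco daily"),
    sct("428041000124106", "Occasional tobacco smoker"),
  ],
};

// A status is ambiguous when the source data cannot select a single code.
function isAmbiguous(status: SmokingStatus): boolean {
  return CANDIDATES[status].length > 1;
}
```

Under these assumptions only `"current"` is ambiguous, which is the case this issue asks about.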

baukearends commented 1 year ago

Clinically, the most important aspects are whether the subject actively smokes and the cumulative tobacco exposure (pack years). Whether the subject smokes 2 cigarettes daily or 4 cigarettes every other day (for example) is less important.

Maybe the best SNOMED code is 65568007. This does not include a daily or occasional term.

squareb commented 1 year ago

Where possible, you should consider using the variables developed by researchers from the Department of Epidemiology at the UMCG. They have created derivative scores for smoking with additional quality checks. You can find these variables here: http://wiki.lifelines.nl/doku.php?id=smoking_derivatives_v2 (I noticed you're already using eversmoker_v2)

hcadavid commented 1 year ago

That's right @squareb, I'm indeed using the derivatives when possible. I just found an error in the specification above: the code name should be ever_smoker_adu_c_2 (which corresponds to eversmoker_v2). The codes (the names with the c_2 suffix) would be the actual derivative column names in the data files, right?

hcadavid commented 1 year ago

> Clinically, the most important aspects are whether the subject actively smokes and the cumulative tobacco exposure (pack years). Whether the subject smokes 2 cigarettes daily or 4 cigarettes every other day (for example) is less important.

These aspects are indeed covered by the derivative variables used in this version of the mapping.

> Maybe the best SNOMED code is 65568007. This does not include a daily or occasional term.

@baukearends 65568007 does not belong to the 'valuesets' indicated by the TobaccoUse ZIB https://zibs.nl/wiki/TobaccoUse-v3.1(2017EN), so using it would go against the implementation guidelines (I haven't checked whether the FHIR validator would report this as an error, though). All the codes in this valueset are in the same branch (Tobacco exposure/Smoking Behavior/Smoking Consumption-finding), so I'm not sure they can be mixed with 65568007, which is in a different branch/hierarchy (Tobacco exposure/Smoking Behavior/Smoker).

What about 'Tobacco smoking consumption unknown'? (it is in the same hierarchy)

https://bioportal.bioontology.org/ontologies/SNOMEDCT/?p=classes&conceptid=http%3A%2F%2Fpurl.bioontology.org%2Fontology%2FSNOMEDCT%2F266927001
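The value-set concern above could be checked locally before running the FHIR validator. The following is only a sketch: `isInValueSet` and `TOBACCO_USE_STATUS_SUBSET` are hypothetical names, and the set holds an assumed, incomplete subset of the ZIB codes, so a `false` result only means "not in this subset". The real list should come from the ZIB page linked above.

```typescript
// ASSUMPTION: an illustrative, incomplete subset of the TobaccoUse status
// value set. Replace it with the full list published in the ZIB before use.
const TOBACCO_USE_STATUS_SUBSET: ReadonlySet<string> = new Set<string>([
  "449868002", // daily smoker (assumed code)
  "428041000124106", // occasional smoker (assumed code)
  "8517006", // ex-smoker (assumed code)
  "266919005", // never smoked (assumed code)
]);

// Returns true when the code is a member of the given value set.
function isInValueSet(code: string, valueSet: ReadonlySet<string>): boolean {
  return valueSet.has(code);
}

// 65568007 is the code proposed earlier in this thread; it is reported
// above as not belonging to the ZIB value set.
const proposedCodeAllowed = isInValueSet("65568007", TOBACCO_USE_STATUS_SUBSET);
```

A check like this does not replace the FHIR validator; it only flags codes that would need a closer look.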

baukearends commented 1 year ago

Your suggestion is better than mine, as long as it passes the FHIR validator. Otherwise, we might just have to use daily and report this as a data limitation.

hcadavid commented 1 year ago

@baukearends @squareb I set it to 'daily', because 'Tobacco smoking consumption unknown' can be interpreted as uncertainty not only about the frequency but also about whether the participant has ever smoked. I created a LIMITATIONS.md file and added this as its first entry. I will also add the limitations related to data imputation there.

https://github.com/MyDigiTwinNL/Lifelines2Medmij-Mapping-tool/blob/6e10f31367ff42f2eb925edb35ca057864c071ea/LIMITATIONS.md?plain=1#L6
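The decision above might look roughly like this in code. It is a sketch under stated assumptions rather than the repository's implementation: `SmokingStatus`, `MappedTobaccoUse`, and `mapTobaccoUseStatus` are hypothetical names, and the SNOMED codes and display names should be verified against the ZIB value set.

```typescript
// Hypothetical, simplified view of the derived Lifelines smoking status.
type SmokingStatus = "never" | "ex" | "current";

interface MappedTobaccoUse {
  code: string; // SNOMED CT code (assumed values, verify against the ZIB)
  display: string;
  limitation?: string; // set when the mapping loses or assumes information
}

// Maps a smoking status to a single SNOMED code. Current smokers are coded
// as daily smokers because the source data cannot separate daily from
// occasional smoking; the limitation text records that assumption.
function mapTobaccoUseStatus(status: SmokingStatus): MappedTobaccoUse {
  switch (status) {
    case "never":
      return { code: "266919005", display: "Never smoked tobacco" };
    case "ex":
      return { code: "8517006", display: "Ex-smoker" };
    case "current":
      return {
        code: "449868002",
        display: "Smokes tobacco daily",
        limitation:
          "Daily and occasional smokers cannot be distinguished; coded as daily.",
      };
  }
}
```

Carrying the limitation alongside the code keeps the assumption visible to downstream consumers, in addition to the entry in LIMITATIONS.md.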