EHDEN / ETL-UK-Biobank

ETL UK-Biobank
https://ehden.github.io/ETL-UK-Biobank/

Ethon reflection 1: Are we missing people with Deep Venous Thrombosis (DVT)? #315

Closed: MaximMoinat closed this issue 3 years ago

MaximMoinat commented 3 years ago

Observations upon exploring the CohortDiagnostics:

Notably, the cohort includes no people with outpatient visits and Read codes for DVT.

Hypothesis: DVT diagnoses from gp_clinical are not mapped correctly. They might have ended up in the wrong domain.

Next steps:

  1. Search the source data for DVT occurrences in the GP data.
  2. Look for DVT concepts (i.e. descendants of 4133004 | Deep venous thrombosis) in other domains, most likely in the measurement table.
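
The second step could be sketched roughly as follows. This is a minimal illustration, not the query actually run for this issue: it assumes the standard OMOP CDM table and column names (`concept_ancestor`, `condition_occurrence`, `measurement`, `observation`) and uses an in-memory SQLite database with made-up toy rows so that it runs on its own. The helper names and the descendant id 999001 are hypothetical.

```python
import sqlite3

DVT_ANCESTOR = 4133004  # Deep venous thrombosis (concept id taken from the issue text)

# (table, concept column) pairs, assuming standard OMOP CDM naming
DOMAIN_TABLES = [
    ("condition_occurrence", "condition_concept_id"),
    ("measurement", "measurement_concept_id"),
    ("observation", "observation_concept_id"),
]


def count_descendants_by_table(conn, ancestor_id):
    """Return {table: row count} for rows whose concept descends from ancestor_id."""
    counts = {}
    for table, column in DOMAIN_TABLES:
        sql = (
            f"SELECT COUNT(*) FROM {table} t "
            f"JOIN concept_ancestor ca ON ca.descendant_concept_id = t.{column} "
            "WHERE ca.ancestor_concept_id = ?"
        )
        counts[table] = conn.execute(sql, (ancestor_id,)).fetchone()[0]
    return counts


def build_toy_db():
    """Create a tiny in-memory database with made-up rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE concept_ancestor (ancestor_concept_id INTEGER, descendant_concept_id INTEGER);
        CREATE TABLE condition_occurrence (condition_concept_id INTEGER);
        CREATE TABLE measurement (measurement_concept_id INTEGER);
        CREATE TABLE observation (observation_concept_id INTEGER);
        -- 999001 is a made-up descendant id, not a real vocabulary concept
        INSERT INTO concept_ancestor VALUES (4133004, 4133004), (4133004, 999001);
        INSERT INTO condition_occurrence VALUES (999001), (12345);
        INSERT INTO measurement VALUES (4133004), (4133004), (999001);
        INSERT INTO observation VALUES (12345);
    """)
    return conn


if __name__ == "__main__":
    print(count_descendants_by_table(build_toy_db(), DVT_ANCESTOR))
```

Against a real CDM instance, the same join would be run per domain table on the production database; a non-zero count outside `condition_occurrence` would point at a domain problem.
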
MaximMoinat commented 3 years ago

Related to #113

MaximMoinat commented 3 years ago

Another reason for the low count in the DVT cohort is the cohort definition. In UKB, concept 77310 (Deep vein phlebitis and thrombophlebitis of the leg) also occurs frequently (10100 times), but it is not included in the DVT cohort definition.

MaximMoinat commented 3 years ago

[Image: screenshot from EHDEN Portal, Database Dashboard, Concept Browser]

Interestingly, many concept counts come as 'pairs'. Why are they not grouped together? Is that because they are from different tables?

vpapez commented 3 years ago

  1. I was searching for the following Read codes: G801.13 (DVT - Deep vein thrombosis), L413.11 (DVT - deep venous thrombosis, antenatal), L414.11 (DVT - deep venous thrombosis, postnatal) in gp_clinical.txt, covid19_emis_gp_clinical.txt, covid19_tpp_gp_clinical.txt and found 0 results for all of them.
  2. I found two records of concept 4133004 in covid19 gp_emis in the measurement table. I also found 185 descendants of concept 4133004. From these, there is

spiros commented 3 years ago

Hey, it might be a good idea to use a standard algorithm to pick these cases.

I am not sure what L413.11 etc. are - they don't look like real ICD-10 codes. Also, antenatal/postnatal means they are related to pregnancy outcomes (which I suspect is not what you are after).

Here's a reasonably complete phenotype - https://portal.caliberresearch.org/phenotypes/kuan-vte-ex-pe-jrttcdr2u5bo88gjap7hd6

Just to say, it was developed in Read 2, so it is likely to miss some newer codes. It is probably safest to use a newer list if you want high sensitivity: https://www.opencodelists.org/codelist/opensafely/venous-thromboembolic-disease/2020-09-14/

S

On Tue, Jul 27, 2021 at 3:48 PM vpapez @.***> wrote:

  1. I was searching for the following Read codes G801.13 (DVT - Deep vein thrombosis), L413.11 (DVT - deep venous thrombosis, antenatal), L414.11 (DVT - deep venous thrombosis, postnatal) in gp_clinical.txt, covid19_emis_gp_clinical.txt, covid19_tpp_gp_clinical.txt and found 0 results for all of them.
  2. I found two records of 4133004 concept in covid19 gp_emis in measurement table. I also found 185 descendants of 4133004 concept. From these, there is

    Condition Occurrence table:
      concept 77310: ~10k records
      concept 435887: 20 records
      concept 438820: 7 records

    Measurement table:
      concept 77310: 468 records
      concept 435887: 22 records
      concept 438820: 1 record
      concept 4133004: 2 records


vpapez commented 3 years ago

Thanks! L413.11 is a Read code. The problem is that we don't have DVT from primary care. Thank you for the CALIBER link; I will check that Read 2 codelist.

vpapez commented 3 years ago

From the Read 2 codelist I did not find any records. From the CTV3 codelist here https://www.opencodelists.org/codelist/opensafely/venous-thromboembolic-disease/2020-09-14/#full-list I found 12889 records in GP clinical and 12154 in Covid19 TPP GP clinical. For a breakdown of the code counts, please see the attached files. The most common CTV3 codes are 'Xa9Bs', 'XE0Um' and 'G801.'.

Attached: covid19_tpp_gp_clinical.txt, gp_clinical.txt

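MaximMoinat commented 3 years ago

```sql
-- Count measurement rows with CTV3 source code 'G801.' and list the concept ids they map to.
-- Assuming PostgreSQL: string_agg needs text input, hence the cast of the integer concept id.
select count(*), string_agg(measurement_concept_id::text, '|')
from measurement
where measurement_source_value = 'G801.'
```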

This gave a count of 423, all mapped to concept 77310.

There are no occurrences of 'G801.' in the condition_occurrence table.

MaximMoinat commented 3 years ago

Further investigation revealed that all the 'G801.' measurement records had a value_as_number of 0. It is likely that in the source data, if value is missing/irrelevant, a value of 0 is given. This is also the highest frequency value in tpp_gp_clinical.
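
A check like the one described here could be sketched as below. It groups the measurement rows for one source code by `value_as_number`, so that a single dominant value such as 0 stands out. This is only an illustration under assumptions: it runs against an in-memory SQLite table with made-up rows, and the helper names and the source code 'XYZ' are hypothetical.

```python
import sqlite3


def value_distribution(conn, source_value):
    """Return [(value_as_number, n)] for one source code, most frequent value first."""
    return conn.execute(
        "SELECT value_as_number, COUNT(*) AS n FROM measurement "
        "WHERE measurement_source_value = ? "
        "GROUP BY value_as_number ORDER BY n DESC, value_as_number",
        (source_value,),
    ).fetchall()


def build_toy_measurement():
    """Create a tiny in-memory measurement table with made-up rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE measurement (measurement_source_value TEXT, value_as_number REAL)"
    )
    # Three toy 'G801.' rows with value 0, plus one row for a hypothetical code 'XYZ'
    rows = [("G801.", 0.0)] * 3 + [("XYZ", 5.5)]
    conn.executemany("INSERT INTO measurement VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    print(value_distribution(build_toy_measurement(), "G801."))
```

If every row for a diagnosis code carries the same placeholder value, that supports reading the value as "missing" rather than as a real measurement result.
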

MaximMoinat commented 3 years ago

To close this issue: the answer is yes, we are missing people with DVT. About 500 DVT source records are wrongly mapped to the measurement table, whereas they should have been mapped to the condition_occurrence table. This needs to be fixed by resolving the domain issue described in #113.
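
One way to list records affected by this kind of domain mismatch is sketched below. It is not the fix from #113: it assumes the standard OMOP `concept.domain_id` column and simply reports measurement rows whose mapped concept belongs to the Condition domain. The toy rows are made up for illustration; the domain assigned to 77310 is an assumption based on this thread, and 999002 and 'LAB1' are hypothetical.

```python
import sqlite3


def misplaced_condition_rows(conn):
    """Return [(source_value, concept_id, n)] for measurement rows mapped to Condition-domain concepts."""
    return conn.execute(
        "SELECT m.measurement_source_value, m.measurement_concept_id, COUNT(*) AS n "
        "FROM measurement m "
        "JOIN concept c ON c.concept_id = m.measurement_concept_id "
        "WHERE c.domain_id = 'Condition' "
        "GROUP BY m.measurement_source_value, m.measurement_concept_id "
        "ORDER BY n DESC"
    ).fetchall()


def build_toy_cdm():
    """Create a tiny in-memory database with made-up rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE concept (concept_id INTEGER, domain_id TEXT);
        CREATE TABLE measurement (measurement_source_value TEXT, measurement_concept_id INTEGER);
        -- Domain of 77310 assumed from the thread; 999002 is a made-up measurement concept
        INSERT INTO concept VALUES (77310, 'Condition'), (999002, 'Measurement');
        INSERT INTO measurement VALUES ('G801.', 77310), ('G801.', 77310), ('LAB1', 999002);
    """)
    return conn


if __name__ == "__main__":
    for row in misplaced_condition_rows(build_toy_cdm()):
        print(row)
```
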