LTHTR-DST / hdruk_avoidable_admissions

HDRUK Data Science Collaboration on Avoidable Admissions in the NHS.
https://lthtr-dst.github.io/hdruk_avoidable_admissions/
MIT License

Multiple SNOMED Refset member lists incomplete #27

Open quindavies opened 1 year ago

quindavies commented 1 year ago

The SNOMED code for Referred by self (1991000124105) is missing from the feature map. It is needed for edattendsource to pass validation.

The code is in the Emergency care attendance source simple reference set.

vvcb commented 1 year ago

@quindavies, can you provide more details on all the columns that fail validation because the code lists are incomplete, please? And the cause and the steps to mitigate, as discussed earlier?

@dfleming9, email coming your way to address this using the terminology server.

In the meantime, we may have to ignore validation errors on SNOMED columns.

quindavies commented 1 year ago

Sure, columns listed below

edattendsource, edchiefcomplaint, eddiag, edinvest, edrefservice, edtreat
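If validation errors on these columns are to be ignored for now, one option is to set them aside for review rather than drop them silently. A minimal sketch, assuming validation failures are available as a list of dicts with `column` and `value` keys (a hypothetical shape, not the project's actual error format; the function names are also illustrative):

```python
from collections import defaultdict

# Columns reported in this issue as failing because code lists are incomplete.
SNOMED_COLUMNS = {
    "edattendsource", "edchiefcomplaint", "eddiag",
    "edinvest", "edrefservice", "edtreat",
}


def split_failures(failures):
    """Separate failures on SNOMED columns from all other failures.

    `failures` is a hypothetical list of dicts with "column" and "value" keys.
    Returns (deferred, blocking): deferred failures are kept for later review,
    blocking failures should still stop the pipeline.
    """
    deferred, blocking = [], []
    for failure in failures:
        target = deferred if failure["column"] in SNOMED_COLUMNS else blocking
        target.append(failure)
    return deferred, blocking


def unmatched_codes(deferred):
    """Group the unmatched codes by column so they can be checked against the refsets."""
    by_column = defaultdict(set)
    for failure in deferred:
        by_column[failure["column"]].add(failure["value"])
    return dict(by_column)
```

Keeping the deferred codes grouped by column gives a ready-made list to check once the refset member lists are regenerated.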

The cause is the Snowstorm API used in nhsdd_generator.py: it is an old version, so not all codes in the current refset are being pulled through.

The step we're looking at is using a different API (NHS Terminology Server - FHIR APIs).
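For reference, a rough sketch of what pulling refset members through a FHIR terminology server could look like. It uses the standard FHIR `ValueSet/$expand` operation with SNOMED CT's implicit value set URI for a refset. The base URL, the refset ID, authentication, and the assumption that the feature map is a dict keyed by code are all placeholders, not details confirmed for this project. The HTTP call itself is left out; the functions only build the request URL and parse an already-fetched expansion.

```python
from urllib.parse import urlencode

SNOMED_SYSTEM = "http://snomed.info/sct"


def expand_url(base_url, refset_id, count=1000, offset=0):
    """Build a FHIR ValueSet/$expand URL for a SNOMED CT reference set.

    `base_url` and `refset_id` are supplied by the caller (placeholders here).
    `count` and `offset` are the standard $expand paging parameters.
    """
    # Implicit value set URI meaning "all members of refset <refset_id>".
    value_set = f"{SNOMED_SYSTEM}?fhir_vs=refset/{refset_id}"
    query = urlencode({"url": value_set, "count": count, "offset": offset})
    return f"{base_url.rstrip('/')}/ValueSet/$expand?{query}"


def codes_from_expansion(value_set):
    """Collect SNOMED codes from a parsed ValueSet expansion (a dict)."""
    contains = value_set.get("expansion", {}).get("contains", [])
    return {c["code"] for c in contains if c.get("system") == SNOMED_SYSTEM}


def missing_from_feature_map(refset_codes, feature_map):
    """Codes that are refset members but absent from the feature map keys."""
    return sorted(set(refset_codes) - set(feature_map))
```

A large refset may come back in several pages, so a real client would repeat the request with increasing `offset` until all members are collected.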

dfleming9 commented 1 year ago

Got a call arranged in the morning with @quindavies to work it through. Can I just make sure that the authoritative source of validation rules is HDRUK Data Processing V1? I ask because it doesn't include 1991000124105; it includes 507291000000100 instead. My reading of ECDS_ETOS_v4 is that 1991000124105 is valid from 1 October 2022 onwards and 507291000000100 is valid from 1 September 2017 to 31 March 2023, so I think both codes could legitimately appear in the data in the study period. I think 1082331000000106 should also be checked for the same reason: it has also been valid since 1 October 2022 and isn't present in HDRUK Data Processing V1 either.
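The overlap check described above can be sketched as a date-interval test. The validity windows below are the ones from my reading of ECDS_ETOS_v4 and would need confirming. The study period dates in the usage note are purely hypothetical, since the actual study period isn't stated in this thread.

```python
from datetime import date

# Validity windows as read from ECDS_ETOS_v4 in this comment (unconfirmed).
# An end date of None means the code is still valid (open-ended).
VALIDITY = {
    "1991000124105": (date(2022, 10, 1), None),
    "507291000000100": (date(2017, 9, 1), date(2023, 3, 31)),
    "1082331000000106": (date(2022, 10, 1), None),
}


def overlaps(valid_from, valid_to, study_start, study_end):
    """True if a code's validity window intersects the study period (inclusive)."""
    starts_in_time = valid_from <= study_end
    ends_late_enough = valid_to is None or valid_to >= study_start
    return starts_in_time and ends_late_enough


def codes_valid_in_period(validity, study_start, study_end):
    """Codes whose validity window overlaps the study period, sorted as strings."""
    return sorted(
        code
        for code, (start, end) in validity.items()
        if overlaps(start, end, study_start, study_end)
    )
```

For example, with a hypothetical study period of `date(2021, 11, 1)` to `date(2022, 10, 31)`, `codes_valid_in_period(VALIDITY, ...)` returns all three codes, so all three would need to be accepted by validation.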

vvcb commented 1 year ago

This emphasises the importance of:

  1. a good knowledge of SNOMED codes
  2. robust validation of data, so that we don't miss an entire cohort of patients in the analysis simply by missing one or more codes. The risks are quirks around the time periods when codes are valid or invalid, and different codes for very similar concepts, with some appearing in a refset and others not.

vvcb commented 1 year ago

@dfleming9, HDRUK Data Processing V1 is indeed the authoritative source and, along with the other Sheffield docs, has proven invaluable. It does have some issues that need addressing, though, which we should feed back to the Sheffield team at the next analyst meeting.

GitHub is proving invaluable in keeping track of some of these issues (and discussing them).