LTHTR-DST / hdruk_avoidable_admissions

HDRUK Data Science Collaboration on Avoidable Admissions in the NHS.
https://lthtr-dst.github.io/hdruk_avoidable_admissions/
MIT License

Validator not picking up ACSC's for avoidable admissions #33

Closed MattStammers closed 1 year ago

MattStammers commented 1 year ago

I have tried overwriting all the first diagnoses with codes such as cellulitis ('128045006'), but the validator is not picking these up and flags everything as non-ACSC on the final step.

vvcb commented 1 year ago

@MattStammers , thank you for reporting this.

Can you please add some more details to the first post describing how to replicate this problem? Based on the WhatsApp messages, I understand that there may be a problem with the ACSC mapping of the diag_01 variable in the admitted care dataset.

If this is correct, then this will affect most of the generated tables.

There may be a problem with how this feature is built. Would you mind having a look at the following sections of code to see what may be broken?

It is also definitely worth checking the Sheffield mapping spec Google Sheet; the link can be reconstructed from the second code section above. feature_maps.py reads this Google Sheet directly and uses the data for the mapping.

Happy to look at a PR if there is a bug at this end.

MattStammers commented 1 year ago

Right, @georgm8 has looked at the code and found the issue.


Our ICD codes had a full stop in them, which was not being picked up: the code was replacing the full stop with a space. Either we need consensus on the format of the ICD-10 codes, or a regex validator needs to be added. I think @georgm8 is happy to do this?
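To illustrate the mismatch described above, here is a minimal sketch. The lookup dictionary and its category label are hypothetical stand-ins, not the actual Sheffield spec or the project's code; the only assumption carried over from this thread is that the full stop ends up replaced by a space.

```python
# Hypothetical lookup keyed by ICD-10 codes without a full stop
# (a stand-in for the real mapping, which is read from a Google Sheet).
acsc_lookup = {"L031": "cellulitis"}

raw_code = "L03.1"  # code as it might appear in a site's raw data

# Replacing the full stop with a space gives "L03 1", which is not a key.
with_space = raw_code.replace(".", " ")

# Removing the full stop gives "L031", which matches the dotless key.
without_dot = raw_code.replace(".", "")

# Anything not found in the lookup falls through to "non-ACSC".
result_with_space = acsc_lookup.get(with_space, "non-ACSC")
result_without_dot = acsc_lookup.get(without_dot, "non-ACSC")
```

With a lookup of this shape, every code containing a full stop would fall through to the default category, which is consistent with everything being flagged as non-ACSC.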

georgm8 commented 1 year ago

Happy to create a validator once we have consensus on how we want the ICD-10 codes to look

vvcb commented 1 year ago

The Sheffield spec uses a period (.) after the first three characters when a code has more than three characters. At LTH, we don't have a period in the raw data, and we found that the simplest thing to do was to remove it from the Sheffield spec Google Sheet.

For _acuteadmits dataset,

My view is that we get rid of the period (.) to keep things simple.
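If the dotless format is agreed, normalisation and a regex check could look roughly like the sketch below. The function names and the pattern are assumptions for illustration only, not what is implemented in this repository; in particular, the pattern (one letter, two digits, then up to four further alphanumeric characters) is a deliberately loose guess at the dotless ICD-10 shape and would need checking against the codes actually in use.

```python
import re

# Assumed dotless ICD-10 shape: a letter, two digits, then up to four
# optional alphanumeric characters. This is an illustrative pattern only.
ICD10_DOTLESS = re.compile(r"[A-Z][0-9]{2}[0-9A-Z]{0,4}")


def normalise_icd10(code: str) -> str:
    """Strip surrounding whitespace, upper-case, and remove any full stop."""
    return code.strip().upper().replace(".", "")


def is_valid_icd10(code: str) -> bool:
    """Return True if the whole string matches the assumed dotless format."""
    return ICD10_DOTLESS.fullmatch(code) is not None
```

Running `normalise_icd10` before the mapping step and `is_valid_icd10` inside the validator would let codes such as "L03.1" through as "L031" while rejecting forms such as "L03 1".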

~Just found that we are not doing this in the ECDS dataset.~

Ignore this, as the ECDS mapping is SNOMED to category and should not be affected. However, the ECDS mapping Google Doc also has ICD codes in it, which is worth exploring further.

vvcb commented 1 year ago

@MattStammers , feel free to close this issue if #35 by @georgm8 addresses this.

@quindavies , we will have to run the new changes in #35 against our admitted care dataset and make sure there are no surprises.