LTHTR-DST / hdruk_avoidable_admissions

HDRUK Data Science Collaboration on Avoidable Admissions in the NHS.
https://lthtr-dst.github.io/hdruk_avoidable_admissions/
MIT License

Missing comorbidities for APC dataset #41

Closed ccarenzoIC closed 1 year ago

ccarenzoIC commented 1 year ago

Hi,

For the APC data, the comorbidities feature is missing. This is a "Yes"/"No" variable calculated across all secondary diagnosis variables: if all of diag_02 to diag_20 are empty the value is "No", and if any of diag_02 to diag_20 contains a code the value is "Yes". The definition is in the specification document: https://docs.google.com/document/d/1uvywYJrwYv7K3jMkpPPj0OOdTove9yXHPQeydD406fQ/edit#
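A minimal sketch of the rule described above, assuming the secondary diagnoses sit in string columns named diag_02 to diag_20 and that blank strings count as empty. The function name `add_comorbidities` and the sample codes are hypothetical and are not taken from the project's code.

```python
import numpy as np
import pandas as pd


def add_comorbidities(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a "Yes"/"No" comorbidities column."""
    # Select diag_02 ... diag_20 only; diag_01 (primary diagnosis) is excluded.
    secondary = df.filter(regex=r"^diag_(0[2-9]|1[0-9]|20)$")
    # Assumption: empty or whitespace-only strings mean "no code recorded".
    secondary = secondary.replace(r"^\s*$", np.nan, regex=True)
    out = df.copy()
    # "Yes" if any secondary diagnosis column holds a code, otherwise "No".
    out["comorbidities"] = np.where(secondary.notna().any(axis=1), "Yes", "No")
    return out


# Hypothetical rows: one with a secondary code, one with none, one with a blank.
example = pd.DataFrame(
    {
        "diag_01": ["J181", "I210", "N390"],
        "diag_02": ["E119", None, ""],
        "diag_03": [None, None, None],
    }
)
result = add_comorbidities(example)
print(result["comorbidities"].tolist())
```

Only the first row is flagged "Yes" here, because it is the only one with a non-empty secondary diagnosis.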

Happy to contribute if this is not already being done,

Thanks

vvcb commented 1 year ago

Thank you @ccarenzoIC. It would be great if you have some time to contribute to this 🙏.

I left this out because I was not sure that the calculation of comorbidities described in the specification document matches what comorbidities mean in clinical practice: not all associated conditions would be classed as comorbidities. It is possible, though, that Sheffield have taken their definition from somewhere or have taken a pragmatic approach to quantifying this, which is fine as long as the final report makes this clear, which I am confident it will.

One of the problems with a simple count is that it makes the calculation more susceptible to depth of coding, i.e. a coding department that codes every minor thing will have consistently more 'comorbid' patients than a department that only codes the major issues. But, I guess one could argue the same for everything else! :-)
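To make the depth-of-coding point concrete, here is a small illustration on made-up data: the same three hypothetical patients as recorded by a department that codes minor conditions ("deep") and one that codes only major ones ("shallow"). The department labels, column values and codes are invented for the example.

```python
import pandas as pd

# Hypothetical records: identical patients, different coding depth.
records = pd.DataFrame(
    {
        "department": ["deep"] * 3 + ["shallow"] * 3,
        "diag_02": ["I10", "E119", "Z720", "I10", None, None],
        "diag_03": ["Z864", None, None, None, None, None],
    }
)

# Count the secondary diagnosis codes recorded for each row.
secondary = records.filter(regex=r"^diag_(0[2-9]|1[0-9]|20)$")
records["n_secondary"] = secondary.notna().sum(axis=1)

# Apply the Yes/No rule from the specification to that count.
records["comorbidities"] = records["n_secondary"].gt(0).map({True: "Yes", False: "No"})

# Share of rows flagged "Yes" in each department.
share_flagged = records.groupby("department")["comorbidities"].apply(
    lambda s: (s == "Yes").mean()
)
print(share_flagged)
```

In this toy data every "deep" row is flagged but only one in three "shallow" rows is, even though the patients are the same.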

For future work or other projects please have a look at https://github.com/vvcb/comorbidipy/ which allows the calculation of multiple comorbidity scores from ICD10 codes. We have this deployed at LTH as an API.

But, for this project and given the tight deadlines, agree that it will be best to stick to the Sheffield docs.

georgm8 commented 1 year ago

@ccarenzoIC - just wanted to double check whether you'd already started working on this. I happen to be working on it at the moment and just spotted that you'd already flagged it as an issue! If you've not started already, I should be able to submit a pull request by the end of the day.

ccarenzoIC commented 1 year ago

@georgm8 Given the tight deadlines I've done it the simple way. I just modified the _procedures function as follows:

    df["procedures"] = np.where(
        (
            df.filter(regex="opertn_[0-1][0-9]$")
            .replace({"X99[8-9]|[OYZ][0-9]+|-": np.nan}, regex=True)
            .count(axis=1)
        )
        != 0,
        "Yes",
        "No",
    )
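For anyone who wants to try the snippet above in isolation, here is a self-contained version with imports and a few made-up rows. The column names follow the opertn_NN pattern from the snippet; the sample values are hypothetical and only chosen to exercise the regular expression.

```python
import numpy as np
import pandas as pd

# Hypothetical procedure columns; real data would have more opertn_NN columns.
df = pd.DataFrame(
    {
        "opertn_01": ["W371", "X998", "-", None],
        "opertn_02": [None, "Y532", None, None],
    }
)

# Keep only the procedure columns matched by the snippet's pattern.
procedure_cols = df.filter(regex="opertn_[0-1][0-9]$")

# Blank out the codes the snippet excludes, so they are not counted.
valid = procedure_cols.replace({"X99[8-9]|[OYZ][0-9]+|-": np.nan}, regex=True)

# "Yes" when at least one countable code remains on the row, else "No".
df["procedures"] = np.where(valid.count(axis=1) != 0, "Yes", "No")
print(df)
```

Only the first row keeps a countable code, so it is the only one flagged "Yes".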

georgm8 commented 1 year ago

Pull request #42 submitted. I have modified the _procedures() function and added _comorbidities() to the pipeline, along with validation for the new columns.
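As a rough idea of what validating the new columns could look like, here is a plain-pandas sketch that checks both columns exist and contain only "Yes" or "No". The helper `validate_yes_no` is hypothetical; the validation actually added in the pull request may be implemented differently.

```python
import pandas as pd

ALLOWED = {"Yes", "No"}


def validate_yes_no(df: pd.DataFrame, columns=("procedures", "comorbidities")) -> pd.DataFrame:
    """Raise ValueError if a listed column is missing or holds other values."""
    for col in columns:
        if col not in df.columns:
            raise ValueError(f"missing column: {col}")
        # Anything outside the allowed set (including missing values) is an error.
        unexpected = set(df[col].unique()) - ALLOWED
        if unexpected:
            raise ValueError(f"{col} has unexpected values: {sorted(map(str, unexpected))}")
    return df


good = pd.DataFrame({"procedures": ["Yes", "No"], "comorbidities": ["No", "No"]})
validate_yes_no(good)  # passes silently

bad = good.assign(comorbidities=["Yes", "maybe"])
try:
    validate_yes_no(bad)
except ValueError as err:
    print(err)
```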

vvcb commented 1 year ago

Fixed by #42