LTHTR-DST / hdruk_avoidable_admissions

HDRUK Data Science Collaboration on Avoidable Admissions in the NHS.
https://lthtr-dst.github.io/hdruk_avoidable_admissions/
MIT License

Making edwaittime a nullable field #32

Closed: quindavies closed this issue 1 year ago

quindavies commented 1 year ago

edwaittime currently fails validation because the field is not nullable: some patients leave before being seen by a clinician. This needs to be accounted for.

vvcb commented 1 year ago

edwaittime and timeined are currently set to the int data type, which causes problems with NaN values even though the fields are set to be nullable.

Is there another column that allows us to identify these patients who left before being seen?

I will talk you through submitting your first pull request tomorrow! Can you please do a little bit of reading around this?

georgm8 commented 1 year ago

Quite a few of the columns in the EmergencyCareEpisodeSchema are both enforced to be of type np.int64 and declared with nullable=True. Because NaN is a float, a column that contains NaN cannot be np.int64, so these two validation rules conflict whenever the data has missing values.
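To illustrate the conflict, here is a minimal sketch using plain pandas and NumPy rather than the project's schema. The wait-time values are made up for illustration; only the column name edwaittime comes from this issue.

```python
import numpy as np
import pandas as pd

# Hypothetical wait times in minutes. The missing value stands for a
# patient who left before being seen by a clinician.
wait = pd.Series([12, 45, np.nan], name="edwaittime")

# A single NaN forces pandas to store the whole column as floats.
inferred_dtype = str(wait.dtype)
print(inferred_dtype)

# Forcing the column back to np.int64 fails, because NaN has no
# integer representation.
try:
    wait.astype(np.int64)
    cast_succeeded = True
except ValueError as exc:
    cast_succeeded = False
    print(f"Cast to np.int64 failed: {exc}")
```

So a column can satisfy "is np.int64" or "contains nulls", but not both at once.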

Pandas now supports a nullable integer data type.

What do you think about applying dtype=pd.Int64Dtype() to the schema? The caveat is that the nullable integer data type is still marked 'experimental'.
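As a rough sketch of what the nullable integer type gives us, shown on a bare pandas Series and not on the actual schema definition (the values are again made up, and how this plugs into the schema is left open here):

```python
import numpy as np
import pandas as pd

# Hypothetical column as it might arrive after loading: floats, because
# of the missing value.
raw = pd.Series([12.0, 45.0, np.nan], name="edwaittime")

# pd.Int64Dtype() (string alias "Int64") keeps the values as integers
# and stores the gap as pd.NA instead of a float NaN.
nullable = raw.astype(pd.Int64Dtype())

print(nullable.dtype)
print(nullable.isna().sum())

# Non-missing values are real integers again.
present_values = nullable.dropna().tolist()
print(present_values)
```

One thing to watch for: this cast only works when every non-missing value is a whole number, so a stray 12.5 in the data would raise an error and not be silently truncated.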