LTHTR-DST / hdruk_avoidable_admissions

HDRUK Data Science Collaboration on Avoidable Admissions in the NHS.
https://lthtr-dst.github.io/hdruk_avoidable_admissions/
MIT License

Features ufunc 'bitwise_or' error #14

Closed. MattStammers closed this issue 1 year ago

MattStammers commented 1 year ago

We have been using the original SDV synthetic dataset to test the pipeline today. At the admitted care features stage, we ran:

good_f, bad_f = validate_dataframe(dff, AdmittedCareFeatureSchema)
print("Good dataframe has %d rows" % good_f.shape[0])
print("Bad dataframe has %d rows" % bad_f.shape[0])

We received the following error output while running v0.1.0-alpha of the pipeline:

ufunc 'bitwise_or' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
No data will pass validation due to undefined error.See output above and please raise an issue on GitHub.
Good dataframe has 0 rows
Bad dataframe has 0 rows

vvcb commented 1 year ago

Thanks for raising this, @MattStammers. This is likely because the schema adheres too closely to the specification, which requires admidate to be a date rather than a datetime. This quirk is documented here: https://lthtr-dst.github.io/hdruk_avoidable_admissions/admitted_care_pipeline_example/#iterative-dq-fixes

One way to fix this would be to change the schema to accept a datetime dtype. For now, could you please convert the column with df.admidate.dt.date before validating?
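
A minimal sketch of that workaround, assuming a pandas DataFrame whose admidate column currently holds datetime64 values. The sample values and the extra column are made up for illustration and are not taken from the SDV dataset:

```python
import datetime

import pandas as pd

# Hypothetical sample data: admidate parsed as full datetimes.
df = pd.DataFrame(
    {
        "admidate": pd.to_datetime(
            ["2021-04-01 13:45:00", "2021-04-02 00:00:00"]
        ),
        "patient_id": [1, 2],  # illustrative column, not from the schema
    }
)

# Convert each timestamp to a plain datetime.date, dropping the time part.
# The column becomes object dtype holding datetime.date values.
df["admidate"] = df["admidate"].dt.date

# Sanity check: every value is a date and not a datetime.
is_plain_date = df["admidate"].map(
    lambda v: isinstance(v, datetime.date)
    and not isinstance(v, datetime.datetime)
)
print(is_plain_date.all())
```

After this conversion the dataframe can be passed to validate_dataframe again as in the original snippet.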

vvcb commented 1 year ago

@MattStammers - 814f89b1230093f347bdb367165dcda2cded47de should fix this. Please reopen if you see this error again.