LTHTR-DST / hdruk_avoidable_admissions

HDRUK Data Science Collaboration on Avoidable Admissions in the NHS.
https://lthtr-dst.github.io/hdruk_avoidable_admissions/
MIT License

Schema validation fails when column names don't match #4

Closed: vvcb closed this issue 1 year ago

vvcb commented 1 year ago

Bug in data.validate.py.

When the column names in the pandera SchemaModel do not match the column names in the supplied dataframe, the mismatch should be reported in the error message and also included as rows in the errors dataframe. Every row of the supplied dataframe should then be reported as an error, and no good rows should be returned.

However, because we match on the index column of the ex.failure_cases dataframe, and that value is NaN for column-name failures, the merge finds no matching rows and the entire dataframe is returned as valid (unless other rows fail row-level validation).
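The failure mode described above can be reproduced with pandas alone. This is a minimal sketch: the failure_cases frame is built by hand to mimic the assumed layout of pandera's SchemaErrors.failure_cases (a check name, the failing value, and an index column that is NaN for a missing-column failure), the column names in df are hypothetical, and Index.isin stands in for the merge in validate.py, which is not reproduced here.

```python
import pandas as pd

# Hypothetical supplied data: suppose the schema expects "admission_date",
# but the frame only has "admit_date".
df = pd.DataFrame({"admit_date": ["2021-01-01", "2021-02-01"], "age": [34, 57]})

# Hand-built stand-in for pandera's failure_cases (assumed layout).
# A missing-column failure is not tied to any row, so its "index" is NaN.
failure_cases = pd.DataFrame(
    {
        "check": ["column_in_dataframe"],
        "failure_case": ["admission_date"],
        "index": [float("nan")],
    }
)

# Flag rows whose label appears in the failure "index" column.
# NaN never equals a row label, so no row is flagged as bad.
is_bad = df.index.isin(failure_cases["index"])
good_rows = df[~is_bad]
bad_rows = df[is_bad]

print(len(good_rows), len(bad_rows))  # 2 0
```

Both rows come back as good even though the schema check failed, which is the behaviour reported in this issue.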

The expected behaviour is described in the first paragraph.
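A minimal sketch of that expected behaviour, assuming the same failure_cases layout (an index column holding the row label for row-level failures and NaN for schema-level failures such as a missing column). The helper split_valid_invalid and the error column it adds are hypothetical and are not the project's actual fix; the idea is simply to treat any NaN index as a failure of the whole dataframe.

```python
import pandas as pd


def split_valid_invalid(df, failure_cases):
    """Hypothetical helper: split df into (good, bad) using a failure_cases frame."""
    # Failures with a NaN index are not tied to a row (e.g. a missing column),
    # so they invalidate the whole dataframe.
    schema_level = failure_cases[failure_cases["index"].isna()]
    if not schema_level.empty:
        # Build one message from all schema-level failures.
        message = "; ".join(
            f"{check}: {case}"
            for check, case in zip(schema_level["check"], schema_level["failure_case"])
        )
        # No good rows; every row is reported as an error with the message attached.
        return df.iloc[0:0], df.assign(error=message)

    # Otherwise fall back to row-level matching on the index labels.
    is_bad = df.index.isin(failure_cases["index"])
    return df[~is_bad], df[is_bad]


# Usage with a hand-built missing-column failure (assumed layout).
df = pd.DataFrame({"admit_date": ["2021-01-01", "2021-02-01"], "age": [34, 57]})
failure_cases = pd.DataFrame(
    {"check": ["column_in_dataframe"], "failure_case": ["admission_date"], "index": [float("nan")]}
)
good, bad = split_valid_invalid(df, failure_cases)
print(len(good), len(bad))  # 0 2
print(bad["error"].iloc[0])  # column_in_dataframe: admission_date
```

Checking for schema-level failures before the row-level match keeps the two cases separate, so a missing column can no longer be silently dropped by a join on NaN.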