Closed georgm8 closed 1 year ago
Pull request #46
Thanks for reporting this, @georgm8. The data expected in this column are SNOMED codes, which are integers rather than strings. `feature_maps.py` generates the map between the SNOMED codes as integers and string categories.
Not sure why that error appears. Looking at this on my phone at the moment. Will check this evening and merge.
Regarding the SNOMED codes for missing data, I agree that they should go in `feature_maps` along with 0, which I think is already in there.
@georgm8, I have rerun v0.3.1 on the LTH data and don't get this error. The following is the truncated output of `good.dtypes` after the first validation. There shouldn't really be any `Int64` dtypes unless you are coercing columns into this in a previous step. Is it possible that this was introduced to allow `nan` in SNOMED columns, instead of assigning 0 or one of the allowed values for missing or unknown values?
Can you please check and close this issue if this explains it?
Also please see https://github.com/pandas-dev/pandas/issues/45729.
column | dtype |
---|---|
patient_id | int64 |
visit_id | int64 |
townsend_score_quintile | int64 |
gender | object |
activage | int64 |
ethnos | object |
accommodationstatus | int64 |
procodet | object |
edsitecode | object |
eddepttype | object |
edarrivalmode | int64 |
edattendcat | object |
edattendsource | int64 |
edarrivaldatetime | datetime64[ns, UTC] |
edwaittime | float64 |
edacuity | int64 |
edchiefcomplaint | int64 |
edcomorb_01 | int64 |
eddiag_NN | int64 |
edentryseq_NN | int64 |
eddiagqual_NN | int64 |
edinvest_NN | int64 |
edtreat_NN | int64 |
timeined | float64 |
disstatus | int64 |
edattenddispatch | int64 |
edrefservice | int64 |
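
One way to check for the nullable `Int64` dtype described above is sketched here. This is only an illustration, not code from the package: the helper names `find_nullable_int_columns` and `to_plain_int64` are hypothetical, and using 0 as the missing-value code is an assumption based on the comment above.

```python
import pandas as pd


def find_nullable_int_columns(df):
    """Return the names of columns that use the nullable Int64 dtype."""
    return [col for col in df.columns if isinstance(df[col].dtype, pd.Int64Dtype)]


def to_plain_int64(df, missing_code=0):
    """Return a copy where nullable Int64 columns become plain int64.

    Missing values are filled with `missing_code` first (assumed here to be 0),
    because a plain int64 column cannot hold missing values.
    """
    out = df.copy()
    for col in find_nullable_int_columns(out):
        out[col] = out[col].fillna(missing_code).astype("int64")
    return out


# Hypothetical example data: one nullable integer column, one string column.
example = pd.DataFrame(
    {
        "edacuity": pd.Series([1, None], dtype="Int64"),
        "gender": ["a", "b"],
    }
)
converted = to_plain_int64(example)
print(find_nullable_int_columns(example))
print(converted.dtypes)
```

Running the check before validation would show whether an earlier step has coerced any columns to `Int64`.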
Thanks - you're absolutely right - I forgot to remove the Nullable Integer data type I was testing out earlier. No error with `int64` data types. Closing.
The `replace_values()` function throws the error
`TypeError: Invalid value 'ERROR:Unmapped - Not In Refset' for dtype Int64`
because it tries to replace values in a column with the string value held in the variable `other` when the pandas Series data is not a string. Suggested quick fix: convert the Series to strings and also replace the dictionary keys with strings.
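
The suggested quick fix could look roughly like the sketch below. It is not the package's actual `replace_values()` implementation: the helper name `replace_values_as_str`, its signature, and the choice to leave missing values untouched are all assumptions made for illustration.

```python
import pandas as pd


def replace_values_as_str(series, mapping, other):
    """Map values to categories by comparing everything as strings.

    Hypothetical helper: dictionary keys and Series values are both converted
    to strings, so an integer-typed Series never has a string written into it.
    Unmapped values become `other`; missing values are left as they are.
    """
    str_map = {str(key): value for key, value in mapping.items()}

    def lookup(value):
        if pd.isna(value):
            return value
        return str_map.get(str(value), other)

    # Build a new object-dtype Series instead of modifying the input in place.
    return pd.Series([lookup(v) for v in series], index=series.index, dtype=object)


# Hypothetical codes and categories, for illustration only.
codes = pd.Series([1, 2, 99, None], dtype="Int64")
result = replace_values_as_str(
    codes, {1: "a", 2: "b"}, "ERROR:Unmapped - Not In Refset"
)
print(result)
```

Note that string comparison is sensitive to formatting: a `float64` column would render a code as `5.0` rather than `5`, so this approach assumes the codes are held as integers or as matching strings.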