rcruzgar closed this issue 3 years ago
Hi Rubén,
Hm, this could be related to the fact that pandas converts your string columns to categorical columns. A categorical column raises an error if you (or datawig) try to set a row to a value that is not among the allowed categories (because it was not previously observed).
A simple fix could be to force all your columns to be string columns instead of pandas-categorical ones, like:
for col in ['Provincia', 'Consumo', 'Potencia max', 'Comercializadora_encoded']:
    df_copy[col] = df_copy[col].astype(str)
and then train the imputer.
Let me know if that works?
Best Felix
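To illustrate the suggestion above, here is a minimal sketch of the pandas behaviour on toy data. The helper `try_assign` and the values "Madrid", "Sevilla" and "Valencia" are made up for the illustration; only the column name comes from this thread, and nothing here calls datawig itself.

```python
import pandas as pd


def try_assign(series, value):
    """Return True if `value` can be written into the first row of a copy of `series`."""
    trial = series.copy()
    try:
        trial.iloc[0] = value
    except (TypeError, ValueError):
        # pandas refuses values that are not among a categorical's categories;
        # the exception type has differed between pandas versions
        return False
    return True


# Toy frame with a categorical column (illustrative values only)
df_copy = pd.DataFrame({"Provincia": pd.Categorical(["Madrid", "Sevilla", "Madrid"])})

# "Valencia" was never observed, so it is not an allowed category
as_category = try_assign(df_copy["Provincia"], "Valencia")

# After casting to str, the column accepts any string
as_string = try_assign(df_copy["Provincia"].astype(str), "Valencia")

print(as_category, as_string)
```

If the categorical dtype is indeed the cause, the first assignment is rejected and the second goes through, which is why casting before training may help.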
closing for now, feel free to reopen
Really helpful, Thanks.
Just to let people know, df[col].astype(str) will convert any np.nan values to "nan", which are not recognised as missing and will thus not be imputed.
You may resolve this by converting "nan" values back to np.nan:
for col in df:
    # Replace only exact "nan" strings; str.contains("nan") would also match substrings
    df[col] = df[col].replace("nan", np.nan)
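As an alternative, the cast and the clean-up can be combined so that missing entries never turn into the text "nan" in the first place. This is only a sketch: the helper name `to_str_keep_missing` and the toy values are assumptions for the example, not part of datawig or pandas.

```python
import numpy as np
import pandas as pd


def to_str_keep_missing(series):
    """Cast values to str, but leave originally missing entries as missing."""
    # where() keeps the string where the original value was present
    # and puts NaN back everywhere else
    return series.astype(str).where(series.notna())


# Toy data (illustrative values only)
df = pd.DataFrame({
    "Provincia": [28.0, 41.0, 28.0],
    "Comercializadora_encoded": [3.0, np.nan, 7.0],
})

cleaned = df.copy()
for col in cleaned.columns:
    cleaned[col] = to_str_keep_missing(cleaned[col])

# Number of rows still flagged as missing in the column to impute
n_missing = int(cleaned["Comercializadora_encoded"].isna().sum())
print(n_missing)
```

Because the mask is taken from the original column, a genuine string value that happens to be "nan" would be left alone, which a blanket replace cannot guarantee.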
Hi,
I am trying to impute numeric values from one specific column (it's called 'Comercializadora_encoded', and it is now a numeric column because I previously encoded the original object-type column with LabelEncoder() from sklearn).
These are the columns I would like to use as input:
--> Provincia 166203 non-null float64
--> Consumo 166203 non-null float64
--> Potencia max 166203 non-null float64
And this is the column to impute:
--> Comercializadora_encoded 163937 non-null object
This is my code:
And this is the error message I am getting:
I've also tried to use categorical columns as input columns, and to convert the output column into a category. Am I missing something?
Thank you very much. Regards, Rubén.