Closed elopezfune closed 3 years ago
Looks like this is just a warning, not an error; the code runs through and returns a dataframe, right?
It looks like some columns contain values that are very rare. For those classes it's difficult to make high-precision imputations.
To avoid low-precision imputations, I'd recommend setting the precision_threshold argument to a value higher than 0.0, for instance 0.8, when calling complete. With a threshold of 0.8, you could expect a precision of 0.8 for the imputed values.
Values that are still missing after that could not be imputed with high enough precision.
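To make the idea of a precision threshold concrete, here is a minimal conceptual sketch in plain Python. It is not datawig's actual implementation; the helper names `precision_per_class` and `filter_imputations` and the toy labels are made up for illustration. It estimates per-class precision on held-out rows with known values and then keeps an imputed value only if its class reached the threshold.

```python
from collections import defaultdict


def precision_per_class(y_true, y_pred):
    """Precision of each predicted class, measured on rows with known labels."""
    correct = defaultdict(int)
    predicted = defaultdict(int)
    for truth, guess in zip(y_true, y_pred):
        predicted[guess] += 1
        if truth == guess:
            correct[guess] += 1
    return {label: correct[label] / predicted[label] for label in predicted}


def filter_imputations(predictions, class_precision, threshold):
    """Keep a predicted value only if its class met the threshold, else None."""
    return [
        p if class_precision.get(p, 0.0) >= threshold else None
        for p in predictions
    ]


# Hypothetical validation data (made up): true values vs. model guesses.
y_true = ["S", "S", "S", "S", "C", "C", "Q", "S"]
y_pred = ["S", "S", "S", "S", "C", "S", "C", "Q"]
prec = precision_per_class(y_true, y_pred)

# Hypothetical predictions for rows whose value is actually missing.
imputed = filter_imputations(["S", "C", "Q", "S"], prec, threshold=0.8)
print(prec)
print(imputed)
```

In this toy case only the frequent class "S" is precise enough, so the rows predicted as the rarer classes stay missing. That matches the behaviour described above: a higher threshold trades coverage for reliability.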
Closing this for now, feel free to reopen if more problems come up.
I hope this message finds you well. I have been trying to impute missing values in my dataset using the datawig library. However, it imputes every column except two. Both of those columns are of dtype object, yet other object columns are imputed. I tried your recommendation of setting precision_threshold = 0.80, but that did not help either. Do you have any recommendation for improving this? Here is the code along with a view of my dataset:
df.tail(155).
The code to impute the missing values is as follows:
import datawig
df = datawig.SimpleImputer.complete(df, precision_threshold=0.80)
df.isnull().sum()
PassengerId 0
HomePlanet 0
CryoSleep 0
Cabin 199
Destination 0
Age 0
VIP 0
RoomService 0
FoodCourt 0
ShoppingMall 0
Spa 0
VRDeck 0
Name 200
Transported 0
dtype: int64
The missing values in the Cabin and Name columns were left unimputed, and I do not know why. Also, before applying datawig imputation, the number of missing values in the Name and Cabin columns was the same as it is now. Any help would be appreciated. Thanks!
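One way to check whether columns like Cabin and Name fall under the "very rare values" explanation given earlier in this thread is to look at how many distinct values each non-numeric column has. The following is a rough diagnostic sketch using pandas only; the helper `rare_class_report`, its `min_count` cutoff, and the toy data are assumptions for illustration and are not part of datawig.

```python
import pandas as pd


def rare_class_report(df, min_count=5):
    """For each non-numeric column, report the number of distinct values and
    the share of non-null rows whose value occurs fewer than min_count times."""
    rows = []
    for col in df.select_dtypes(exclude="number").columns:
        counts = df[col].value_counts(dropna=True)
        n = int(counts.sum())
        rare_rows = int(counts[counts < min_count].sum())
        rows.append({
            "column": col,
            "n_unique": int(counts.size),
            "rare_share": rare_rows / n if n else 0.0,
        })
    return pd.DataFrame(
        rows, columns=["column", "n_unique", "rare_share"]
    ).set_index("column")


# Toy data (made up): one low-cardinality and one near-unique text column.
toy = pd.DataFrame({
    "HomePlanet": ["Earth"] * 6 + ["Mars"] * 6,
    "Cabin": [f"B/{i}/P" for i in range(11)] + [None],
    "Age": list(range(12)),
})
report = rare_class_report(toy)
print(report)
```

A column where almost every value is unique (rare_share close to 1.0) gives a classifier very few examples per class, so it would not be surprising if no imputation for it reached a precision threshold such as 0.8.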
I have exactly the same problem. I installed datawig in my conda environment with Python 3.7 (because higher versions lead to problems with mxnet). I downgraded numpy because I got an error after installation:
ERROR: mxnet 1.4.0 has requirement numpy<1.15.0,>=1.8.2, but you'll have numpy 1.17.2 which is incompatible.
Next, I tried to impute 3 columns from the titanic dataset using
datawig.SimpleImputer.complete(df, precision_threshold = 0.8, inplace=True)
Got a value error:
ValueError: fill value must be in categories
So I forced all columns to string type and then converted "nan" values to np.nan. Then I ran it again and only "Embarked" was imputed:
I repeated the same steps with precision_threshold = 0.1, and also in Colab, with the same result.
Is this how datawig should work or am I missing something?
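For anyone wanting to reproduce the string-conversion step described above, here is a small sketch with pandas and numpy on made-up data. It assumes the goal is simply "every non-missing cell becomes a string, every missing cell stays NaN". Instead of replacing the literal text "nan" afterwards, it masks with the original missing-value pattern, which avoids clobbering a genuine string "nan". Whether this resolves the ValueError in datawig is not confirmed here.

```python
import numpy as np
import pandas as pd

# Made-up frame with a categorical column and a float column, each with a gap.
df = pd.DataFrame({
    "Embarked": pd.Categorical(["S", "C", None, "S"]),
    "Age": [22.0, np.nan, 26.0, 35.0],
})
missing_before = df.isna().sum()

# Cast every cell to text, then restore NaN wherever the original was missing.
cleaned = df.astype(str).where(df.notna())

print(cleaned)
print(cleaned.isna().sum())
```

The missing-value counts are the same before and after the conversion, so the imputer would still see the same cells as missing.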
I am working on a missing-values problem with datawig (I am new to it). Out of a total of 19 features with missing data in a pandas dataframe, only 4 are not fully imputed.
I do:
and I get the following error message:
What's happening and how could I impute the rest of the features?