Closed TaichiLi closed 4 years ago
You can try to train and apply an Imputer for each output column separately.
Alternatively, if you are ok with the default setting used in the SimpleImputer, you can also try to use the convenience function SimpleImputer.complete:
df = SimpleImputer.complete(data_frame=df)
Does that help?
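For reference, here is a minimal sketch of the one-imputer-per-column approach. It assumes the datawig.SimpleImputer constructor arguments (input_columns, output_column, output_path) and fit/predict methods shown in the datawig README, and that predict returns the predictions in a column named "<output_column>_imputed". The helper name impute_each_column and the make_imputer hook are hypothetical and exist only for illustration.

```python
import pandas as pd


def impute_each_column(df, output_columns, make_imputer=None):
    """Fit one imputer per output column and fill that column's missing cells."""
    if make_imputer is None:
        # Imported lazily so the helper can be tried out without datawig installed.
        import datawig

        def make_imputer(input_columns, output_column):
            # Assumption: these keyword arguments match the datawig README.
            return datawig.SimpleImputer(
                input_columns=input_columns,
                output_column=output_column,
                output_path="imputer_model_" + output_column,
            )

    result = df.copy()
    for target in output_columns:
        # Every other column serves as input for this target.
        inputs = [c for c in df.columns if c != target]
        imputer = make_imputer(inputs, target)
        # Train only on rows where the target value is observed.
        imputer.fit(train_df=df[df[target].notnull()])
        predicted = imputer.predict(df)
        # Assumption: predictions arrive in a "<target>_imputed" column.
        imputed_col = target + "_imputed"
        missing = result[target].isnull()
        result.loc[missing, target] = predicted.loc[missing, imputed_col]
    return result
```

Calling impute_each_column(df, ["b", "d", "f"]) would then train three separate models, one per column, and only overwrite cells that were missing in the original frame.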
@felixbiessmann I train an Imputer for each column because I don't think SimpleImputer performs any better than Imputer. I also have another question: how do I run it on a GPU?
Hm, the code you shared should be using exactly the same featurizers, model and hyperparameters as the ones used by SimpleImputer.complete, so it would be very interesting if it gave you better results. The only difference is that SimpleImputer is less typing. Would you mind sharing a comparison of the precision you're getting with each approach?
As for the GPU setup: You'll need an mxnet installation that works with GPUs. For a start you could try following the instructions on the main readme page of datawig.
If you run this (in your activated virtualenv)
wget https://raw.githubusercontent.com/awslabs/datawig/master/requirements/requirements.gpu-cu${CUDA_VERSION}.txt
pip install datawig --no-deps -r requirements.gpu-cu${CUDA_VERSION}.txt
rm requirements.gpu-cu${CUDA_VERSION}.txt
you should have the required dependency.
datawig should then use the available GPUs by default.
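To check whether the installed mxnet build can actually see a GPU, a quick sketch like the following may help. It assumes mxnet 1.3 or later, where mx.context.num_gpus() is available; the helper name available_gpu_count is hypothetical.

```python
def available_gpu_count():
    """Return how many GPUs mxnet reports, or 0 if mxnet cannot be used."""
    try:
        import mxnet as mx
    except (ImportError, OSError):
        # mxnet is not installed, or its native libraries failed to load.
        return 0
    try:
        return int(mx.context.num_gpus())
    except Exception:
        # Some CPU-only builds raise here instead of returning 0.
        return 0


print("GPUs visible to mxnet:", available_gpu_count())
```

If this prints 0 after installing the GPU requirements, the CUDA version of the mxnet package probably does not match the CUDA toolkit on the machine.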
Hmm, wget is for Linux, but I use Win7, and there is no virtualenv on my computer. In addition, SimpleImputer doesn't accept a data_encoders parameter.
You can also download the requirements files for GPU here: https://github.com/awslabs/datawig/tree/master/requirements
You can install those requirements without a virtualenv, too.
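On a machine without wget, the same three steps (download, install, clean up) can be sketched in Python. The URL pattern is taken from the wget command above; the function names are hypothetical, and the CUDA version passed in must match one of the files that actually exist in the requirements directory.

```python
import os
import subprocess
import sys
import urllib.request

BASE_URL = "https://raw.githubusercontent.com/awslabs/datawig/master/requirements/"


def requirements_filename(cuda_version):
    """Build the file name used in the wget command, e.g. for CUDA_VERSION."""
    return "requirements.gpu-cu{}.txt".format(cuda_version)


def requirements_url(cuda_version):
    """Build the raw GitHub URL for the GPU requirements file."""
    return BASE_URL + requirements_filename(cuda_version)


def install_gpu_requirements(cuda_version):
    """Download the requirements file, pip-install it, then delete the file."""
    filename = requirements_filename(cuda_version)
    urllib.request.urlretrieve(requirements_url(cuda_version), filename)
    try:
        # Runs pip from the current interpreter, so no virtualenv is needed.
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", "datawig",
             "--no-deps", "-r", filename]
        )
    finally:
        os.remove(filename)


# Example call; replace the argument with your own CUDA version string:
# install_gpu_requirements("90")
```

Using sys.executable ensures the packages go into the same Python installation that runs the script, which matters when several Pythons are installed side by side.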
Are you trying to run GPU based model training on a Win7 machine? Not sure how good an idea that is.
As for the SimpleImputer comparison: you don't need to specify those data_encoders. SimpleImputer sets them up for you, exactly as you did in the code you shared.
Just type
df = SimpleImputer.complete(data_frame=df)
Thanks very much. I tried SimpleImputer, but I think that if we used an autoencoder, we might get more accurate results.
hm, possibly. But the code you shared doesn’t use an auto encoder. In fact, none of the DataWig models use auto encoders, as far as I know.
When predicting missing values, I found that datawig can't predict multiple columns at once. For example,
This is my code. I want to get 'b', 'd' and 'f', but it raises an error:
I don't know how to solve this and would appreciate some help.