Closed hyojin0912 closed 4 years ago
Hi
I'm not sure I fully understand how to solve this, but here are a couple remarks:
`pd.DataFrame(df)`
is not needed, df is already a dataframe.
I would strongly recommend checking the metrics of your model; imputation results should be treated with care when the metrics indicate accuracy as low as in your case. The predict function of SimpleImputer has a `precision_threshold` for categorical values, which ensures that you only get high-precision imputations.
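To illustrate what a precision threshold does, here is a plain-Python sketch (not datawig's actual implementation, which calibrates the threshold against per-class precision on held-out data rather than raw confidence):

```python
def apply_precision_threshold(predictions, threshold):
    """Keep an imputed label only if the model's confidence for it
    meets the threshold; otherwise leave the cell missing (None)."""
    return [label if confidence >= threshold else None
            for label, confidence in predictions]

# (label, model confidence) pairs for three missing cells
preds = [("A", 0.95), ("B", 0.40), ("A", 0.91)]
print(apply_precision_threshold(preds, threshold=0.9))  # ['A', None, 'A']
```

With a high threshold, low-confidence cells stay missing instead of being filled with unreliable guesses.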
Hope this helps - feel free to reopen otherwise
Thanks for your kind reply.
But I still have a few things to ask.
```python
for d in ['/gpu:2', '/gpu:3', '/gpu:4', '/gpu:5', '/gpu:6', '/gpu:7']:
    with tf.device(d):
        ...  # (training code)
```
I ask because I have spent several days stuck in the state below. There must be an error..
```
2020-10-27 20:38:31,079 [INFO] Saved checkpoint to "result/dtip/impute/datawig/1000seed_imputer_model/C0344329/model-0036.params"
2020-10-27 20:38:31,136 [INFO] No improvement detected for 20 epochs compared to 1.0773332220560405 last error obtained: 5.240848921006545, stopping here
2020-10-27 20:38:31,136 [INFO] ========== done (33.13334774971008 s) fit model
```
I uploaded merged_cid.csv, the file I used in the code above as merged_cid (= df for "SimpleImputer.complete").
Thanks
The cross-entropy can still change when the accuracy doesn't; cross-entropy is just a finer-grained loss.
The precision threshold is a standard parameter of SimpleImputer.predict.
mxnet and tensorflow are usually not combined; you pick one or the other.
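A small worked example of the first point: the model can grow more confident in the correct class without changing which class wins the argmax, so cross-entropy improves while accuracy stays flat (the probabilities below are chosen purely for illustration):

```python
import math

def cross_entropy(p_correct):
    # Cross-entropy for one sample: -log of the probability
    # the model assigns to the true class.
    return -math.log(p_correct)

# The probability of the true class rises from 0.2 to 0.4, but another
# class still wins the argmax (e.g. at 0.6), so accuracy stays 0.
print(round(cross_entropy(0.2), 3))  # 1.609
print(round(cross_entropy(0.4), 3))  # 0.916
```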
Thank you for the fast reply.
I understand everything.
Then, do you have any guess about my zero accuracy, given that my metrics contain lots of NA?
Hm, I'd probably use the SimpleImputer.fit/predict approach for single columns (like complete does, but writing the for loop through the columns yourself, because complete deletes the metrics/log dir immediately) and then check the metrics files to see which columns can actually be predicted well enough.
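Structurally, that per-column loop looks like the sketch below. A trivial most-frequent-value "model" stands in for datawig's SimpleImputer.fit/predict here, just to show the shape of the loop; in the real version you would fit one SimpleImputer per target column and inspect its metrics before accepting the imputations:

```python
import pandas as pd

def complete_column_by_column(df):
    """Loop over columns the way SimpleImputer.complete does internally,
    but one column at a time, so per-column metrics could be kept.
    A most-frequent-value fill stands in for the real model here."""
    out = df.copy()
    for col in df.columns:
        observed = df[col].dropna()
        if observed.empty:
            continue  # nothing to fit on for this column
        # 'fit': learn the most frequent observed value
        fill_value = observed.mode().iloc[0]
        # 'predict': fill only the missing cells of this column
        out[col] = out[col].fillna(fill_value)
    return out

df = pd.DataFrame({"a": [1.0, None, 1.0], "b": ["x", None, "y"]})
print(complete_column_by_column(df))
```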
Thanks for your nice package.
I have one question.
I am imputing a large matrix (90,000 by 7,000).
This matrix contains lots of NA (over 80%).
It also includes numerical values and zero-or-one categorical values.
Below is my code (after loading the whole dataframe to impute):
```python
import datawig
df_imputed = datawig.SimpleImputer.complete(df)  # df loaded from merged_cid.csv
```
I use "datawig.SimpleImputer.complete" for simplicity,
but is there any method to get the neural network weights used for imputation? And how does "datawig.SimpleImputer.complete" handle training and validation?
I am asking because the accuracy never changes:
```
2020-10-27 11:14:22,355 [INFO] Epoch[49] Batch [0-34] Speed: 1651.71 samples/sec cross-entropy=0.515578 C0040436-accuracy=0.000000
2020-10-27 11:14:22,675 [INFO] Epoch[49] Train-cross-entropy=0.667427
2020-10-27 11:14:22,675 [INFO] Epoch[49] Train-C0040436-accuracy=0.000000
2020-10-27 11:14:22,676 [INFO] Epoch[49] Time cost=0.657
2020-10-27 11:14:22,688 [INFO] Saved checkpoint to "result/dtip/impute/datawig/1000seed_imputer_model/C0040436/model-0049.params"
2020-10-27 11:14:22,723 [INFO] Epoch[49] Validation-cross-entropy=0.492388
2020-10-27 11:14:22,723 [INFO] Epoch[49] Validation-C0040436-accuracy=0.000000
```
Thanks
Hyojin