Closed tufanbt closed 1 year ago
Thanks for the bug report. Couple questions:
ndim
).
Thanks for the information.
Quick questions:
0.5.20.post3
)?
jupyter-notebook, these might appear in the terminal that launched jupyter-notebook rather than in the output cells).
Yet another question: does your data by any chance have duplicated column names?
Actually, it turns out there was indeed an issue: in some cases, categorical columns were imputed with zeros when they were all missing. I've pushed a small update that should fix it. Could you give it a try and see if you still experience the same issue?
pip install -U git+https://github.com/david-cortes/isotree.git
I've also pushed another fix, for numerical columns being imputed with zeros when they are all missing.
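To confirm which build is active after upgrading, the installed version can be read from the package metadata. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, and it assumes the package is registered under the distribution name `isotree`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Assumption: the distribution name is "isotree".
print(installed_version("isotree"))
```

In a notebook, the kernel usually needs a restart after the upgrade before the new build is picked up.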
Thanks for all your efforts! Your fixes worked: I now see all NA values for the features that were all NA in the training set. I am closing this issue.
When using IsolationForest for imputation, a feature that is all NA in the training data (so no imputation can be done) comes out as mostly zeros (~93%), with some NA values, in the transformed dataset. The transformed dataset is the imputed test set, which does not overlap with the training set and is also all NA for that feature. I could not replicate the issue with a smaller dataset, but maybe this description could help detect the problem. For reference, my training and test datasets have shape (400000, 1000), and there are 3 categorical features with 10 to 40 levels. To sum up, IsolationForest's transform method introduces zeros into "un-imputable" features.
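The symptom described above can be checked mechanically. The sketch below is not part of isotree; it is a standard-library illustration with hypothetical helper names that takes the training rows and the imputed rows as lists of lists, treats `None` and NaN as missing, and reports how much of each all-missing training column was filled in the output.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def all_missing_columns(rows):
    """Return the indices of columns in which every value is missing."""
    if not rows:
        return []
    n_cols = len(rows[0])
    return [j for j in range(n_cols) if all(is_missing(row[j]) for row in rows)]


def filled_fraction(rows, col):
    """Fraction of rows whose value in column `col` is not missing."""
    if not rows:
        return 0.0
    return sum(not is_missing(row[col]) for row in rows) / len(rows)


def report_unexpected_fills(train_rows, imputed_rows):
    """Map each all-missing training column to the fraction of filled values
    in the imputed output, keeping only columns where that fraction is > 0."""
    report = {}
    for col in all_missing_columns(train_rows):
        frac = filled_fraction(imputed_rows, col)
        if frac > 0:
            report[col] = frac
    return report
```

For example, with a training set whose second column is all NaN, an imputed output of `[[1.0, 0.0], [2.0, float("nan")]]` would be reported as `{1: 0.5}`, while an output that leaves the column all NaN gives an empty report.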