Automatically Visualize any dataset, any size with a single line of code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.
Apache License 2.0
AutoViz not working with scikit-learn >= 0.24 on large datasets #44
Starting from version 0.24, scikit-learn raises an error (instead of a warning) when a random_state is passed to KFold or StratifiedKFold without setting shuffle to True.
When AutoViz is used with a large dataset, the function find_top_features_xgb defines the KFold as kf = KFold(n_splits=n_splits, random_state=33). This raises a ValueError, and the whole auto visualization terminates with the message "Not able to read or load file. Please check your inputs and try again...".
If the intent is to shuffle the data in the KFold, shuffle=True should be added explicitly, because otherwise the data is not shuffled; if the intent is not to shuffle the data, the random_state parameter should be removed.
A simple dataset to use to reproduce the issue can be found on Kaggle at this URL.
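For reference, here is a minimal sketch of the failing call and the two possible fixes, run outside AutoViz. The toy array X and the helper try_reported_call are made up for illustration and are not part of AutoViz; only the KFold arguments mirror the call in find_top_features_xgb.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # toy data (assumption): 10 rows, 2 columns
n_splits = 5


def try_reported_call():
    """Hypothetical helper: return the error message if the reported call fails, else None."""
    try:
        # Same arguments as the call reported in find_top_features_xgb
        KFold(n_splits=n_splits, random_state=33)
    except ValueError as exc:
        return str(exc)
    return None  # older scikit-learn versions do not raise here


# Option 1: the intent is to shuffle, so say so explicitly
kf_shuffled = KFold(n_splits=n_splits, shuffle=True, random_state=33)

# Option 2: the intent is not to shuffle, so drop random_state
kf_ordered = KFold(n_splits=n_splits)

shuffled_folds = [test.tolist() for _, test in kf_shuffled.split(X)]
ordered_folds = [test.tolist() for _, test in kf_ordered.split(X)]

print("reported call error:", try_reported_call())
print("shuffled test folds:", shuffled_folds)
print("ordered test folds: ", ordered_folds)
```

With option 2 the test folds are consecutive blocks of rows, which matches what the original call effectively did before 0.24, since random_state was ignored when shuffle was left at False. Option 1 changes which rows land in each fold, so it may also change which features get selected.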