AutoViML / AutoViz

Automatically Visualize any dataset, any size with a single line of code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.
Apache License 2.0

AutoViz not working with scikit-learn >= 0.24 on large datasets #44

Closed: ggattoni closed this issue 3 years ago

ggattoni commented 3 years ago

Starting with version 0.24, scikit-learn raises an error (instead of a warning) when random_state is passed to KFold or StratifiedKFold without setting shuffle to True. When AutoViz is used with a large dataset, the function find_top_features_xgb defines the KFold as kf = KFold(n_splits=n_splits, random_state=33), which raises a ValueError, and the whole auto visualization terminates with the message "Not able to read or load file. Please check your inputs and try again...". If the intent is to shuffle the data in the KFold, shuffle=True should be added explicitly, because otherwise the data is not shuffled; if the intent is not to shuffle the data, the random_state parameter should be removed.
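The two possible fixes described above could look roughly like the sketch below. It is not the actual AutoViz code: the value of n_splits, the toy array X, and the helper try_original_call are assumptions made up for illustration. Only the KFold call with random_state=33 is taken from the report.

```python
import numpy as np
from sklearn.model_selection import KFold

n_splits = 5                       # hypothetical value, not taken from AutoViz
X = np.arange(20).reshape(10, 2)   # toy data: 10 samples, 2 features

# Option 1: shuffling is intended, so request it explicitly.
# random_state then makes the shuffled folds reproducible.
kf_shuffled = KFold(n_splits=n_splits, shuffle=True, random_state=33)

# Option 2: shuffling is not intended, so drop random_state entirely.
# Folds are then contiguous blocks in the original row order.
kf_ordered = KFold(n_splits=n_splits)

def try_original_call():
    """Re-run the call from the report; return the error text, or None.

    On scikit-learn >= 0.24 this is expected to raise ValueError;
    older releases only emitted a warning, so None is returned there.
    """
    try:
        KFold(n_splits=n_splits, random_state=33)
    except ValueError as exc:
        return str(exc)
    return None

if __name__ == "__main__":
    print("original call:", try_original_call())
    for name, kf in [("shuffled", kf_shuffled), ("ordered", kf_ordered)]:
        folds = [test_idx.tolist() for _, test_idx in kf.split(X)]
        print(name, folds)
```

Which option is appropriate depends on whether the row order of the dataset carries meaning. If rows are sorted by target or by time, unshuffled folds can differ a lot from shuffled ones.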

A simple dataset to use to reproduce the issue can be found on Kaggle at this URL.

AutoViML commented 3 years ago

@ggattoni I have removed that random state option and it should work now. Can you please test and confirm? Thanks, AutoViz team