AutoViML / AutoViz

Automatically Visualize any dataset, any size with a single line of code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.
Apache License 2.0

Data Viz for training data after making the split #42

Closed: arora123 closed this issue 3 years ago

arora123 commented 3 years ago

We should explore data after making a train-test split to avoid data leakage. How can I supply a DataFrame (training data only) to the AutoViz() function? I tried supplying a DataFrame and leaving the filename as an empty string, but it does not produce any charts.

My Code:

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('https://raw.githubusercontent.com/arora123/Data/master/WA_Fn-UseC_-Telco-Customer-Churn.csv')

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state =1)

!pip install autoviz
# To import AutoViz_Class from autoviz-AutoViz_Class

from autoviz.AutoViz_Class import AutoViz_Class 

#To initialize class
av = AutoViz_Class()

av.AutoViz('', sep=',', depVar='Churn', dfte=pd.DataFrame(x, y), 
           header=1, verbose=1, lowess=False, chart_format='svg',)

Output

Shape of your Data Set: (7043, 20)
############## C L A S S I F Y I N G  V A R I A B L E S  ####################
Classifying variables in data set...
Not able to read or load file. Please check your inputs and try again...

AutoViML commented 3 years ago

@arora123: Here is how I would modify your code to run AutoViz. I have tested it and it works in my notebook, so please confirm:

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('https://raw.githubusercontent.com/arora123/Data/master/WA_Fn-UseC_-Telco-Customer-Churn.csv')

train, test = train_test_split(df, test_size=0.2, random_state=1)
df.shape

!pip install autoviz

# To import AutoViz_Class from autoviz-AutoViz_Class
from autoviz.AutoViz_Class import AutoViz_Class

# To initialize class
av = AutoViz_Class()

av.AutoViz('', sep=',', depVar='Churn', dfte=train,
           header=1, verbose=1, lowess=False, chart_format='svg',)
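For anyone hitting the same error, here is a small sketch of what probably went wrong in the original call and why splitting the whole DataFrame avoids it. It rests on assumptions: the question never shows how `x` and `y` were built, so the sketch assumes `x` was the feature columns and `y` the `Churn` column, and it uses a tiny made-up table instead of the real Telco file. In `pd.DataFrame(x, y)` the second positional argument is the index, not extra data, so the resulting frame has no `Churn` column for `depVar` to find.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Tiny synthetic stand-in for the Telco churn data (hypothetical values).
df = pd.DataFrame({
    "tenure": [1, 34, 2, 45, 8, 22, 10, 28, 62, 13],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 99.65,
                       89.10, 29.75, 104.80, 56.15, 49.95],
    "Churn": ["No", "No", "Yes", "No", "Yes", "No", "No", "Yes", "No", "No"],
})

# Assumed setup from the question: features and target held separately.
x = df.drop(columns="Churn")
y = df["Churn"]

# pd.DataFrame(x, y) passes y as the *index*, so x is reindexed by the
# labels "Yes"/"No". None of them match x's integer index, so every cell
# becomes NaN, and the frame still has no 'Churn' column.
wrong = pd.DataFrame(x, y)

# Splitting the full frame keeps the target column in both parts.
train, test = train_test_split(df, test_size=0.2, random_state=1)

print("Churn" in wrong.columns, bool(wrong.isna().all().all()))
print(train.shape, test.shape, "Churn" in train.columns)
```

The `train` frame from this kind of split is what gets passed as `dfte=train` in the call above, with the filename left as an empty string.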