AutoViML / AutoViz

Automatically Visualize any dataset, any size with a single line of code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.
Apache License 2.0

JupyterLab/Pandas Dataframe/Bokeh leads to: KeyError: "[''] not in index" #52

Closed by William-Lake 2 years ago

William-Lake commented 2 years ago

First off, thank you for creating this repository and for the latest Jupyter integration. I'm incredibly excited to use it, and I appreciate all the hard work!

Brief Description

Given .csv file test.csv:

name,some_string,some_boolean,some_number,some_amt
Kerry Bullock,RFH63GSB6XC,Yes,7,$92.87
Anika Stokes,BYU27VYT1LW,No,65,$48.20
Constance Jensen,KBF13GYN3FV,No,5,$14.28
Malcolm Alvarez,UUK28QNF8BG,No,90,$27.33
Clarke Hanson,KKT63JHC7KC,No,9,$28.52
David Ford,EDC73WSO8PU,No,2,$94.31
Abbot Combs,RRN89HGS1RT,Yes,71,$89.90


When this code is run in JupyterLab:

import pandas as pd
from autoviz.AutoViz_Class import AutoViz_Class

df = pd.read_csv('test.csv')

# AV must be instantiated before AutoViz can be called on it.
AV = AutoViz_Class()
AV.AutoViz(
    filename="",
    dfte=df,
    depVar='',
    verbose=0,
    lowess=False,
    chart_format="bokeh",
)  

One bokeh chart is generated and two stack traces are displayed. The first one says:

...

~/anaconda3/envs/reporting/lib/python3.9/site-packages/autoviz/AutoViz_Holo.py in select_widget(each_cat)
    531                 width_size=15
    532                 #######  This is where you plot the histogram of categorical variable input as each_cat ####
--> 533                 conti_df = dft[[dep,each_cat]].groupby(each_cat).mean().reset_index()
    534                 row_ticks = dft[dep].unique().tolist()
    535                 color_list = next(colors)

...

KeyError: "[''] not in index"

AutoViz_Holo.py, line 533

Then a chart is displayed, followed by the second stack trace:

...

~/anaconda3/envs/reporting/lib/python3.9/site-packages/autoviz/AutoViz_Holo.py in AutoViz_Holo(filename, sep, depVar, dfte, header, verbose, lowess, chart_format, max_rows_analyzed, max_cols_analyzed)
    192         ls_objects.append(drawobj42)
    193     else:
--> 194         drawobj41 = dfin[dep].hvplot(kind='bar', color='r', title='Histogram of Target variable').opts(
    195                         height=height_size,width=width_size,color='lightgreen', xrotation=70)
    196         drawobj42 = dfin[dep].hvplot(kind='kde', color='g', title='KDE Plot of Target variable').opts(

...

KeyError: ''

AutoViz_Holo.py, line 194

In both cases it looks like the code is expecting dep to not be an empty string, and is failing when trying to use the empty string to select a column in the DataFrame.
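Both errors can be reproduced in plain pandas, without AutoViz. The sketch below assumes only that dep ends up holding the empty string passed as depVar; the small DataFrame is illustrative and is not the library's internal state.

```python
import pandas as pd

# Illustrative frame using two of the columns from test.csv.
df = pd.DataFrame({
    "some_boolean": ["Yes", "No", "No"],
    "some_number": [7, 65, 5],
})

dep = ""  # assumption: depVar='' reaches the plotting code unchanged

# List-style selection, the same pattern as line 533 in the first trace.
try:
    df[[dep, "some_boolean"]]
except KeyError as err:
    list_error = err

# Single-label selection, the same pattern as line 194 in the second trace.
try:
    df[dep]
except KeyError as err:
    label_error = err

print(type(list_error).__name__, list_error)
print(type(label_error).__name__, label_error)
```

Both selections raise KeyError because no column is named with the empty string, which matches the two stack traces above.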

Detail of the expected change(s) in behaviour

At first glance, some additional checks of dep would help, but the cats variable may also contain an empty string, which could be the cause of the first stack trace. I'd need to do a deeper dive to get a clearer idea.
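As a rough idea of what such checks could look like, here is a minimal sketch. The helper names resolve_target and clean_categoricals are hypothetical and are not part of AutoViz; this is not the fix the maintainer applied, only one way to treat an empty depVar as "no target" and to keep empty names out of cats.

```python
def resolve_target(dep_var, columns):
    """Return the target column name, or None when no usable target is given."""
    # Hypothetical helper: an empty or whitespace-only depVar means "no target".
    if dep_var is None or str(dep_var).strip() == "":
        return None
    if dep_var not in columns:
        raise ValueError(f"depVar {dep_var!r} is not a column of the DataFrame")
    return dep_var


def clean_categoricals(cats, columns):
    """Drop empty or unknown names from a list of categorical column names."""
    # Hypothetical helper: keeps only non-empty names that are real columns.
    return [c for c in cats if c and c in columns]


columns = ["name", "some_string", "some_boolean", "some_number", "some_amt"]

target = resolve_target("", columns)
cats = clean_categoricals(["some_boolean", ""], columns)

if target is None:
    # Target-dependent plots would be skipped instead of indexing with ''.
    print("No target given; skipping target-dependent plots")
print(cats)
```

With checks like these, code that indexes the DataFrame by dep would only run when target is not None.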

AutoViML commented 2 years ago

Hi @William-Lake 👍 You are absolutely correct. I have fixed the two places where this was happening. You should check by upgrading: pip install autoviz --upgrade

It should be fixed now. Please confirm.

Thanks, AutoViML

William-Lake commented 2 years ago

Thanks @AutoViML, it worked like a charm. Have a nice weekend!