AutoViML / AutoViz

Automatically Visualize any dataset, any size with a single line of code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.
Apache License 2.0

[bug] problem with time series charts #46

Closed mglowacki100 closed 2 years ago

mglowacki100 commented 2 years ago

Here is a minimal reproducible example in Google Colab:

  1. The datetime column is not recognized when the input is a file:

    !pip install autoviz
    from autoviz.AutoViz_Class import AutoViz_Class
    import pandas as pd

    AV = AutoViz_Class()

    df = pd.DataFrame({'time': ['2020-01-15', '2020-02-15', '2020-03-15', '2020-04-15', '2020-05-15'],
                       'values': [1.0, 2.5, 3.2, 4.2, 5.6]})
    df['time'] = pd.to_datetime(df['time'])
    df.to_csv('ts.csv', index=False)

    dft = AV.AutoViz("ts.csv", verbose=2)

    Shape of your Data Set loaded: (5, 2)
    ############## C L A S S I F Y I N G V A R I A B L E S ####################
    Classifying variables in data set...
    Data Set Shape: 5 rows, 2 cols
    Data Set columns info:

  2. When the input is a dataframe, the chart is not generated, but the datetime column is recognized:

    dft = AV.AutoViz("", dfte=df, verbose=2)

    Shape of your Data Set loaded: (5, 2)
    ############## C L A S S I F Y I N G V A R I A B L E S ####################
    Classifying variables in data set...
    Data Set Shape: 5 rows, 2 cols
    Data Set columns info:

Expected result: a chart with the date on the x-axis and the value on the y-axis.

AutoViML commented 2 years ago

Hi @mglowacki100 👍 Your observations are correct.

However, I use pd.read_csv('ts.csv', parse_dates=True) when reading your file and it does not detect a date-time column. What can I do? I am sorry.

In the second case, when you send in a dataframe, I detect a date-time column easily, but I need at least two numeric variables to plot a time series graph. The reason is that I don't want to clutter the screen with hundreds of time series plots (when there are hundreds of numeric variables). Hence I have to cut down the number of graphs.

But if you feel that this is truly something that must be handled by AutoViz, I can reconsider my position. Let me know. I will leave this issue open for a few days for you to respond.

Thanks
Ram
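For readers following along, the behaviour in the first case is consistent with how pandas documents `parse_dates`: passing `True` asks `read_csv` to try parsing the index, not every column, while passing a list of column names parses those columns. The sketch below reproduces this with the same data held in an in-memory string instead of `ts.csv`. It illustrates pandas only and makes no claim about how AutoViz reads files internally.

```python
import io

import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Same rows as the CSV in the report, kept in memory for this sketch.
csv_text = (
    "time,values\n"
    "2020-01-15,1.0\n"
    "2020-02-15,2.5\n"
    "2020-03-15,3.2\n"
    "2020-04-15,4.2\n"
    "2020-05-15,5.6\n"
)

# parse_dates=True only asks pandas to try parsing the index,
# so the 'time' column is left as text.
df_flag = pd.read_csv(io.StringIO(csv_text), parse_dates=True)

# Naming the column explicitly makes pandas parse it as datetime.
df_named = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"])

print(is_datetime64_any_dtype(df_flag["time"]))   # False
print(is_datetime64_any_dtype(df_named["time"]))  # True
```

A dataframe read this way could then be passed through `dfte=` as in the second case of the report, subject to the two-numeric-variable requirement described above.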

mglowacki100 commented 2 years ago

Hi, thanks for the explanations :)

ad1. Yeah, parse_dates sometimes behaves oddly, but I'm able to convert and detect the date column with this snippet:

    df = df.apply(lambda col: pd.to_datetime(col, errors='ignore')
                  if col.dtypes == object
                  else col,
                  axis=0)

    df.dtypes
    time      datetime64[ns]
    values           float64
    dtype: object

Of course, it is easy to do this in "isolation" and could be hard to incorporate into the codebase, so maybe just a change in documentation that date column is not always detected from file and dataframe should be preferred?

Btw, have you considered separate interfaces for the file and dataframe use cases, namely AV.AutoViz_from_file(...) and AV.AutoViz_from_df(...)?

ad2. Maybe the solution is to add flag disable_time_series with True default? With info that 'False' could generate a lot of charts.

AutoViML commented 2 years ago

Hi @mglowacki100 👍 That suggestion of yours is dangerous 👎

    df = df.apply(lambda col: pd.to_datetime(col, errors='ignore') if col.dtypes == object else col, axis=0)

It would turn most object columns into gibberish, since many would become date-time columns without being one. The reason AutoViz works for most use cases is that I judiciously avoid applying transforms on user-supplied data and instead tease out what each column's type might be in order to visualize it.
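One way to reconcile the two positions is to detect candidate date columns without transforming the user's data. The sketch below is an illustration under stated assumptions, not AutoViz's actual logic: `detect_datetime_columns` and its `fmt` parameter are hypothetical names, and it assumes dates follow one known format. A text column is reported only when every non-null value parses under that format, and the dataframe itself is left untouched. It also avoids `errors='ignore'`, which recent pandas releases deprecate.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def detect_datetime_columns(df, fmt="%Y-%m-%d"):
    """Return names of text columns whose every value matches `fmt`.

    Hypothetical helper for illustration; it is not part of AutoViz.
    The input dataframe is never modified.
    """
    found = []
    for name in df.columns:
        col = df[name]
        # Only inspect text-like columns; numeric columns are skipped.
        if not (is_object_dtype(col) or is_string_dtype(col)):
            continue
        values = col.dropna()
        if values.empty:
            continue
        # Values that do not match the format become NaT instead of raising.
        parsed = pd.to_datetime(values, format=fmt, errors="coerce")
        if parsed.notna().all():
            found.append(name)
    return found


df = pd.DataFrame({
    "time": ["2020-01-15", "2020-02-15", "2020-03-15"],
    "code": ["A1", "B2", "C3"],
    "city": ["Oslo", "Lima", "Pune"],
    "values": [1.0, 2.5, 3.2],
})

date_cols = detect_datetime_columns(df)
print(date_cols)
```

Requiring an explicit format is the design choice that keeps ordinary text columns from being picked up; the trade-off is that dates written in other formats go undetected unless the caller supplies the matching `fmt`.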

However, I agree with your suggestion of "a change in documentation that date column is not always detected from file and dataframe should be preferred".

I will make the change in the README file and make that clear.

Additionally, you suggested: "Maybe the solution is to add flag disable_time_series with True default". I am not in favor of this, since it would add too many flags and make the API very hard to understand. Similarly, your suggestion to have different AutoViz entry points for dataframes and files would make it too complicated, in my view.

Remember: the goal of AutoViz is to visualize your data set, any size in a single line of code.

Please keep the suggestions coming. Appreciate your passion in making the product better and work for everyone. Ram