Closed: mglowacki100 closed this issue 2 years ago.
Hi @mglowacki100 👍
Your observations are correct. However, when I read your file with pd.read_csv('ts.csv', parse_dates=True), it does not detect a date-time column. What can I do? Sorry about that. In the second case, when you pass in a dataframe, I detect the date-time column easily, but I need at least two numeric variables to plot a time-series graph. The reason is that I don't want to clutter the screen with hundreds of time-series plots when there are hundreds of numeric variables, so I have to cut down the number of graphs. But if you feel this is truly something AutoViz must handle, I can reconsider my position. Let me know. I will leave this issue open for a few days for you to respond. Thanks,
Ram
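A likely reason the read_csv call above finds no date column: according to the pandas documentation, parse_dates=True only tries to parse the *index*, and without index_col the index is a default RangeIndex, so the 'time' column stays as text. The minimal sketch below holds the same five rows in memory (the io.StringIO buffer stands in for ts.csv) and compares parse_dates=True with naming the column explicitly.

```python
import io
import pandas as pd

# Same data as ts.csv in the reproducible example, kept in memory
csv_text = (
    "time,values\n"
    "2020-01-15,1.0\n2020-02-15,2.5\n2020-03-15,3.2\n"
    "2020-04-15,4.2\n2020-05-15,5.6\n"
)

# parse_dates=True parses only the index; here that is a RangeIndex,
# so the 'time' column is left as strings.
df_index_only = pd.read_csv(io.StringIO(csv_text), parse_dates=True)

# Naming the column makes pandas parse it as datetime64.
df_named = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"])

# Alternatively, make 'time' the index so parse_dates=True applies to it.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col="time", parse_dates=True)

print(df_index_only.dtypes)
print(df_named.dtypes)
print(df_indexed.index.dtype)
```

Which of these forms suits AutoViz's file loader is not something this sketch settles; it only shows how pandas itself treats the flag.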
Hi, thanks for the explanations :)
Re 1:
Yeah, parse_dates sometimes behaves oddly, but I'm able to convert and detect the date column with this snippet:
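```python
import pandas as pd
df = pd.read_csv('ts.csv')  # file from the reproducible example
# Convert each object (string) column to datetime; errors='ignore' returns a
# column unchanged if it fails to parse (deprecated as of pandas 2.2)
df = df.apply(lambda col: pd.to_datetime(col, errors='ignore')
              if col.dtypes == object
              else col,
              axis=0)
df.dtypes
```
```
time      datetime64[ns]
values           float64
dtype: object
```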
Of course, it is easy to do this in isolation and it could be hard to incorporate into the codebase, so maybe just a change in the documentation, stating that the date column is not always detected from a file and that a dataframe should be preferred? By the way, have you considered separate interfaces for the file and dataframe use cases, namely AV.AutoViz_from_file(...) and AV.AutoViz_from_df(...)?
Re 2:
Maybe the solution is to add a disable_time_series flag with a default of True, plus a note that setting it to False could generate a lot of charts?
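To make the trade-off concrete, here is a hypothetical sketch (not AutoViz's actual code) that combines the rule Ram describes, a date column plus at least two numeric columns, with the proposed disable_time_series flag. The function name plan_time_series_plots and the max_plots cap are illustrative assumptions.

```python
import pandas as pd

def plan_time_series_plots(df, disable_time_series=True, min_numeric=2, max_plots=20):
    """Hypothetical helper (not part of AutoViz): map each datetime column
    to the numeric columns that would be plotted against it."""
    if disable_time_series:  # proposed default: skip time-series plots entirely
        return {}
    date_cols = [c for c in df.columns if pd.api.types.is_datetime64_any_dtype(df[c])]
    num_cols = [c for c in df.columns
                if c not in date_cols and pd.api.types.is_numeric_dtype(df[c])]
    # Rule described in the thread: need a date column and >= 2 numeric columns
    if not date_cols or len(num_cols) < min_numeric:
        return {}
    # Cap the number of series per date column to avoid flooding the screen
    return {d: num_cols[:max_plots] for d in date_cols}

df = pd.DataFrame({
    "time": pd.to_datetime(["2020-01-15", "2020-02-15", "2020-03-15",
                            "2020-04-15", "2020-05-15"]),
    "values": [1.0, 2.5, 3.2, 4.2, 5.6],
})
print(plan_time_series_plots(df))                             # → {}
print(plan_time_series_plots(df, disable_time_series=False))  # → {} (one numeric column)
df["other"] = df["values"] * 2
print(plan_time_series_plots(df, disable_time_series=False))  # → {'time': ['values', 'other']}
```

With the five-row example there is only one numeric column, so even with the flag off nothing is planned, which is consistent with the "Could not draw Date Vars" line in the dataframe log.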
Hi @mglowacki100 👍
That suggestion of yours is dangerous 👎
```python
df = df.apply(lambda col: pd.to_datetime(col, errors='ignore') if col.dtypes == object else col, axis=0)
```
It would turn most object columns into gibberish, since many of them would become date-time columns without actually being date-time columns. The reason AutoViz works for most use cases is that I judiciously avoid applying transforms to user-supplied data and instead tease out what each column's type might be in order to visualize it.
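One conservative way to "tease out" a date column without transforming anything is to parse a copy of each text column and only *report* columns where every non-null value parses. The sketch below is hypothetical: the helper likely_datetime_columns and its fixed '%Y-%m-%d' format are assumptions for illustration, not AutoViz's detection logic.

```python
import pandas as pd

def likely_datetime_columns(df, fmt="%Y-%m-%d"):
    """Hypothetical helper: names of text columns whose non-null values all
    parse as dates in `fmt`. The input DataFrame is never modified."""
    found = []
    for col in df.columns:
        s = df[col]
        # Only consider text columns (object dtype, or pandas' string dtype)
        if not (pd.api.types.is_object_dtype(s.dtype) or isinstance(s.dtype, pd.StringDtype)):
            continue
        non_null = s.dropna()
        if non_null.empty:
            continue
        # errors="coerce" turns unparseable values into NaT instead of raising
        parsed = pd.to_datetime(non_null, format=fmt, errors="coerce")
        if parsed.notna().all():
            found.append(col)
    return found

df = pd.DataFrame({
    "time": ["2020-01-15", "2020-02-15", "2020-03-15"],
    "city": ["Paris", "Rome", "Oslo"],
    "code": ["001", "002", "003"],
    "values": [1.0, 2.5, 3.2],
})
print(likely_datetime_columns(df))  # → ['time']
print(df.dtypes)                    # 'time' is still text; nothing was converted
```

Requiring an explicit format and a 100% parse rate trades recall for safety: ambiguous columns such as 'code' are left alone rather than silently turned into dates.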
However, I agree with your suggestion that:
a change in the documentation, stating that the date column is not always detected from a file and that a dataframe should be preferred
I will make the change in the README file and make that clear.
Additionally, you suggested:
Maybe the solution is to add a disable_time_series flag with a default of True
I am not in favor of that, since it would add so many flags that the API would become hard to understand. Similarly, your suggestion to have separate AutoViz functions for dataframes and files would, in my view, make it too complicated.
Remember: the goal of AutoViz is to visualize your data set, of any size, in a single line of code.
Please keep the suggestions coming. I appreciate your passion for making the product better and making it work for everyone.
Ram
Here is a minimal reproducible example in Google Colab:
```python
import pandas as pd
from autoviz.AutoViz_Class import AutoViz_Class
AV = AutoViz_Class()
df = pd.DataFrame({'time': ['2020-01-15', '2020-02-15', '2020-03-15', '2020-04-15', '2020-05-15'],
                   'values': [1.0, 2.5, 3.2, 4.2, 5.6]})
df['time'] = pd.to_datetime(df['time'])
df.to_csv('ts.csv', index=False)  # 'time' is written to the CSV as plain text
dft = AV.AutoViz("ts.csv", verbose=2)  # case 1: load from file
```
```
Shape of your Data Set loaded: (5, 2)
############## C L A S S I F Y I N G V A R I A B L E S ####################
Classifying variables in data set...
Data Set Shape: 5 rows, 2 cols
Data Set columns info:
values: 0 nulls, 5 unique vals, most common: {3.2: 1, 5.6: 1}
Numeric Columns: ['values']
Integer-Categorical Columns: []
String-Categorical Columns: []
Factor-Categorical Columns: []
String-Boolean Columns: []
Numeric-Boolean Columns: []
Discrete String Columns: []
NLP text Columns: []
Date Time Columns: []
ID Columns: ['time']
Columns that will not be considered in modeling: []
2 Predictors classified...
This does not include the Target column(s)
1 variables removed since they were ID or low-information variables
List of variables removed: ['time']
No categorical or numeric vars in data set. Hence no bar charts.
Time to run AutoViz (in seconds) = 0.562
```
```python
dft = AV.AutoViz("", dfte=df, verbose=2)  # case 2: pass the DataFrame directly
```
```
Shape of your Data Set loaded: (5, 2)
############## C L A S S I F Y I N G V A R I A B L E S ####################
Classifying variables in data set...
Data Set Shape: 5 rows, 2 cols
Data Set columns info:
values: 0 nulls, 5 unique vals, most common: {3.2: 1, 5.6: 1}
Numeric Columns: ['values']
Integer-Categorical Columns: []
String-Categorical Columns: []
Factor-Categorical Columns: []
String-Boolean Columns: []
Numeric-Boolean Columns: []
Discrete String Columns: []
NLP text Columns: []
Date Time Columns: ['time']
ID Columns: []
Columns that will not be considered in modeling: []
2 Predictors classified...
This does not include the Target column(s)
No variables removed since no ID or low-information variables found in data set
Could not draw Date Vars
No categorical or numeric vars in data set. Hence no bar charts.
Time to run AutoViz (in seconds) = 0.408
```