Open jeff-hernandez opened 3 years ago
@jeff-hernandez What would the code example of this look like?
import pandas as pd
import woodwork as ww
df = pd.read_csv("https://api.featurelabs.com/datasets/online-retail-logs-2018-08-28.csv")
# dataframe gets auto-initialized without calling `init`?
Entering the accessor for the first time would initialize Woodwork automatically instead of raising an error.
import pandas as pd
import woodwork as ww
df = pd.read_csv("https://api.featurelabs.com/datasets/online-retail-logs-2018-08-28.csv") # not initialized
df.ww # auto-initialized
Column         Physical Type    Logical Type   Semantic Tag(s)
order_id       category         Categorical    ['category']
product_id     category         Categorical    ['category']
description    category         Categorical    ['category']
quantity       int64            Integer        ['numeric']
order_date     datetime64[ns]   Datetime       []
unit_price     float64          Double         ['numeric']
customer_name  category         Categorical    ['category']
country        category         Categorical    ['category']
total          float64          Double         ['numeric']
cancelled      bool             Boolean        []
In contrast, with the current behavior, entering the accessor raises an error.
import pandas as pd
import woodwork as ww
df = pd.read_csv("https://api.featurelabs.com/datasets/online-retail-logs-2018-08-28.csv")
df.ww
WoodworkNotInitError: Woodwork not initialized for this DataFrame. Initialize by calling DataFrame.ww.init
As a user, I think it would be helpful to auto-initialize Woodwork when using the accessor. DataFrames contain enough information to create an initial schema based on the data types. There are several methods to update the schema after initializing.
Additionally, I think this is similar to how pandas behaves: it does not require data type information to create a DataFrame, and you can modify the data types afterward. One possible drawback is losing the option to provide typing information before initializing, but requiring explicit initialization might be the bigger pain point for the user.
As a developer, I think it would help simplify the code since it wouldn't require adding an initialization check to most (if not all?) of the methods in the Woodwork accessor. There may be some points I've overlooked here, but I think this is something worth considering.
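To make the developer-side argument concrete, here is a minimal sketch of the two patterns. It is not Woodwork's implementation: `StrictAccessor`, `LazyAccessor`, `infer_schema`, and `SchemaNotInitError` are hypothetical names, and a plain dict of lists stands in for a DataFrame so the example runs without any dependencies.

```python
class SchemaNotInitError(Exception):
    """Hypothetical stand-in for WoodworkNotInitError."""


def infer_schema(columns):
    """Guess a type name per column from the Python type of its first value."""
    return {
        name: type(values[0]).__name__ if values else "unknown"
        for name, values in columns.items()
    }


class StrictAccessor:
    """Current style: the schema exists only after an explicit init() call."""

    def __init__(self, columns):
        self._columns = columns
        self._schema = None

    def init(self):
        self._schema = infer_schema(self._columns)

    def _check_init(self):
        if self._schema is None:
            raise SchemaNotInitError("schema not initialized; call init() first")

    def types(self):
        self._check_init()  # this check has to be repeated in every public method
        return dict(self._schema)


class LazyAccessor:
    """Proposed style: the schema is inferred the first time it is needed."""

    def __init__(self, columns):
        self._columns = columns
        self._schema = None

    @property
    def schema(self):
        # Single place where initialization happens; methods just use self.schema.
        if self._schema is None:
            self._schema = infer_schema(self._columns)
        return self._schema

    def types(self):
        return dict(self.schema)

    def set_type(self, column, type_name):
        # The schema can still be adjusted after auto-initialization.
        self.schema[column] = type_name


data = {"quantity": [1, 2], "unit_price": [2.5, 3.0], "cancelled": [False, True]}
lazy = LazyAccessor(data)
print(lazy.types())          # works without an explicit init step
lazy.set_type("quantity", "float")
print(lazy.types())
```

In the lazy version the check lives in one property rather than in each method, which is the simplification suggested above. The trade-off mentioned earlier still applies: typing information cannot be supplied before the first access unless a separate explicit entry point is kept.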