alteryx / woodwork

Woodwork is a Python library that provides robust methods for managing and communicating data typing information.
https://woodwork.alteryx.com
BSD 3-Clause "New" or "Revised" License

Add Binary LogicalType #1256

Open thehomebrewnerd opened 2 years ago

thehomebrewnerd commented 2 years ago

As a developer, I wish I could use Woodwork to mark columns that contain only two distinct values with a Binary logical type. This would be useful when setting up machine learning classification problems: users would clearly know that a target column with the Binary logical type contains only two values, so they should frame the problem as binary classification rather than multiclass classification.

The Binary logical type should have a dtype of category, because it needs to work with several underlying data types, including strings, integers, and floating-point numbers. Missing values should be allowed. Columns holding the boolean values True and False should not be Binary; those should be recognized as a Boolean or BooleanNullable logical type. Binary should be a child type of Categorical.
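The rules above (two values, missing values allowed, booleans excluded, stored as category) can be expressed as a short pandas check. This is a minimal sketch under stated assumptions, not Woodwork code: `is_binary_candidate` and `to_binary_dtype` are hypothetical helpers, and treating "two values" as exactly two distinct non-missing values is an assumption.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_bool_dtype


def is_binary_candidate(series: pd.Series) -> bool:
    """Hypothetical helper (not part of Woodwork).

    Assumed rules: exactly two distinct non-missing values, and the
    column is not boolean (those belong to Boolean / BooleanNullable).
    """
    # bool and nullable "boolean" dtypes are excluded outright.
    if is_bool_dtype(series):
        return False
    non_null = series.dropna()  # missing values are allowed, so ignore them
    # Object columns holding only True/False (plus missing) are also excluded.
    if len(non_null) > 0 and non_null.map(
        lambda v: isinstance(v, (bool, np.bool_))
    ).all():
        return False
    return non_null.nunique() == 2


def to_binary_dtype(series: pd.Series) -> pd.Series:
    """Hypothetical helper: validate, then store with the category dtype."""
    if not is_binary_candidate(series):
        raise ValueError(
            f"column {series.name!r} does not hold exactly two non-boolean values"
        )
    return series.astype("category")


df = pd.DataFrame({"binary_col": ["yes", "no", "no", "yes"]})
df["binary_col"] = to_binary_dtype(df["binary_col"])
print(df["binary_col"].dtype)  # category
```

Because the check works on `dropna()` output, a column like `[0, 1, 1, None]` still qualifies, while `[True, None, False]` is rejected even though pandas stores it with object dtype.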

Code Example

import pandas as pd
import woodwork as ww
from woodwork.logical_types import Binary

df = pd.DataFrame({"binary_col": ["yes", "no", "no", "yes"]})
df.ww.init()

assert isinstance(df.ww.logical_types["binary_col"], Binary)

gsheni commented 2 years ago

Part of this issue will be to update our documentation, specifically this page:

gsheni commented 2 years ago

For this new logical type, we will not perform type inference.