As a developer, I wish I could use Woodwork to assign a logical type of Binary to columns whose values are limited to exactly two distinct options. This would be useful when setting up machine learning classification problems: a target column with the Binary logical type is known to contain only two values, so users would clearly see that the problem should be structured as binary classification rather than multi-class classification.
The Binary logical type should use the category dtype so that it works with several underlying data types, including strings, integers and doubles. Missing values should be allowed. Columns containing the boolean values True and False should not be inferred as Binary; they should continue to be recognized as a Boolean or BooleanNullable logical type. Binary should be a child type of Categorical.
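The inference rule described above can be sketched in plain pandas. This is a hypothetical illustration, not Woodwork's API or its actual inference logic: `looks_binary` and `to_binary` are made-up names, and treating a column as Binary when it has exactly two distinct non-missing values (ignoring missing values and excluding booleans) is an assumption about how the requested rule might be implemented.

```python
import pandas as pd


def looks_binary(series: pd.Series) -> bool:
    """Hypothetical check for the proposed Binary logical type.

    Assumptions (not Woodwork's real inference logic):
    - exactly two distinct non-missing values,
    - missing values are ignored rather than rejected,
    - boolean columns are excluded (left to Boolean/BooleanNullable).
    """
    # bool and nullable "boolean" dtypes belong to Boolean/BooleanNullable.
    if pd.api.types.is_bool_dtype(series):
        return False
    non_missing = series.dropna()
    # An object column holding only True/False values is also treated as boolean.
    if len(non_missing) > 0 and non_missing.map(lambda v: isinstance(v, bool)).all():
        return False
    # Strings, integers and floats all qualify if exactly two values remain.
    return non_missing.nunique() == 2


def to_binary(series: pd.Series) -> pd.Series:
    """Store a two-valued column with the 'category' dtype; missing values stay missing."""
    return series.astype("category")
```

Using the category dtype keeps the two observed values as the column's categories regardless of whether they are strings or numbers, while missing entries remain NaN rather than becoming a third category.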
Code Example
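import pandas as pd
import woodwork as ww
# Binary is the logical type requested in this issue
from woodwork.logical_types import Binary
# A column with exactly two distinct values: "yes" and "no"
df = pd.DataFrame({"binary_col": ["yes", "no", "no", "yes"]})
df.ww.init()
# Desired behavior: type inference assigns Binary to the two-valued column
assert isinstance(df.ww.logical_types["binary_col"], Binary)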