KhiopsML / khiops-python

The Python library of the Khiops AutoML suite
https://khiops.org
BSD 3-Clause Clear License

Support text types in sklearn predictors #39

Open popescu-v opened 1 year ago

popescu-v commented 1 year ago

Description

Khiops 11 supports Text columns, which receive specialized AutoML treatment as opposed to normal strings (Categorical). The sklearn predictors should also support this type.

Questions/Ideas

popescu-v commented 1 year ago

In pandas >= 1.0.0 there is a StringDtype (see https://pandas.pydata.org/docs/reference/api/pandas.StringDtype.html#pandas.StringDtype), but AFAIU it's still considered experimental for now.
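To make the idea concrete, here is a minimal sketch of how a column's pandas dtype could be used to tell text apart from plain categorical strings. The helper `guess_khiops_type` and the mapping it applies are hypothetical and are not part of the khiops-python API; the type name "Numerical" is also an assumption made for illustration.

```python
import pandas as pd

# Toy frame: "label" holds plain object strings, "review" uses the
# dedicated pandas string extension dtype, "age" is numeric.
df = pd.DataFrame(
    {
        "label": pd.Series(["a", "b", "a"], dtype="object"),
        "review": pd.Series(["good product", "bad service", "ok"], dtype="string"),
        "age": pd.Series([31, 45, 27], dtype="int64"),
    }
)


def guess_khiops_type(series):
    """Hypothetical helper: map a pandas dtype to a Khiops-style type name.

    Assumption for this sketch: columns declared with StringDtype are
    treated as Text, numeric columns as Numerical, everything else
    (including plain object strings) as Categorical.
    """
    if isinstance(series.dtype, pd.StringDtype):
        return "Text"
    if pd.api.types.is_numeric_dtype(series.dtype):
        return "Numerical"
    return "Categorical"


column_types = {name: guess_khiops_type(df[name]) for name in df.columns}
print(column_types)
```

With this approach the user opts in to text treatment by declaring the column with `dtype="string"`, so existing object-typed string columns would keep being handled as Categorical.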

popescu-v commented 1 year ago

:thumbsup: for Option 1. IMHO, having a text-type column should be an intrinsic property of the dataset, not just a detail of a particular usage. Should we need to reinterpret the text column as categorical in a particular context, we should be able to create a new dataset with a new spec.