Open popescu-v opened 1 year ago
In pandas >= 1.0.0 there is a `StringDtype` (see https://pandas.pydata.org/docs/reference/api/pandas.StringDtype.html#pandas.StringDtype), but AFAIU it's considered experimental for now.
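For illustration only, here is a minimal sketch of how an explicitly typed pandas string column could be told apart from other columns. The convention of treating such columns as `Text` candidates is an assumption, not something Khiops does today, and the column names are made up:

```python
import pandas as pd

# Toy frame: "review" is explicitly declared with the pandas string dtype,
# "rating" is a plain integer column.
df = pd.DataFrame(
    {
        "review": pd.Series(["great product", "too slow"], dtype="string"),
        "rating": [5, 2],
    }
)

# Hypothetical convention: columns whose dtype is a pandas StringDtype
# are candidates for the Khiops Text type.
text_candidates = [
    name for name, dtype in df.dtypes.items() if isinstance(dtype, pd.StringDtype)
]
print(text_candidates)
```

Since the dtype is flagged experimental, relying on it alone is probably fragile; an explicit declaration of the text columns would not depend on it.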
:thumbsup: for Option 1 (IMHO, having a text type column should be an intrinsic property of the dataset, not just a particular usage detail; should we need to reinterpret the text column as categorical in a particular context, we should be able to create a new dataset with a new spec).
Description
Khiops 11 supports `Text` columns, which have a specialized AutoML treatment as opposed to normal strings (`Categorical`). Sklearn predictors should also support this type.
Questions/Ideas
- `Text` type?
  - Option 1: `Dataset` property `text_columns` with the names of the text fields, or `table_text_columns` indexed by the table name and whose values are the names of the text columns (I prefer this one)
    - `Dataset` object will have all the necessary info to add the specified columns as `Text`
  - Option 2: `fit` parameter
    - `table_text_columns` but as a `fit` optional parameter
- `Text` is part of the description of the dataset
- `dict` interface should be maintained
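To make the shape of `table_text_columns` concrete, here is a rough sketch of the mapping and of a check that could run wherever it ends up being accepted (as a `Dataset` property or as an optional `fit` parameter). Everything below is hypothetical: `validate_table_text_columns`, the `tables` stand-in for the dataset spec, and the table and column names are not part of the current khiops-python API.

```python
from typing import Dict, List


def validate_table_text_columns(
    tables: Dict[str, List[str]],
    table_text_columns: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Check that every declared text column exists in its table.

    `tables` maps a table name to its column names (a simplified stand-in
    for the dataset spec). `table_text_columns` maps a table name to the
    names of the columns to be treated as Text.
    """
    validated = {}
    for table_name, text_columns in table_text_columns.items():
        if table_name not in tables:
            raise ValueError(f"Unknown table: {table_name!r}")
        unknown = [col for col in text_columns if col not in tables[table_name]]
        if unknown:
            raise ValueError(
                f"Unknown text columns in table {table_name!r}: {unknown}"
            )
        validated[table_name] = list(text_columns)
    return validated


# Hypothetical multi-table dataset: column names per table.
tables = {
    "Reviews": ["review_id", "title", "body"],
    "Users": ["user_id", "bio"],
}

# Text columns indexed by table name, as proposed in the issue.
table_text_columns = {"Reviews": ["title", "body"], "Users": ["bio"]}

validated = validate_table_text_columns(tables, table_text_columns)
```

The same mapping works for both options; the difference is only whether it lives with the dataset description or is passed at `fit` time.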