HK3-Lab-Team / pytrousse

PyTrousse collects into one toolbox a set of data wrangling procedures tailored for composing reproducible analytics pipelines.
Apache License 2.0

Almost constant column with repeated floating value raises error #79

Open lorenz-gorini opened 3 years ago

lorenz-gorini commented 3 years ago

While testing the preprocessing pipeline, a RuntimeError was raised: RuntimeError: The column "feature_samenum_col" inferred type is floating, but only string and int categorical columns are supported.

The column contains a single repeated floating value, along with a few inserted errors (invalid values, substrings, ...). The _columns_type property classifies it as a categorical column. Should we avoid raising the error, or should we handle these particular columns in a different way? (Columns with one repeated floating value may be very rare, since they carry no information.)

lorenz-gorini commented 3 years ago

Also, a column with integers and NaN seems to have inferred type = floating, and it raises the same error.
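A minimal sketch of both cases, assuming the "inferred type" in the error message comes from pandas' `infer_dtype` (the issue does not confirm this). It shows that a repeated float column and an integer column containing NaN are both reported as floating. The helper `whole_floats_to_nullable_int` is hypothetical and not part of pytrousse; it only illustrates one possible way to treat whole-valued float columns as integers through the nullable `Int64` dtype.

```python
import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype, is_float_dtype

# A column holding one repeated floating value.
same_float = pd.Series([0.5, 0.5, 0.5, 0.5])
# An integer column with a missing value: pandas stores it as float64.
int_with_nan = pd.Series([1, 2, np.nan, 2])

same_float_type = infer_dtype(same_float)
int_with_nan_type = infer_dtype(int_with_nan)
print(same_float_type, int_with_nan_type)


def whole_floats_to_nullable_int(series):
    """Hypothetical helper (not pytrousse API).

    Cast a float column to the nullable Int64 dtype when every
    non-null value is a whole number; otherwise return it unchanged.
    """
    if not is_float_dtype(series):
        return series
    non_null = series.dropna()
    if non_null.empty or not (non_null % 1 == 0).all():
        return series
    return series.astype("Int64")


converted = whole_floats_to_nullable_int(int_with_nan)  # becomes Int64
unchanged = whole_floats_to_nullable_int(same_float)    # stays float64
print(converted.dtype, unchanged.dtype)
```

A conversion like this would only cover the integer-plus-NaN case. A column with a repeated non-integer float would still be inferred as floating, so whether to raise for it remains an open question.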