WinVector / pyvtreat

vtreat is a data frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. Distributed under a BSD-3-Clause license.
https://winvector.github.io/pyvtreat/

categorical variables #15

Closed michael135 closed 4 years ago

michael135 commented 4 years ago

If a categorical column contains only numeric values (like 5, 7, 8, 1), how do I tell vtreat.NumericOutcomeTreatment to treat it as categorical?

Or is the simplest way to convert the numeric values in that column to strings?

JohnMount commented 4 years ago

vtreat uses the column type to determine the processing. A column that you consider categorical but that happens to contain only values like 1, 2, 3 will likely be typed as numeric if it came from a CSV reader (CSV files don't carry types, so the reader just guesses). My advice is: always examine column types using the Pandas .dtypes attribute, and convert such columns to strings using a command such as data['ColumnID'] = data['ColumnID'].astype(str).
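A minimal sketch of that advice using only pandas, assuming a small inline CSV with a hypothetical digit-only column named zip_code (the column name and data are illustrative, not from the issue). It shows the reader guessing an integer type and the astype(str) conversion turning the column into strings:

```python
import io

import pandas as pd

# Hypothetical CSV: 'zip_code' is meant to be categorical but holds only digits.
csv_text = """zip_code,x,y
5,1.0,10.2
7,2.0,11.9
8,3.0,13.1
1,4.0,9.8
7,5.0,12.4
"""

d = pd.read_csv(io.StringIO(csv_text))

# The CSV reader guesses types, so zip_code comes back as an integer column.
original_dtype = d['zip_code'].dtype
print(d.dtypes)

# Convert the column to strings so vtreat sees it as categorical, not numeric.
d['zip_code'] = d['zip_code'].astype(str)
print(d.dtypes)
```

The converted frame can then be handed to vtreat, for example with something like vtreat.NumericOutcomeTreatment(outcome_name='y').fit_transform(d, d['y']); check the pyvtreat documentation for the exact parameters. Alternatively, pandas can be told the type up front with pd.read_csv(..., dtype={'zip_code': str}), which avoids the integer guess entirely.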