Quantipy / quantipy

Dataset.from_components infers all variables as string #1251

Closed Havlin closed 5 years ago

Havlin commented 5 years ago

When I read a pandas DataFrame into a DataSet, all of my variables are saved as string, even though many of them are clearly categorical variables that could be treated as a single or delimited set. Any suggestions? Perhaps I'm missing an argument somewhere.

For example, a column with Gender as header and Male or Female is read as string.

alextanski commented 5 years ago

Heya @Havlin! Can you please post the DataFrame.head() of your data? From your description it seems you are working with strings ('Male', 'Female'?). If so, the behaviour of from_components() is correct. Moreover, there is no automatic conversion to categorical types (single or delimited set, for example) in general. Types are only converted to the simple int, float and string, because the value objects cannot be resolved from the data alone.

alextanski commented 5 years ago

To add: You should be able to run DataSet.convert('variable_name', 'single') over all your variables and convert them to single categoricals.
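To illustrate why this step has to be explicit, here is a minimal pandas-only sketch. It is not Quantipy's implementation; the sample data and the helper name `to_single` are made up for illustration. It shows that a text column is just strings as far as the data is concerned, and that turning it into a single categorical means deciding on integer codes plus a code-to-label mapping (roughly what the value objects hold).

```python
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [34, 27, 45, 52],
})

# From the data alone, a text column is only known to contain strings.
text_columns = [
    col for col in df.columns if pd.api.types.is_string_dtype(df[col])
]


def to_single(series):
    """Return (1-based integer codes, {code: label}) for a string column.

    Assumption for this sketch: labels are ordered alphabetically.
    Missing values would end up with code 0, since factorize marks
    them with -1 before the shift.
    """
    codes, labels = pd.factorize(series, sort=True)
    values = {i + 1: label for i, label in enumerate(labels)}
    coded = pd.Series(codes + 1, index=series.index, name=series.name)
    return coded, values


gender_codes, gender_values = to_single(df["Gender"])
```

The ordering and numbering of the codes is a choice the data cannot make for you, which is why the conversion is left to an explicit call such as DataSet.convert().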

Havlin commented 5 years ago

@alextanski Ah, I see. Thanks for the response. I misunderstood: I thought that, when creating a dataset, the strings were converted into an appropriate data type based on some kind of NLP inference. In other words, I thought it was automatic.

Yes, thank you. I'll simply use DataSet.convert().