autonlab / auton-survival

Auton Survival - an open source package for Regression, Counterfactual Estimation, Evaluation and Phenotyping with Censored Time-to-Events
http://autonlab.github.io/auton-survival
MIT License

ValueError: could not convert string to float: 'Rural Adj' #128

Open sbpatel2009 opened 1 year ago

sbpatel2009 commented 1 year ago

I am attempting to fit the Preprocessor on training data that only includes categorical variables. It appears that when I pass an empty list to the num_feats parameter, the Preprocessor attempts to convert all columns to floats and raises an error.

This code:

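```python
from auton_survival.preprocessing import Preprocessor

# X: training DataFrame whose only feature is the categorical column 'Rural'
cat_feats = ['Rural']
num_feats = []

preprocessor = Preprocessor()
preprocessor.fit(data=X, cat_feats=cat_feats, num_feats=num_feats)
```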

...raises this error:

ValueError: could not convert string to float: 'Rural Adj'

I may be missing something, but I don't see how to handle this case in the documentation. Does this class require at least one numerical feature?

matteo4diani commented 10 months ago

Hi @sbpatel2009, sorry for the late reply and thanks for contributing to auton-survival 🙂

You may already have figured it out by yourself, but the reason for that error lies in this bit of code: https://github.com/autonlab/auton-survival/blob/5dde465f7223601717abddc1d075e837707c403b/auton_survival/preprocessing.py#L209-L212

As you can see from the code, the preprocessor assumes that if no num_feats are provided (an empty list included), all features are numerical, so it tries to convert the categorical 'Rural' column to float. I'm going to work on this and other bugs in the next week(s). If you want to contribute a fix, you are more than welcome 😄
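To illustrate the pitfall described above, here is a minimal, self-contained sketch. The functions `scale_numeric_buggy` and `scale_numeric_fixed` are hypothetical and are not auton-survival's actual code; they only show why treating an empty list the same as "nothing specified" pulls string columns into a float conversion, and how checking `is None` instead avoids it.

```python
import pandas as pd


def _center(data: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Mean-centre the given columns; astype(float) raises ValueError on strings."""
    out = data.copy()
    for col in cols:
        values = out[col].astype(float)
        out[col] = values - values.mean()
    return out


def scale_numeric_buggy(data: pd.DataFrame, num_feats: list) -> pd.DataFrame:
    # Hypothetical: an empty list is falsy, so [] silently becomes "every column".
    cols = num_feats if num_feats else list(data.columns)
    return _center(data, cols)


def scale_numeric_fixed(data: pd.DataFrame, num_feats=None) -> pd.DataFrame:
    # Hypothetical fix: only None means "every column"; [] means "no numeric columns".
    cols = list(data.columns) if num_feats is None else list(num_feats)
    return _center(data, cols)


X = pd.DataFrame({'Rural': ['Rural Adj', 'Urban']})
print(scale_numeric_fixed(X, []).equals(X))  # True: nothing is converted

try:
    scale_numeric_buggy(X, [])
except ValueError as err:
    print(err)  # the same kind of ValueError reported in this issue
```

Distinguishing `None` from `[]` keeps "the caller said nothing" separate from "the caller explicitly said there are no numerical features", which is exactly the case in this issue.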

One workaround is to add a dummy numerical column to the input DataFrame, run the preprocessor, and then drop the dummy column:

```python
from auton_survival.preprocessing import Preprocessor
import pandas as pd

cat_feats = ['Rural']
num_feats = ['Dummy']  # dummy numerical feature so that num_feats is not empty

X = pd.DataFrame({'Rural': ['yes', 'no', 'maybe'], 'Dummy': [0, 0, 0]})

preprocessor = Preprocessor()
X = preprocessor.fit_transform(data=X, cat_feats=cat_feats, num_feats=num_feats)

# Remove the dummy column once preprocessing is done
X = X.drop(columns=['Dummy'])

print(X)
```
sbpatel2009 commented 10 months ago

Thank you, Matteo! That is a clever workaround! I just used the one-hot encoder in scikit-learn.

Best, Snehal

On Thu, Nov 9, 2023 at 9:41 AM Matteo Fordiani wrote:

A workaround can be to add a dummy numerical column to the input DataFrame. While experimenting with this example, though, I found another bug, which I detailed in #133: https://github.com/autonlab/auton-survival/issues/133
