AlanInglis / vivid

This package is for visualising variable importance and variable interaction.
https://alaninglis.github.io/vivid/

The response has five or fewer unique values. Are you sure you want to do regression? #6

Closed jielab closed 3 months ago

jielab commented 3 months ago

Hi, there:

Thanks for providing a wonderful R package. Just the name itself makes me happy :-)

The example given in the vignette uses a continuous trait. When I use randomForest(Y_yes_no ~ X + age + smoking + drinking, na.action = na.omit, data = dat1) to model a binary disease trait, I get the error message: The response has five or fewer unique values. Are you sure you want to do regression? Execution halted

Can you please let me know how to address this?

Thank you & best regards, jie

AlanInglis commented 3 months ago

Hi Jie,

Thanks for the positive feedback!

The warning you are seeing is directly from the randomForest package. It's not actually related to the vivid package.

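Looking at your model formula, randomForest(Y_yes_no ~ X + age + smoking + drinking, na.action = na.omit, data = dat1), it seems you are working with a binary response variable (Y_yes_no). That suggests a classification task. However, randomForest fits a regression forest whenever the response is numeric and a classification forest only when the response is a factor, so a numeric 0/1 response is treated as regression, which is what the warning is pointing out.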
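To see why a numeric 0/1 coding triggers the message, here is a minimal base-R sketch of the condition that appears to make randomForest warn: the response is not a factor and has five or fewer distinct values. The helper name `warns_about_regression()` is hypothetical and is not part of randomForest or vivid; it only approximates the check.

```r
# Hypothetical helper approximating the check randomForest appears to run
# before fitting: a non-factor response with <= 5 unique values is assumed
# to be an accidental regression, so a warning is issued.
warns_about_regression <- function(y) {
  !is.factor(y) && length(unique(y)) <= 5
}

y_numeric <- c(0, 1, 1, 0, 1)        # binary outcome stored as numbers
y_factor  <- as.factor(y_numeric)    # same outcome stored as a factor

warns_about_regression(y_numeric)    # TRUE: treated as regression, so it warns
warns_about_regression(y_factor)     # FALSE: treated as classification
```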

If, in fact, you do want to do classification, make sure Y_yes_no is a factor. This explicitly indicates that Y_yes_no is a categorical variable for classification. You would need to do something like this before running your random forest model:

```r
dat1$Y_yes_no <- as.factor(dat1$Y_yes_no)
```
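A slightly fuller sketch of the fix, using simulated stand-in data: the `dat1` built here is hypothetical, since the real data was not shared. It converts the 0/1 response to a factor and, if randomForest is installed, fits the same model and prints the fitted object's `type` component to confirm a classification forest was grown.

```r
set.seed(42)

# Hypothetical stand-in for the user's dat1: a 0/1 outcome plus predictors
n <- 200
dat1 <- data.frame(
  X        = rnorm(n),
  age      = round(runif(n, 30, 80)),
  smoking  = factor(sample(c("no", "yes"), n, replace = TRUE)),
  drinking = factor(sample(c("no", "yes"), n, replace = TRUE))
)
dat1$Y_yes_no <- rbinom(n, 1, plogis(0.8 * dat1$X))  # numeric 0/1 response

# Convert the numeric 0/1 outcome to a factor so randomForest does classification
dat1$Y_yes_no <- as.factor(dat1$Y_yes_no)
str(dat1$Y_yes_no)

# Only fit the forest if the randomForest package is available
if (requireNamespace("randomForest", quietly = TRUE)) {
  rf <- randomForest::randomForest(
    Y_yes_no ~ X + age + smoking + drinking,
    na.action = na.omit, data = dat1
  )
  print(rf$type)  # should be "classification" now that the response is a factor
}
```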

Hope that helps

jielab commented 3 months ago

Thanks!

as.factor() works amazingly!

If my phenotype is a time-to-event variable (for Cox regression), can I still use randomForest()?

Best regards, jie

AlanInglis commented 3 months ago

Without seeing your data it is hard to comment, and it is somewhat beyond the scope of GitHub issues. That said, standard random forests are not designed for time-to-event (survival) outcomes, but there are extensions of the random forest algorithm that are suitable for survival analysis. I would suggest checking out the randomForestSRC package, which fits random survival forests. A simple example would look something like this:

```r
library(randomForestSRC)

# time-to-event variable is 'time', and event indicator is 'status'
rsf_model <- rfsrc(Surv(time, status) ~ ., data = data)
```
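For a runnable version of that outline, the sketch below substitutes the `veteran` lung-cancer data that ships with randomForestSRC for the user's own data (which was not shared), grows a small random survival forest, and prints a summary. The `ntree = 100` setting is an arbitrary choice to keep it quick, and the block is skipped if randomForestSRC is not installed.

```r
# Only run if randomForestSRC is available
if (requireNamespace("randomForestSRC", quietly = TRUE)) {
  library(randomForestSRC)

  # Example survival data bundled with randomForestSRC:
  # 'time' is follow-up time and 'status' is the event indicator (1 = death)
  data(veteran, package = "randomForestSRC")

  # Random survival forest: Surv(time, status) on the left-hand side
  # tells rfsrc() to use its survival family
  rsf_fit <- rfsrc(Surv(time, status) ~ ., data = veteran, ntree = 100)

  print(rsf_fit)         # summary of the fitted forest, including OOB error
  print(rsf_fit$family)  # family used for the fit; "surv" for right-censored data
}
```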

Hope that helps!!

AlanInglis commented 3 months ago

I'm going to close this issue now, as it is solved! Good luck!!!