irisshen926 closed this issue 5 years ago
Sorry for the delayed reply. Hope you are well, Iris!
In easyml, you can specify (binary) categorical variables using the categorical_variables input argument. See this link for an example: https://ccs-lab.github.io/easyml/articles/titanic.html.
You could also code Race into multiple binary variables (e.g., Asian = 1 or 0, White = 1 or 0, African_American = 1 or 0) and enter them as categorical variables in easyml.
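As a minimal sketch of that recoding in base R: the data frame, its column names, and the Race_ prefix below are made up for illustration and are not part of easyml.

```r
# Hypothetical data: a multi-level Race column and a binary outcome.
df <- data.frame(
  Race = c("Asian", "White", "African_American", "White", "Hispanic", "Other"),
  outcome = c(1, 0, 1, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Create one 0/1 indicator column per observed level of Race.
race_levels <- unique(df$Race)
for (level in race_levels) {
  df[[paste0("Race_", level)]] <- as.integer(df$Race == level)
}

# Names of the new indicator columns.
indicator_names <- paste0("Race_", race_levels)

# Drop the original multi-level column once the indicators exist.
df$Race <- NULL
```

The names in indicator_names are the ones you would then list in the categorical_variables argument. Depending on the model, you may want to leave out one level as a reference category.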
Best, Young
My colleague and I have been using the easyml package to analyze our data. We were wondering how the algorithm handles categorical variables (for example, race). We have coded Race to have 6 different values, but they are not on a continuous scale. We looked into the source code to try to figure out how the algorithm handles categorical variables. For glmnet, the default preprocess method is preprocess_scale(), which only scales the numerical variables and leaves the categorical variables unchanged.
    set_preprocess <- function(preprocess = NULL, algorithm) {
      if (is.null(preprocess)) {
        if (algorithm == "glmnet") {
          preprocess <- preprocess_scale
        } else if (algorithm == "random_forest") {
          preprocess <- preprocess_identity
        } else if (algorithm == "support_vector_machine") {
          preprocess <- preprocess_scale
        }
      }
      preprocess
    }
In Preprocess.R:

    if (is.null(mask)) {
      # No categorical variables
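To make the behavior described above concrete (numerical columns scaled, categorical columns left unchanged), here is a rough sketch in base R. It is not easyml's actual code: scale_numeric_only and the example data are hypothetical, and the only assumption is that categorical columns are identified by name.

```r
# Hypothetical helper, not part of easyml: standardize every column
# except those named in `categorical_variables`.
scale_numeric_only <- function(data, categorical_variables = NULL) {
  numeric_columns <- setdiff(names(data), categorical_variables)
  data[numeric_columns] <- lapply(data[numeric_columns], function(x) {
    (x - mean(x)) / sd(x)  # center to mean 0, scale to unit standard deviation
  })
  data
}

# Example: `age` is standardized, the binary `white` column is untouched.
df <- data.frame(age = c(20, 30, 40), white = c(1, 0, 1))
scaled <- scale_numeric_only(df, categorical_variables = "white")
```

Under this scheme a six-level Race column coded 1 to 6 would either be scaled as if it were continuous or passed through as a single column, which is why the coding of such variables matters.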
We ended up re-coding Race into a binary variable (white and non-white), since only a small number of our subjects fall into the non-white categories. However, we wanted to ask how we should handle categorical variables such as Race in the future.
Thank you so much for your help!