I also encountered a bug when running run_ml() on a dataset whose label column contains categorical classes with spaces. The classes that caused the error are "progressive supranuclear palsy" and "pathological aging". I can work around the bug by replacing " " with "_", which turns the labels into "progressive_supranuclear_palsy" and "pathological_aging".
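For reference, the workaround described above can be sketched roughly as follows. The toy data frame and its gene1 column are made up for illustration; only the dx column name and the class labels come from this report.

```r
# Hypothetical toy data standing in for the real dataset;
# `dx` mirrors the outcome column used in this report.
toy <- data.frame(
  dx = c("control", "pathological aging", "progressive supranuclear palsy"),
  gene1 = c(0.1, 0.4, 0.9),
  stringsAsFactors = FALSE
)

# Replace every literal space in the labels with an underscore.
toy$dx <- gsub(" ", "_", toy$dx, fixed = TRUE)

unique(toy$dx)
```

After this step the labels are valid R variable names, so the check that triggers the error should no longer fire.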
amp_data_preproc <- preprocess_data(amp_ad_geneexp_dx, 'dx')$dat_transformed
Using 'dx' as the outcome column.
Removed 1388/1917 (72.4%) of samples because of missing outcome value (NA).
result_amp <- run_ml(amp_data_preproc, 'glmnet')
Using 'dx' as the outcome column.
Error: At least one of the class levels is not a valid R variable name; This will cause errors when class probabilities are generated because the variables names will be converted to Alzheimer.Disease, control, pathological.aging, progressive.supranuclear.palsy . Please use factor levels that can be used as valid R variable names (see ?make.names for help).
I would expect run_ml() to accept class labels that contain spaces, or preprocess_data() to automatically convert the outcome labels into valid names.
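As a rough illustration of why these labels fail, the error message points to ?make.names. The sketch below uses base R only and assumes the check behind the error compares each class level with its make.names() result; that assumption is mine and is not confirmed by the package source.

```r
# Class labels quoted in this report.
labels <- c("control", "pathological aging", "progressive supranuclear palsy")

# A label is a valid R variable name if make.names() leaves it unchanged.
is_valid <- labels == make.names(labels)

# Labels that would trigger the error.
labels[!is_valid]

# make.names() replaces each space with ".", which matches the names
# listed in the error message.
make.names(labels)
```

An automatic fix in preprocess_data() could apply a conversion like this to the outcome column, although whether to use "." or "_" as the replacement is a design choice for the maintainers.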
Below is the code to reproduce the bug.
amp_data_preproc <- preprocess_data(amp_ad_geneexp_dx, 'dx')$dat_transformed
result_amp <- run_ml(amp_data_preproc, 'glmnet')
Related to https://github.com/openjournals/joss-reviews/issues/3073