Azure / Azure-TDSP-Utilities

Utilities and scripts developed as part of Microsoft's Team Data Science Process for productive data science
Creative Commons Attribution 4.0 International

BinaryClassification RMD doesn't properly create factors #20

Closed Pelonza closed 6 years ago

Pelonza commented 7 years ago

The current code in BinaryClassification.rmd doesn't use correct R syntax to create factor columns. This is a giant problem for using the "auto" factor feature in the yaml files.

The culprit is line 118 in BinaryClassification.rmd. Currently it reads:
if (!is.null(factorCols)) {for (i in 1:length(factorCols)) { trainDF[, factorCols[i]] <- make.names(as.factor(trainDF[, factorCols[i]])) }}

Change that line to:
if (!is.null(factorCols)) {for (i in 1:length(factorCols)) { trainDF$factorCols[i] <- as.factor(trainDF$factorCols[i]) }}

The key difference is that R doesn't know how to handle lists when converting to factors (it generates some sort of errors), and this avoids that entirely.
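For anyone landing here later, a minimal sketch of the conversion on toy data may help. The data frame contents and column names below are made up for illustration; only trainDF and factorCols are names from the template. Two things worth knowing in base R: make.names() returns a character vector, so wrapping as.factor() in it throws the factor class away again, and the $ operator looks for a column literally named factorCols rather than evaluating the variable. Indexing with [[ ]] looks the column up by the name stored in the variable, so this sketch uses that instead. It is not necessarily what the template should ship.

```r
# Hypothetical stand-ins for the template's objects.
trainDF <- data.frame(
  color = c("red", "blue", "red"),
  size  = c("S", "M", "S"),
  y     = c(1, 0, 1),
  stringsAsFactors = FALSE
)
factorCols <- c("color", "size")

# make.names() coerces its input to character, so this is not a factor.
stillCharacter <- make.names(as.factor(trainDF[, "color"]))
is.character(stillCharacter)   # TRUE

# `$` does not evaluate factorCols; there is no column with that literal name.
is.null(trainDF$factorCols)    # TRUE

# Convert each listed column by name. Looping over the names directly
# also does nothing when factorCols is NULL, so no separate guard is needed.
for (col in factorCols) {
  trainDF[[col]] <- as.factor(trainDF[[col]])
}

sapply(trainDF, class)
```

After the loop, color and size are factors and y is left as numeric. If the label values must also be syntactically valid names (which is presumably why make.names() was there), that can be applied to the factor levels rather than to the whole column.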


With this change (and the other yaml-file fix), I was able to run the BinaryClassification rmd file.

deguhath commented 6 years ago

I have added this in the code as a comment. Thanks for pointing it out.