Open theo-s opened 1 year ago
We should move the scaling logic that calculates column SDs and means to the shared C++ layer. In addition, we should make the functional change that columns that are entirely NA or contain a single repeated value no longer result in an error. Instead, a warning should be logged for the user, such as "Features {a}, {b}, {c} have the same column value so will not be split on when fitting the forest." Training should still complete without errors.
When one of the entries of colSd doesn't exist (constant values, or too many missing values), we should describe why this is a problem instead of using the current generic error message that just says the colSd doesn't exist.
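
A minimal sketch of what this could look like in the C++ layer, to make the proposal concrete. Everything here is hypothetical: the names ColumnScaling, computeColumnScaling, and unusableFeaturesWarning are not from the existing code, the column-major std::vector layout is an assumption, NA is assumed to be represented as NaN, and the warning wording is only a suggestion. Columns that cannot be scaled get a placeholder mean of 0 and SD of 1 and are reported back to the caller instead of raising an error.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type (not part of the existing codebase).
struct ColumnScaling {
    std::vector<double> means;
    std::vector<double> sds;
    std::vector<std::size_t> unusable;  // indices of constant or all-NA columns
};

// Computes per-column mean and sample SD, skipping NaN entries.
// Assumption: data is stored column-major, one vector per feature.
ColumnScaling computeColumnScaling(const std::vector<std::vector<double>>& columns) {
    ColumnScaling out;
    for (std::size_t j = 0; j < columns.size(); ++j) {
        double sum = 0.0;
        std::size_t n = 0;
        for (double v : columns[j]) {
            if (!std::isnan(v)) { sum += v; ++n; }
        }
        double mean = n > 0 ? sum / static_cast<double>(n) : 0.0;

        double ss = 0.0;
        for (double v : columns[j]) {
            if (!std::isnan(v)) { ss += (v - mean) * (v - mean); }
        }
        // The sample SD needs at least two observed values.
        double sd = n > 1 ? std::sqrt(ss / static_cast<double>(n - 1)) : 0.0;

        if (n < 2 || sd == 0.0) {
            // Assumption: flag the column and use placeholders so that
            // (x - mean) / sd stays finite; no error is raised.
            out.unusable.push_back(j);
            mean = n > 0 ? mean : 0.0;
            sd = 1.0;
        }
        out.means.push_back(mean);
        out.sds.push_back(sd);
    }
    return out;
}

// Builds the warning text; returns an empty string when there is nothing to report.
std::string unusableFeaturesWarning(const std::vector<std::string>& names,
                                    const std::vector<std::size_t>& unusable) {
    if (unusable.empty()) { return ""; }
    std::string msg = "Features ";
    for (std::size_t k = 0; k < unusable.size(); ++k) {
        assert(unusable[k] < names.size());
        if (k > 0) { msg += ", "; }
        msg += names[unusable[k]];
    }
    msg += " are constant or entirely NA so will not be split on when fitting the forest.";
    return msg;
}
```

The caller would log the returned string as a warning (if non-empty) and continue training. Whether the flagged columns are dropped or simply never chosen as split candidates is left open here.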