forestry-labs / Rforestry

https://forestry-labs.github.io/Rforestry/

Allow all NA or same value columns with scaling #135

Open · theo-s opened this issue 1 year ago

theo-s commented 1 year ago

When one of the entries of colSd is undefined (for example, because the column is constant or has too many missing values), we should explain why this is a problem. The current error message is generic and only says that the colSd doesn't exist.
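To illustrate what a more descriptive message could say, here is a minimal sketch of a hypothetical helper, `describeColSdProblem`, that classifies why a column's standard deviation can't be used for scaling. It assumes missing values are stored as NaN, which may not match how Rforestry represents NA internally.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Rforestry): explains why a column's
// standard deviation cannot be used for scaling. Missing values are assumed
// to be NaN. Returns an empty string when the column can be scaled.
std::string describeColSdProblem(const std::vector<double>& column,
                                 const std::string& name) {
  std::size_t nonMissing = 0;
  double first = 0.0;
  bool constant = true;
  for (double v : column) {
    if (std::isnan(v)) continue;  // skip missing entries
    if (nonMissing == 0) {
      first = v;                  // remember the first observed value
    } else if (v != first) {
      constant = false;           // found a second distinct value
    }
    ++nonMissing;
  }
  if (nonMissing == 0) {
    return "Feature " + name +
           " is entirely NA, so its standard deviation is undefined.";
  }
  if (nonMissing == 1) {
    return "Feature " + name +
           " has only one non-NA value, so its sample standard deviation is undefined.";
  }
  if (constant) {
    return "Feature " + name +
           " is constant, so its standard deviation is 0 and scaling would divide by zero.";
  }
  return "";
}
```

For example, `describeColSdProblem({5.0, std::nan(""), 5.0}, "x")` reports that `x` is constant, while a column such as `{1.0, 2.0, 3.0}` yields an empty string.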

edwardwliu commented 1 year ago

We should move the scaling logic that calculates column SDs and means into the shared C++ layer. We should also change the behavior so that columns that are entirely NA or hold a single repeated value no longer cause an error. Instead, the user should only see a warning, such as "Features {a}, {b}, {c} have the same column value so will not be split on when fitting the forest." Training should still complete without errors.
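A rough sketch of what the shared C++ scaling step might look like under this proposal follows. The names `ScalingStats`, `computeScalingStats`, and `skippedWarning` are hypothetical, and NA is again assumed to be NaN. Degenerate columns are recorded rather than rejected, and they get mean 0 and SD 1 so that scaling leaves them unchanged. The warning wording is taken from the comment above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result of the scaling step (names are illustrative only).
struct ScalingStats {
  std::vector<double> colMeans;
  std::vector<double> colSds;
  std::vector<std::size_t> skipped;  // indices of all-NA or constant columns
};

// Computes per-column mean and sample SD, ignoring NA (NaN) entries.
// Constancy is checked via min == max rather than sd == 0 to avoid
// floating-point round-off in the mean.
ScalingStats computeScalingStats(const std::vector<std::vector<double>>& columns) {
  ScalingStats stats;
  for (std::size_t j = 0; j < columns.size(); ++j) {
    double sum = 0.0, lo = 0.0, hi = 0.0;
    std::size_t n = 0;
    for (double v : columns[j]) {
      if (std::isnan(v)) continue;
      if (n == 0 || v < lo) lo = v;
      if (n == 0 || v > hi) hi = v;
      sum += v;
      ++n;
    }
    if (n < 2 || lo == hi) {
      // All NA, a single value, or constant: record it instead of erroring,
      // and use mean 0 / SD 1 so scaling leaves the column unchanged.
      stats.skipped.push_back(j);
      stats.colMeans.push_back(0.0);
      stats.colSds.push_back(1.0);
      continue;
    }
    double mean = sum / n;
    double sq = 0.0;
    for (double v : columns[j]) {
      if (!std::isnan(v)) sq += (v - mean) * (v - mean);
    }
    stats.colMeans.push_back(mean);
    stats.colSds.push_back(std::sqrt(sq / (n - 1)));
  }
  return stats;
}

// Builds the user-facing warning; returns "" when nothing was skipped.
std::string skippedWarning(const std::vector<std::size_t>& skipped,
                           const std::vector<std::string>& names) {
  if (skipped.empty()) return "";
  std::string msg = "Features ";
  for (std::size_t i = 0; i < skipped.size(); ++i) {
    if (i > 0) msg += ", ";
    msg += names[skipped[i]];
  }
  msg += " have the same column value so will not be split on when fitting the forest.";
  return msg;
}
```

Returning the message as a string, rather than printing it, keeps the shared layer language-agnostic. Each frontend (for example, R via `warning()`) could then decide how to surface it, while training continues with the remaining columns.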