PlantedML / randomPlantedForest

Random Planted Forest
http://plantedml.com/randomPlantedForest/

Handling of missing values (NA) #6

Open jemus42 opened 2 years ago

jemus42 commented 2 years ago

Two possible options:

1) "We don't do NA, sorry": (Current behavior)

Missings in input data would either cause an error or be dropped (non-silently, to be safe(r)) via na.omit or similar. An na_rm argument in rpf() and predict.rpf() could control which of the two happens.

2) Handle NAs on the C++ level in whatever tree-ish way is suitable.

See also the Rcpp for everyone chapter on missings.
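
To make option 2 a bit more concrete, here is a minimal sketch of one common tree-ish approach: every split stores a default direction for missing values. This is plain C++ rather than Rcpp, and the `Split` struct and `goes_left()` are hypothetical names for illustration, not taken from the rpf sources. It relies on the fact that R's NA_real_ is represented as a NaN, so `std::isnan` catches both NA and NaN without distinguishing them.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical split description; not the data structure used in rpf.
struct Split {
    std::size_t feature;  // column index of the split variable
    double threshold;     // observations with x <= threshold go left
    bool na_goes_left;    // default direction for missing values
};

// Returns true if the observation (one row of X) falls into the left child.
bool goes_left(const std::vector<double>& row, const Split& s) {
    assert(s.feature < row.size());
    const double x = row[s.feature];
    if (std::isnan(x)) {
        // Missing value: follow the split's default direction.
        return s.na_goes_left;
    }
    return x <= s.threshold;
}
```

How `na_goes_left` would be chosen during fitting (e.g. by trying both directions and keeping the one with the lower loss) is a separate design decision that this sketch leaves open.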

This is not a pressing issue for now since the implementation can be built and benchmarked under the assumption of complete data, but once we start considering a CRAN release we should at least have an opinion on the matter, I guess.

jemus42 commented 2 years ago

Currently rpf_impl throws an error if Y contains missings, which I think is reasonable.
For X, however, we'd still have to decide what to do.
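
For comparison, a rough sketch of what the complete-case route (option 1) could look like if it were done on the C++ side: error on missings in Y, drop rows with missings in X, and report how many were dropped so the caller can warn about it. Everything here (`Matrix`, `CompleteCases`, `drop_incomplete_rows()`) is made up for illustration and does not reflect how rpf_impl is actually written; in practice this would probably be simpler to do in R before the data reaches C++.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Row-major toy matrix: one inner vector per observation (assumption for this sketch).
using Matrix = std::vector<std::vector<double>>;

// Hypothetical result type: the filtered data plus the number of dropped rows.
struct CompleteCases {
    Matrix X;
    std::vector<double> Y;
    std::size_t n_dropped = 0;
};

CompleteCases drop_incomplete_rows(const Matrix& X, const std::vector<double>& Y) {
    if (X.size() != Y.size()) {
        throw std::invalid_argument("X and Y must have the same number of rows");
    }
    CompleteCases out;
    for (std::size_t i = 0; i < X.size(); ++i) {
        // Missing responses are treated as an error, not silently dropped.
        if (std::isnan(Y[i])) {
            throw std::invalid_argument("Y contains missing values");
        }
        bool complete = true;
        for (double x : X[i]) {
            if (std::isnan(x)) {  // catches both NA and NaN
                complete = false;
                break;
            }
        }
        if (complete) {
            out.X.push_back(X[i]);
            out.Y.push_back(Y[i]);
        } else {
            ++out.n_dropped;  // caller can turn this into a warning
        }
    }
    return out;
}
```

Returning `n_dropped` instead of printing from C++ keeps the "non-silently" part in the caller's hands, e.g. as a warning() on the R side.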