Closed by aloysius-lim 11 years ago
Fixing this causes instability in the error estimates in the first several trees. This leads to ugly plots of training error, which jumps about quite a bit before settling into a curve. Thus I have decided not to fix this, unless there is strong demand for it.
Currently, error estimates are computed as a proportion of all examples. However, at early stages of building the forest, some examples have never been out-of-bag, so they have no out-of-bag prediction yet still count towards the error score. It would be unfair to count these examples as "errors".
Instead, error estimates should be computed as a proportion of all examples that have been out-of-bag at least once.
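To make the difference concrete, here is a minimal sketch of the two ways of computing the estimate. It is not this package's code: the function name `oob_error_rates` and the convention of using `None` for an example that has never been out-of-bag are assumptions made for illustration only.

```python
from typing import Optional, Sequence, Tuple


def oob_error_rates(
    y_true: Sequence[int],
    oob_pred: Sequence[Optional[int]],
) -> Tuple[float, Optional[float]]:
    """Return (error over all examples, error over examples seen out-of-bag).

    oob_pred[i] is the current out-of-bag prediction for example i, or None
    if example i has not been out-of-bag in any tree so far (hypothetical
    convention for this sketch).
    """
    n_total = len(y_true)
    n_seen = 0          # examples that have been out-of-bag at least once
    wrong_all = 0       # errors when never-out-of-bag examples count as wrong
    wrong_seen = 0      # errors among examples with an out-of-bag prediction

    for truth, pred in zip(y_true, oob_pred):
        if pred is None:
            # Behaviour described in the issue: no prediction, yet it
            # still adds to the error score.
            wrong_all += 1
            continue
        n_seen += 1
        if pred != truth:
            wrong_all += 1
            wrong_seen += 1

    err_all = wrong_all / n_total
    # Proposed behaviour: divide only by the examples that have a prediction.
    err_seen = wrong_seen / n_seen if n_seen else None
    return err_all, err_seen


# Early in the forest: two of four examples have never been out-of-bag.
err_all, err_seen = oob_error_rates([0, 1, 1, 0], [0, None, 0, None])
print(err_all, err_seen)
```

In the early trees the second denominator is small, so a single misclassified example moves the estimate a lot, which is consistent with the jumpy training error plots mentioned in the closing comment.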