zhao-zilong opened this issue 1 week ago
Can you give a reproducible example for the ValueError?
We set those to zero because we cannot estimate variances for single-observation leaves. In R, we now avoid that completely by using "class-wise min.bucket", which prevents nodes with fewer than 2 (or whatever the user sets for the parameter) real observations (see https://github.com/imbs-hl/ranger/pull/721). But I don't think that is possible with scikit-learn's RandomForestClassifier: there is min_samples_leaf, but as far as I know it can't be made class-specific.
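To illustrate why min_samples_leaf is not enough: it bounds the total number of rows in a leaf, not the number of real rows. Below is a minimal sketch that uses RandomForestClassifier.apply to find leaves with fewer than 2 real observations. It assumes real rows are labelled 1 and synthetic rows 0, and `sparse_real_leaves` is a hypothetical helper, not part of arfpy. It also uses bootstrap=False so that applying the forest to the training data reproduces the training leaf counts.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def sparse_real_leaves(forest, X, y, real_label=1, min_real=2):
    """Hypothetical helper: per tree, list the leaf ids holding fewer than
    `min_real` rows labelled `real_label` (assumed to mark real data)."""
    leaves = forest.apply(X)  # shape (n_samples, n_estimators)
    is_real = np.asarray(y) == real_label
    result = []
    for t in range(leaves.shape[1]):
        all_ids = np.unique(leaves[:, t])
        real_ids, real_n = np.unique(leaves[is_real, t], return_counts=True)
        n_real = dict(zip(real_ids.tolist(), real_n.tolist()))
        # Leaves with 0 or 1 real rows: no usable variance estimate there.
        result.append([leaf for leaf in all_ids.tolist() if n_real.get(leaf, 0) < min_real])
    return result


# Toy data: real rows (label 1) vs. synthetic rows (label 0).
rng = np.random.default_rng(0)
X_real = rng.normal(size=(50, 3))
X_synth = rng.uniform(-3, 3, size=(50, 3))
X = np.vstack([X_real, X_synth])
y = np.r_[np.ones(50, dtype=int), np.zeros(50, dtype=int)]

# min_samples_leaf=5 guarantees >= 5 rows per leaf in total, but says
# nothing about how many of those rows are real.
rf = RandomForestClassifier(
    n_estimators=5, min_samples_leaf=5, bootstrap=False, random_state=0
).fit(X, y)
sparse = sparse_real_leaves(rf, X, y)
print([len(s) for s in sparse])  # number of sparse leaves per tree
```

Any leaf reported here would yield an undefined sample variance (for example, `np.var(x, ddof=1)` on a single value returns nan), which is the case the zero-variance workaround is meant to cover.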
The bug comes from this line: https://github.com/bips-hb/arfpy/blob/b5bf5de85e4ed71a0f2ed1fc1ed1ffafef4f7289/arfpy/arf.py#L313
And I think it happens because, in this line of code, you intentionally set some cvg values to zero: https://github.com/bips-hb/arfpy/blob/b5bf5de85e4ed71a0f2ed1fc1ed1ffafef4f7289/arfpy/arf.py#L224
Do you have any insights on this?