bips-hb / arfpy

Python implementation of adversarial random forests for density estimation and generative modelling
https://bips-hb.github.io/arfpy/
MIT License

ValueError: probabilities do not sum to 1 #4

Open zhao-zilong opened 1 week ago

zhao-zilong commented 1 week ago

The bug comes from this line: https://github.com/bips-hb/arfpy/blob/b5bf5de85e4ed71a0f2ed1fc1ed1ffafef4f7289/arfpy/arf.py#L313

I think it happens because this line of code intentionally sets some cvg values to zero: https://github.com/bips-hb/arfpy/blob/b5bf5de85e4ed71a0f2ed1fc1ed1ffafef4f7289/arfpy/arf.py#L224
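To illustrate the suspected mechanism, here is a minimal sketch, not the arfpy code itself. It assumes that the sampling step passes leaf coverage values as the `p` argument of NumPy's `choice`, and that zeroing some of them leaves the remainder summing to less than 1. The coverage values and the helper `sample_leaves` are hypothetical.

```python
import numpy as np

def sample_leaves(cvg, n, seed=None):
    """Draw n leaf indices with probability proportional to coverage.

    Hypothetical helper: renormalizes so the weights sum to 1 even
    after some coverage values have been set to zero.
    """
    rng = np.random.default_rng(seed)
    cvg = np.asarray(cvg, dtype=float)
    total = cvg.sum()
    if total <= 0:
        raise ValueError("all coverage values are zero")
    return rng.choice(len(cvg), size=n, p=cvg / total)

# Made-up coverage values for three leaves; they sum to 1.
cvg = np.array([0.5, 0.3, 0.2])

# Zero out one leaf, as is done for single-observation leaves.
cvg_zeroed = cvg.copy()
cvg_zeroed[2] = 0.0  # now sums to 0.8

# Passing the zeroed weights directly raises the ValueError.
try:
    np.random.default_rng(0).choice(len(cvg_zeroed), size=5, p=cvg_zeroed)
    raised, message = False, ""
except ValueError as err:
    raised, message = True, str(err)

# Renormalizing first avoids the error; leaf 2 is never drawn.
draws = sample_leaves(cvg_zeroed, n=1000, seed=0)
```

Whether renormalizing is the right fix depends on what the zeroed leaves are meant to represent; it only shows that the error goes away once the probabilities sum to 1 again.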

Do you have any insight into this?

mnwright commented 3 days ago

Can you give a reproducible example for the ValueError?

We set those to zero because we cannot estimate variances for single-observation leaves. In R, we now avoid that completely by using "class-wise min.bucket", which avoids nodes with fewer than 2 (or whatever the user sets for the parameter) real observations (see https://github.com/imbs-hl/ranger/pull/721). But I think that is not possible with scikit-learn's RandomForestClassifier: there is min_samples_leaf, but I think that can't be class-specific.
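As a rough check of that limitation, the sketch below fits a scikit-learn forest on real versus synthetic rows and counts the real observations per leaf. It is an illustration under assumptions, not arfpy's procedure: the toy data, the column-permutation way of making synthetic rows, and the helper `leaf_counts` are all made up here, and `bootstrap=False` is used so that the counts match what each tree was trained on.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 200

# Toy "real" data and synthetic rows made by permuting each column.
x_real = rng.normal(size=(n, 2))
x_synth = np.column_stack(
    [rng.permutation(x_real[:, j]) for j in range(x_real.shape[1])]
)
X = np.vstack([x_real, x_synth])
y = np.concatenate([np.ones(n, dtype=int), np.zeros(n, dtype=int)])  # 1 = real

# min_samples_leaf applies to all samples in a leaf, regardless of class.
clf = RandomForestClassifier(
    n_estimators=10, min_samples_leaf=2, bootstrap=False, random_state=0
).fit(X, y)

# apply() returns the leaf index of every sample in every tree.
leaf_ids = clf.apply(X)  # shape (n_samples, n_estimators)

def leaf_counts(tree_leaf_ids, mask):
    """Hypothetical helper: number of masked samples per leaf id."""
    ids, counts = np.unique(tree_leaf_ids[mask], return_counts=True)
    return dict(zip(ids.tolist(), counts.tolist()))

everyone = np.ones(len(y), dtype=bool)
min_total = []          # smallest leaf size per tree, all samples
sparse_real_leaves = [] # leaves per tree with fewer than 2 real samples
for t in range(leaf_ids.shape[1]):
    total = leaf_counts(leaf_ids[:, t], everyone)
    real = leaf_counts(leaf_ids[:, t], y == 1)
    min_total.append(min(total.values()))
    sparse_real_leaves.append(
        sum(1 for leaf in total if real.get(leaf, 0) < 2)
    )
```

Here every entry of `min_total` is at least 2, because the constraint is enforced on the total leaf size, while `sparse_real_leaves` reports how many leaves per tree still hold fewer than 2 real observations.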