aai-institute / pyDVL

pyDVL is a library of stable implementations of algorithms for data valuation and influence function computation

https://pydvl.org

GNU Lesser General Public License v3.0

Some improvements to OOB notebook #431

Closed by mdbenito 9 months ago

mdbenito commented 9 months ago

Description

This PR adds some text and supporting functions to the OOB notebook.

Changes

Moves data prep to the supporting module
Adds a method to draw confidence intervals (normal and t)
Redoes a couple of plots using it
Adds random seed handling to compute_data_oob
Ensures reproducibility of the notebook by passing around the seed
Does some minor renaming in oob.py
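
For readers unfamiliar with the two interval types mentioned above, here is a minimal sketch of how normal and Student-t confidence intervals for a mean can be computed. The function name `mean_confidence_interval` and its parameters are hypothetical and are not the helper added in this PR; the t branch assumes SciPy is installed.

```python
import math
import statistics


def mean_confidence_interval(samples, level=0.95, kind="normal"):
    """Return (low, high) for the mean of `samples`.

    Hypothetical helper for illustration only, not part of pyDVL.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = statistics.fmean(samples)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(n)
    # Upper quantile of a two-sided interval, e.g. 0.975 for level=0.95.
    tail = 0.5 + level / 2
    if kind == "normal":
        q = statistics.NormalDist().inv_cdf(tail)
    elif kind == "t":
        # Student-t quantile with n - 1 degrees of freedom (needs SciPy).
        from scipy.stats import t

        q = float(t.ppf(tail, df=n - 1))
    else:
        raise ValueError(f"unknown interval kind: {kind!r}")
    return mean - q * sem, mean + q * sem
```

With few repetitions the t interval is wider than the normal one, which is usually the safer choice when a plot is based on a handful of runs.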

I also sneaked in a couple of unrelated things:

An adjustment of font sizes in the api docs
Minor cosmetic changes here and there

Checklist

[ ] Wrote Unit tests (if necessary)
[ ] Updated Documentation (if necessary)
[ ] Updated Changelog
[ ] If notebooks were added/changed, added boilerplate cells are tagged with "tags": ["hide"] or "tags": ["hide-input"]

mdbenito commented 9 months ago

@BastienZim I've worked a bit on your notebook; let me know if you have any comments. I was a bit surprised by how much the results vary across seeds: removing the worst 20% of points often degrades performance. In the end I added complete randomization of the whole run, including the splitting of the dataset, to see what the true variance is. The variance is much larger, but the results are predictable. I also added random seed handling to compute_data_oob and did a couple of minor things here and there.
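
The idea of randomizing the whole run, dataset split included, while keeping it reproducible can be sketched as follows. This is only an illustration using the standard library; `split_indices` and `child_seeds` are hypothetical names, not functions from pyDVL or from the notebook.

```python
import random


def split_indices(n_samples, test_fraction, seed):
    """Shuffle range(n_samples) with a seeded generator and split it.

    Hypothetical helper: returns (train_indices, test_indices).
    """
    rng = random.Random(seed)  # local generator, global state is untouched
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def child_seeds(master_seed, n_runs):
    """Derive one seed per repetition from a single master seed."""
    rng = random.Random(master_seed)
    return [rng.randrange(2**32) for _ in range(n_runs)]


# Each repetition gets its own seed, so the split (and anything else that
# receives the seed) changes between runs but is identical across re-executions.
for run_seed in child_seeds(master_seed=16, n_runs=3):
    train, test = split_indices(n_samples=10, test_fraction=0.2, seed=run_seed)
```

Passing the seed explicitly, rather than seeding a global generator once, keeps each repetition independent of the order in which cells are executed.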