aai-institute / pyDVL

pyDVL is a library of stable implementations of algorithms for data valuation and influence function computation
https://pydvl.org
GNU Lesser General Public License v3.0

Implementation of Kwon and Zou Data-OOB: Out-of-bag Estimate as a Sim… #426

Closed BastienZim closed 10 months ago

BastienZim commented 10 months ago

…ple and Efficient Data Value ICML 2023 using pyDVL

Description

This PR adds an implementation of the data valuation method described in Kwon and Zou, "Data-OOB: Out-of-bag Estimate as a Simple and Efficient Data Value", published at ICML 2023.

The accompanying notebook gives an overview of the method through examples, visualizations, and a point-removal evaluation.

No unit tests were added, since the notebook exercises the method. If unit tests would be useful, I can write some.
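
For readers unfamiliar with the method, here is a minimal sketch of the Data-OOB idea: train many weak learners on bootstrap samples, then value each point by the average score it receives from the learners whose bootstrap sample did not contain it. This is not the code in this PR. The 1-nearest-neighbour weak learner, the toy data, and the names `data_oob_values` and `nearest_neighbor_predict` are assumptions chosen to keep the example free of dependencies.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, int]  # (feature, label)


def nearest_neighbor_predict(train: Sequence[Point], x: float) -> int:
    """Hypothetical weak learner: label of the closest training feature."""
    return min(train, key=lambda p: abs(p[0] - x))[1]


def data_oob_values(
    data: Sequence[Point], n_models: int = 200, seed: int = 0
) -> List[float]:
    """Average out-of-bag accuracy per point over bootstrapped weak learners."""
    rng = random.Random(seed)
    n = len(data)
    score_sum = [0.0] * n  # sum of OOB scores for each point
    oob_count = [0] * n    # number of learners for which the point was OOB

    for _ in range(n_models):
        # Bootstrap: draw n indices with replacement.
        in_bag = [rng.randrange(n) for _ in range(n)]
        in_bag_set = set(in_bag)
        sample = [data[i] for i in in_bag]

        # Score only the points this learner never saw.
        for i in range(n):
            if i in in_bag_set:
                continue
            x, y = data[i]
            score_sum[i] += float(nearest_neighbor_predict(sample, x) == y)
            oob_count[i] += 1

    # A point that was never out-of-bag has no estimate.
    return [s / c if c else float("nan") for s, c in zip(score_sum, oob_count)]


# Toy data: two well-separated clusters plus one deliberately mislabeled point.
toy_data: List[Point] = (
    [(0.0 + 0.1 * k, 0) for k in range(5)]
    + [(1.0 + 0.1 * k, 1) for k in range(5)]
    + [(0.25, 1)]  # index 10: sits in the label-0 cluster but carries label 1
)
toy_values = data_oob_values(toy_data)
```

In this toy setup the mislabeled point should receive the lowest value, because learners that did not see it almost always predict the label of its neighbours.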

Changes

Checklist

BastienZim commented 10 months ago

One problem raised by the type checker remains that I could not resolve. It concerns the return type of compute_data_oob: Returning Any from function declared to return "ValuationResult[Any, Any]".

I would be interested in knowing the solution.
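
For context, this message usually appears when the returned expression comes from something the type checker sees as `Any`, for example an unannotated helper or an untyped third-party call. The actual cause in this PR is not shown here, so the following is only a sketch of the general pattern and two common workarounds. `Result` is a hypothetical stand-in for `ValuationResult`, and `untyped_helper` is an assumed name.

```python
from dataclasses import dataclass
from typing import Generic, List, TypeVar, cast

T = TypeVar("T")


@dataclass
class Result(Generic[T]):
    """Hypothetical stand-in for a generic result container."""
    values: List[T]


def untyped_helper(values):
    # No annotations: a type checker treats the return value as Any.
    return Result(values)


def compute_flagged() -> Result[float]:
    # With mypy's warn_return_any enabled, this line is reported as
    # 'Returning Any from function declared to return "Result[float]"'.
    return untyped_helper([0.1, 0.2])


def compute_annotated() -> Result[float]:
    # Workaround 1: bind the value to an annotated local before returning.
    result: Result[float] = untyped_helper([0.1, 0.2])
    return result


def compute_cast() -> Result[float]:
    # Workaround 2: state the type explicitly; cast has no runtime effect.
    return cast(Result[float], untyped_helper([0.1, 0.2]))
```

Both workarounds only silence the warning. Annotating the helper itself, where that is possible, lets the checker actually verify the type.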

AnesBenmerzoug commented 10 months ago

Hi @BastienZim, it looks good. It's almost ready; there are just a few things missing: