aai-institute / pyDVL

pyDVL is a library of stable implementations of algorithms for data valuation and influence function computation
https://pydvl.org
GNU Lesser General Public License v3.0

Use quadrature rules in Owen Sampling #599

Open · janosg opened 5 months ago

janosg commented 5 months ago

Owen sampling has roughly the following structure:

```python
import numpy as np

for idx in data.indices:
    for prob in np.linspace(0, 1, n_samples_outer):
        for _ in range(n_samples_inner):
            # draw samples and calculate utilities with and without idx
            ...
```
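A self-contained, runnable version of that loop structure might look like the sketch below. It assumes the usual Owen sampling scheme, where each data point other than `idx` joins the sample independently with probability `prob`. The names `owen_values`, `utility` and `additive_utility` are hypothetical and are not pyDVL's API.

```python
import numpy as np


def owen_values(utility, n, n_samples_outer, n_samples_inner, seed=0):
    """Sketch of Owen sampling with a fixed, equally weighted grid over prob.

    `utility` is a hypothetical callable mapping a set of indices to a float.
    """
    rng = np.random.default_rng(seed)
    values = np.zeros(n)
    for idx in range(n):
        others = np.delete(np.arange(n), idx)  # all points except idx
        for prob in np.linspace(0, 1, n_samples_outer):
            marginal = 0.0
            for _ in range(n_samples_inner):
                # Each other point joins the sample independently with probability prob
                mask = rng.random(others.size) < prob
                subset = set(others[mask].tolist())
                marginal += utility(subset | {idx}) - utility(subset)
            values[idx] += marginal / n_samples_inner
        # Equal weights over the grid points approximate the integral over prob
        values[idx] /= n_samples_outer
    return values


weights = [1.0, 2.0, 3.0, 4.0]


def additive_utility(subset):
    """Toy utility: sum of per-point weights, so each Shapley value equals its weight."""
    return sum(weights[j] for j in subset)


print(owen_values(additive_utility, n=4, n_samples_outer=5, n_samples_inner=3))
```

The additive toy utility is chosen only as a sanity check: every marginal contribution of a point equals its weight, so the estimate recovers `weights` exactly regardless of how samples are drawn. It says nothing about how the grid behaves for realistic utilities.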

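The loop over probabilities controls the size of the samples: each of the other data points is included independently with probability `prob`, so the expected sample size grows linearly with `prob`. The fixed, evenly spaced grid with equal weights is a discrete approximation of an integral over `prob` from 0 to 1.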
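To make the quadrature view concrete: with inclusion probability q, the Owen value of point i is the integral over q in [0, 1] of e_i(q) = E[u(S ∪ {i}) − u(S)], and e_i(q) is a polynomial in q whose degree is at most the number of other data points. The sketch below compares the current equal-weight grid with Gauss-Legendre nodes and weights from `numpy.polynomial.legendre.leggauss`, mapped from [-1, 1] to [0, 1]. The integrand `f(q) = q**3` is a stand-in chosen for illustration, not an actual utility.

```python
import numpy as np


def grid_rule(n):
    """Current scheme: n evenly spaced nodes on [0, 1], equal weights 1/n."""
    return np.linspace(0, 1, n), np.full(n, 1.0 / n)


def gauss_legendre_rule(n):
    """Gauss-Legendre nodes and weights, mapped from [-1, 1] to [0, 1]."""
    x, w = np.polynomial.legendre.leggauss(n)
    return (x + 1) / 2, w / 2


def integrate(f, rule, n):
    """Approximate the integral of f over [0, 1] with an n-node rule."""
    nodes, weights = rule(n)
    return float(np.dot(weights, f(nodes)))


def f(q):
    """Stand-in integrand (not a real utility); its exact integral is 0.25."""
    return q**3


for n in (2, 5, 10):
    print(n, integrate(f, grid_rule, n), integrate(f, gauss_legendre_rule, n))
```

For 2, 5 and 10 nodes the equal-weight grid returns 0.5, 0.3125 and about 0.278, while two Gauss-Legendre nodes already give 0.25 up to rounding, because an n-node Gauss-Legendre rule is exact for polynomials of degree up to 2n − 1. Whether this advantage carries over to Owen sampling is an open question: the real integrand can have a high degree and is only known through noisy inner-loop estimates.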

We should investigate the following questions:

My rough intuition is that very small samples should be avoided, because they yield very noisy value estimates, and that very large samples should be avoided, because no single data point contributes much value once the sample is already large.
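One property relevant to this intuition is where the nodes fall. The current grid `np.linspace(0, 1, n_samples_outer)` always includes q = 0 (every sample is empty) and q = 1 (every sample contains all other points), whereas Gauss-Legendre nodes lie strictly inside (0, 1). The sketch below only compares node placement, expressed as expected sample sizes for a hypothetical dataset of `n_points = 100`; it does not test whether the resulting value estimates are less noisy.

```python
import numpy as np

n_points = 100  # hypothetical dataset size
n_samples_outer = 5

# Current scheme: evenly spaced inclusion probabilities, endpoints included
grid = np.linspace(0, 1, n_samples_outer)

# Gauss-Legendre nodes on [-1, 1], mapped to [0, 1]
x, _ = np.polynomial.legendre.leggauss(n_samples_outer)
gauss = (x + 1) / 2

# Expected sample size at each node: q * (n_points - 1) other points
print("grid :", np.round(grid * (n_points - 1), 1))
print("gauss:", np.round(gauss * (n_points - 1), 1))
```

With five nodes, the smallest Gauss-Legendre node is q ≈ 0.047, roughly 4.6 expected points here. Gauss-Legendre nodes are still denser near the ends of the interval than in the middle, so de-emphasizing extreme sample sizes more strongly would need a separate design choice, such as a weighting or change of variables over q.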