Homogenize interpretation of ValuationResult.counts

aai-institute / pyDVL

pyDVL is a library of stable implementations of algorithms for data valuation and influence function computation

https://pydvl.org

GNU Lesser General Public License v3.0

89 stars 9 forks source link

Homogenize interpretation of ValuationResult.counts #423

Closed kosmitive closed 9 months ago

kosmitive commented 10 months ago

How do we want to interpret the counts in ValuationResult. Currently there are four groups

Owen Sampling (Normal) evaluates c*max_q*n_samples and Owen Sampling (Antithetic) evaluates 2*c*max_q*n_samples. After c iterations each value has a count of c.
TMC-Shapley and Classwise Shapley has interpretation c = marginal evaluations. After c iterations each value has a count of c.
Beta Shapley has interpretation c = marginal evaluations. After c iterations each value's count is in the neighborhood of {c}.
Least Core has interpretation c = utility evaluations.

Personally I like the style of Beta Shapley and would vote for

[ ] Adapt Owen Sampling, TMC-Shapley and Classwise Shapley to have a count to match the number of utility evaluations (all permutation sampling based methods have n+1 evaluations with functional cache)

mdbenito commented 9 months ago

c = number of updates to the value. For marginal-based methods this means marginal evaluations. For other methods, it has other interpretations. That's just what it should be. I could imagine adding this information to the docstrings. That would make a lot of sense and is a good point.

I don't understand your distinction between tmcs and beta shapley. As you know, tmcs is a semivalue and all semivalues are implemented in the same way. Also, in beta-shap and compute_semivalues in general. c is exactly = marginal evaluations for an index. There is no "neighbourhood of" AFAICT.

I don't think your proposal can be implemented because, most importantly, counts are used for the variance estimates, so they must be number of updates to the values.

kosmitive commented 9 months ago

TMC and Classwise iterate over permutations, thus all values have the same number of counts unlike Beta. One value could have used 501 and the other 500 points. On top Beta always calculate both values (imagine cache disables, cache-hits might be very small in practice). Permutation sampling uses len(p)+1 utility evaluations but has len(p) marginal evaluations. Beta has 2 * len(samples) utility and len(samples) marginal evaluations.
Variance is a good point and we need to watch out for that
For Owen sampling: What about intetgrating n_samples factor in the counts, e.g c*n_samples?
The core has utility evaluations, which do not directly map to marginal evaluations and it is unclear how to make them have equal interpretation. But from what I see in the optimization problem: Each sampled set S is used as a constraint. Depending on the samples, v(1) might be included in 200 constraints, whereas v(2) is included in 202 constraints. Shouldn't we include that count into the ValuationResult? But for variance I don't have an interpretation here.

mdbenito commented 9 months ago

What I said was self contradictory and misleading. You are correct that counts does not mean marginal evaluations for all methods, although your analysis is not correct. There are as many marginal evaluations as updates to the value for all marginal methods, irrespective of sampler (Note btw that betashap is sampler independent). Caching doesn't play a role, as doesn't the number of utility evaluations. Yes, this affects performance, but it's not what ValuationResult.counts counts. It is used for variance estimates in MC. As such it is exactly "number of updates to the Monte Carlo estimate". This is why it is set to 1 in LC, and to the number of outer integral evaluations for Owen. Because it is used to compute the standard error of the mean, one cannot simply change it for Owen to include the inner integral because that is using a uniform quadrature rule, and there is no good number for LC.

Maybe what you are missing is a way of keeping track of utility evaluations for performance comparisons? We can't do this within Utility because it's evaluated in parallel. We could add a new field to the result with "number of times that an index has appeared as argument of the utility".

kosmitive commented 9 months ago

Okay got it. Indeed for the variance it doesn't make sense for LC and Owen.

Adding a second count variable might help, but I don't need access to it now. Currently algorithms terminate after a fixed number of steps and I pre-calculate how many evaluations happen to set them accordingly.