dilyabareeva / quanda

A toolkit for quantitative evaluation of data attribution methods.
https://quanda.readthedocs.io
MIT License
33 stars 0 forks

Add a Single Aggregated Score for Shortcut Detection (Domain Mismatch) #146

Closed by gumityolcu 2 months ago

gumityolcu commented 2 months ago

New metric combining ideas from Koh et al. (the original influence functions paper) and this paper.

1. Poison a subset of the images from a single class A with a feature perturbation.
2. Train the model.
3. Discard test samples that already belong to class A (to make sure the model is using the shortcut).
4. Poison all remaining test samples.
5. Keep only the test samples that are classified as class A (to make sure the model is using the shortcut).
6. Compute the average attributions of the poisoned and the clean train samples.

I am not sure about steps 3 and 5. I will first implement the metric without them; they are easy to add afterwards (just filter the test samples during the evaluate call).
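The evaluation part (steps 3, 5 and 6) could look roughly like the sketch below. It is a minimal NumPy illustration, not quanda's API: the function name `shortcut_attribution_averages` and all of its parameters are hypothetical. It assumes the attributions are already available as a matrix of shape (test samples, train samples), computed on the poisoned test samples. The two filters are optional flags, so the metric can first run without them. How the two averages are combined into one score, and any normalization, are deliberately left out.

```python
import numpy as np


def shortcut_attribution_averages(
    attributions,          # shape (n_test, n_train), hypothetical input layout
    train_is_poisoned,     # boolean mask of length n_train
    test_labels,           # true labels of the test samples, length n_test
    test_preds,            # model predictions on the poisoned test samples
    shortcut_class,        # the class A that carries the shortcut
    filter_true_class=False,     # step 3: drop test samples that are truly class A
    filter_by_prediction=False,  # step 5: keep only samples predicted as class A
):
    """Average attribution of poisoned vs. clean train samples (step 6)."""
    attributions = np.asarray(attributions, dtype=float)
    train_is_poisoned = np.asarray(train_is_poisoned, dtype=bool)
    test_labels = np.asarray(test_labels)
    test_preds = np.asarray(test_preds)

    # Start with every test sample and narrow down with the optional filters.
    keep = np.ones(attributions.shape[0], dtype=bool)
    if filter_true_class:
        keep &= test_labels != shortcut_class
    if filter_by_prediction:
        keep &= test_preds == shortcut_class

    if not keep.any():
        raise ValueError("No test samples left after filtering.")
    if train_is_poisoned.all() or not train_is_poisoned.any():
        raise ValueError("Need both poisoned and clean train samples.")

    kept = attributions[keep]
    return {
        "poisoned": float(kept[:, train_is_poisoned].mean()),
        "clean": float(kept[:, ~train_is_poisoned].mean()),
    }
```

If the model relies on the shortcut, the `"poisoned"` average would be expected to exceed the `"clean"` one; turning the flags on only changes which test rows enter the averages.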

gumityolcu commented 2 months ago

Check out the paper for the normalization of attributions.