Add a Single Aggregated Score for Shortcut Detection (Domain Mismatch)

Proposed new metric, combining ideas from Koh et al. (the original influence functions paper) and this paper.

1- Poison a subset of the training images from a single class A with a feature perturbation
2- Train the model
3- Remove test samples that already belong to class A (to make sure the model is using the shortcut)
4- Poison all remaining test samples
5- Keep only the test samples that are classified as class A (to make sure the model is using the shortcut)
6- Compute the average attributions of the poisoned and the clean train samples
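Steps 3, 5 and 6 can be sketched as follows. This is only an illustration in plain Python, not the quanda API: the function name, its parameters, and the layout of `attributions` (one row per test sample, one column per train sample) are all assumptions made for the example. Both filters default to off, and the two averages are returned separately because how they should be combined into a single score is not fixed here.

```python
from statistics import fmean


def shortcut_attribution_means(
    attributions,          # assumed layout: attributions[i][j] = attribution of train sample j for test sample i
    train_is_poisoned,     # one bool per train sample: True if it was poisoned (step 1)
    test_labels,           # true labels of the test samples
    test_preds,            # model predictions on the poisoned test samples (step 4)
    shortcut_class,        # class A
    filter_by_label=False,       # step 3 (optional)
    filter_by_prediction=False,  # step 5 (optional)
):
    """Hypothetical helper: average attribution of poisoned vs. clean train samples."""
    kept_rows = []
    for row, label, pred in zip(attributions, test_labels, test_preds):
        # Step 3: drop test samples that already belong to class A.
        if filter_by_label and label == shortcut_class:
            continue
        # Step 5: keep only test samples that the model classifies as class A.
        if filter_by_prediction and pred != shortcut_class:
            continue
        kept_rows.append(row)

    # Step 6: pool the attributions of poisoned and clean train samples.
    poisoned = [a for row in kept_rows for a, p in zip(row, train_is_poisoned) if p]
    clean = [a for row in kept_rows for a, p in zip(row, train_is_poisoned) if not p]
    if not poisoned or not clean:
        raise ValueError("Need at least one kept test sample and both poisoned and clean train samples.")

    return fmean(poisoned), fmean(clean)


# Toy data: 3 test samples, 3 train samples, train sample 0 is poisoned, class A is 0.
attributions = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0], [0.1, 0.5, 0.4]]
train_is_poisoned = [True, False, False]
test_labels = [1, 0, 1]
test_preds = [0, 0, 1]

mean_poisoned, mean_clean = shortcut_attribution_means(
    attributions, train_is_poisoned, test_labels, test_preds, shortcut_class=0,
    filter_by_label=True, filter_by_prediction=True,
)
```

With both filters on, only the first test sample survives in this toy example. A model that relies on the shortcut would be expected to give a higher average attribution to the poisoned train samples than to the clean ones.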

I am not sure about steps 3 and 5. I will first implement the metric without them; they are easy to add afterwards (just filter the test samples during the evaluate call).

dilyabareeva/quanda, issue #146