New metric combining ideas from Koh and Liang (the original influence functions paper) and this paper.
1- Poison a subset of the training images from a single class A with a feature perturbation
2- Train the model
3- Discard test samples whose true label is already class A (so that a class A prediction indicates the shortcut rather than a correct classification)
4- Poison all remaining test samples
5- Keep only the test samples that are classified as class A (to make sure the model is using the shortcut)
6- Compute the average attribution of the poisoned train samples and of the clean train samples
I am not sure about steps 3 and 5. I will first implement the metric without them; they are easy to add later (just filter the test samples during the evaluate call).
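
A minimal sketch of how steps 3 to 6 could fit together, assuming the attributions are already available as an (n_test, n_train) NumPy array. Everything here is hypothetical and only for illustration: the names `poison` and `shortcut_metric`, the square-patch perturbation, and the two filter flags are not taken from the existing code or from either paper.

```python
import numpy as np


def poison(images, patch_value=1.0, size=3):
    """Stamp a square patch in the top-left corner of each image.

    Hypothetical feature perturbation; images has shape (n, H, W).
    """
    out = images.copy()
    out[:, :size, :size] = patch_value
    return out


def shortcut_metric(attributions, train_poisoned, test_labels, test_preds,
                    target_class, filter_true_class=False,
                    filter_predicted=False):
    """Average attribution of poisoned vs. clean train samples.

    attributions:   (n_test, n_train) scores for the poisoned test samples
    train_poisoned: (n_train,) boolean mask of poisoned train samples
    test_labels:    (n_test,) true labels
    test_preds:     (n_test,) model predictions on the poisoned test samples
    """
    attributions = np.asarray(attributions, dtype=float)
    train_poisoned = np.asarray(train_poisoned, dtype=bool)
    test_labels = np.asarray(test_labels)
    test_preds = np.asarray(test_preds)

    keep = np.ones(len(test_labels), dtype=bool)
    if filter_true_class:
        # Step 3: drop test samples that already belong to class A.
        keep &= test_labels != target_class
    if filter_predicted:
        # Step 5: keep only test samples classified as class A.
        keep &= test_preds == target_class
    if not keep.any():
        raise ValueError("no test samples left after filtering")

    kept = attributions[keep]
    # Step 6: average over all kept test samples and the train subset.
    poisoned_mean = kept[:, train_poisoned].mean()
    clean_mean = kept[:, ~train_poisoned].mean()
    return poisoned_mean, clean_mean
```

Both filters default to off, matching the plan to implement the metric without steps 3 and 5 first. If the shortcut is picked up, one would expect the poisoned mean to be clearly larger than the clean mean.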