Reference paper cited.

philipperemy commented 6 years ago

Can this be run on a very large dataset to get its most influential training examples?

[1] Pang Wei Koh and Percy Liang, "Understanding Black-box Predictions via Influence Functions", ICML 2017.

I tried the code they provided, but it never worked for big datasets. It only worked for simple models and MNIST.

Thanks

teradepth commented 6 years ago

Yes. The reference implementation only works for small networks and datasets. For deeper networks such as InceptionNet, they use only the weights of the output layer, not all of the layers. We modified this so that it can be applied to any existing deep network, and you can also get the most influential examples of a large dataset. Currently, however, running over an entire large dataset is very slow because influence is calculated sample by sample, not in parallel. If that is too slow, you can randomly sample a subset and look at the influential examples within the sampled set. Please refer to this example: https://github.com/darkonhub/darkon-examples/blob/master/cifar10-resnet/influence_cifar10_resnet.ipynb
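The random-sampling workaround described above can be sketched roughly as follows. This is a minimal illustration, not darkon's API: `top_influential_in_sample` and `fake_score` are hypothetical names, `fake_score` is a stand-in for whatever per-example influence computation you actually use, and ranking by absolute score is an assumption made here for simplicity.

```python
import numpy as np


def top_influential_in_sample(num_train, score_fn, sample_size, top_k, seed=0):
    """Score a random subset of training indices and return the top_k by |score|.

    score_fn is a hypothetical callable mapping one training index to one
    influence score; it stands in for the real (slow) per-example computation.
    """
    rng = np.random.default_rng(seed)
    sample_size = min(sample_size, num_train)
    # Sample without replacement so each training example is scored at most once.
    sampled = rng.choice(num_train, size=sample_size, replace=False)
    # Scores are computed one example at a time, mirroring the
    # sample-by-sample cost mentioned in the comment above.
    scores = np.array([score_fn(int(i)) for i in sampled])
    # Assumption: rank by magnitude, largest first.
    order = np.argsort(-np.abs(scores))[:top_k]
    return sampled[order], scores[order]


def fake_score(index):
    # Placeholder score for illustration only; not a real influence value.
    return float(np.sin(index))


indices, scores = top_influential_in_sample(
    num_train=50_000, score_fn=fake_score, sample_size=1_000, top_k=5
)
```

The result only ranks examples inside the sampled subset, so a highly influential example that was not drawn will be missed. A larger `sample_size` or several runs with different seeds trade run time for coverage.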

We are working on making it faster. Please wait for a future release.

Thanks.

philipperemy commented 6 years ago

Very interesting and looking forward to seeing the next release.

I will definitely have a look at this example.

philipperemy commented 6 years ago

(Thank you very much for your prompt answer by the way)

darkonhub / darkon

Reference paper cited. #23