chihkuanyeh / Representer_Point_Selection

Code release for Representer Point Selection for Explaining Deep Neural Networks, NeurIPS 2018
MIT License

Clarifications and Possible ResNet-50 Walkthrough #1

Closed: adebayoj closed this issue 5 years ago

adebayoj commented 5 years ago

Hi, I read your paper and blog post. Really interesting work! I am hoping to apply your method to a ResNet-50 trained on a different dataset, but I am having a hard time doing so. How would you suggest I proceed? I am currently looking at this notebook: https://github.com/chihkuanyeh/Representer_Point_Selection/blob/master/experiments/fig3_4_visualize_samples_awa.ipynb. Here are some clarification questions:

1) What does a negative example mean? I am referring to Figure 3 in your paper. Does it mean these are the training examples that the test point is least like?

2) I am hoping to essentially replicate Figures 3 and 6 for my specific case. The sensitivity map decomposition you show: are those the sensitivity maps for the training points? More specifically, in Figure 6 (row 2) of your paper, column 1 is d(logit_test_point)/d(test_point); are the remaining columns d(logit_train_point)/d(input)?

3) Regarding your comparison to the influence function method: are the rankings your method produces similar to the rankings produced by influence functions?

These are my questions. Thank you very much for the great work!

chihkuanyeh commented 5 years ago

Hi, thanks for your interest in our work. Regarding your questions:

  1. The negative examples are the training examples that suppress the activation value for a particular class; these are usually training images that look similar to the test image but belong to a different class.

  2. No, they are actually the sensitivity maps for the test point. Since we have the decomposition logit_test_point = Sum_i (representer_value_i for the test point), it follows that d(logit_test_point)/d(test_point) = Sum_i [d(representer_value_i for the test point)/d(test_point)]. Therefore, column 1 shows d(logit_test_point)/d(test_point), column 2 shows d(representer_value_1 for the test point)/d(test_point), and so on; theoretically, columns 2 through n sum up to column 1 (see the first sketch after this list).

  3. They are somewhat similar. I do not know how to quantify this, but in Figure 3 you can see that our top positive and negative examples share some similarity with those found by influence functions.

  4. To apply our method to a ResNet-50 trained on a different dataset, you can simply replace the load_data() function in compute_representer_vals.py with code that extracts the features and weights of the pre-trained network (see the second sketch below). To apply our method to a large-scale dataset, you can replace the backtracking_line_search function in compute_representer_vals.py with an L-BFGS solver from SciPy, which I have found to be the key to converging to a stationary point on a large-scale dataset (a sketch of that swap appears at the end of this comment).
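
To make the decomposition in answer 2 concrete, here is a minimal numpy sketch with toy stand-ins for the learned coefficients `alpha` and the penultimate-layer features (in the real pipeline both come out of compute_representer_vals.py): the test logit is the sum over training points of alpha_i times the feature similarity, and differentiating each summand with respect to the test input yields the per-point sensitivity maps.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, num_classes = 100, 512, 10
alpha = rng.normal(size=(n, num_classes)) / n   # toy stand-in for the learned coefficients
f_train = rng.normal(size=(n, d))               # toy penultimate-layer training features
f_test = rng.normal(size=d)                     # toy penultimate-layer test feature

def representer_values(alpha, f_train, f_test):
    """representer_value[i, c] = alpha[i, c] * k(x_i, x_test), with k the feature inner product."""
    similarity = f_train @ f_test               # k(x_i, x_test), shape (n,)
    return alpha * similarity[:, None]          # shape (n, num_classes)

vals = representer_values(alpha, f_train, f_test)
logits = vals.sum(axis=0)   # the pre-softmax prediction for the test point
# Differentiating each row of `vals` w.r.t. the raw test input gives the
# per-training-point sensitivity maps (columns 2..n of Figure 6), which
# sum to the test point's own sensitivity map (column 1).
```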
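
And here is a hedged sketch of what a load_data() replacement for a pre-trained ResNet-50 could look like, assuming a torchvision model and a DataLoader that yields (image, label) batches; the exact tensor layout expected by compute_representer_vals.py may differ, so treat the names as illustrative.

```python
import torch
import torchvision

# Hypothetical load_data() replacement: extract penultimate features and
# the final-layer weights from a pre-trained ResNet-50.
model = torchvision.models.resnet50(pretrained=True)
model.eval()

# Everything up to (but not including) the final fc layer; the output is
# (batch, 2048, 1, 1) after the average pool.
feature_extractor = torch.nn.Sequential(*list(model.children())[:-1])

# The last linear layer, which the representer framework retrains with an
# L2 penalty before computing the alphas.
W, b = model.fc.weight.data.clone(), model.fc.bias.data.clone()

@torch.no_grad()
def extract_features(loader):
    """`loader` is assumed to yield (image_batch, label_batch) pairs."""
    feats, labels = [], []
    for x, y in loader:
        feats.append(feature_extractor(x).flatten(1))  # (batch, 2048)
        labels.append(y)
    return torch.cat(feats), torch.cat(labels)
```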

Let me know if there are any further questions. I haven't cleaned and released the code using L-BFGS, since we do not show such experiments in the paper. However, I will try to release that part of the code for applying the representer framework to large-scale datasets (such as ImageNet-scale datasets).
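
In the meantime, here is a rough sketch of the idea (not the cleaned-up code): retrain the last layer by minimizing the L2-regularized softmax cross-entropy with SciPy's L-BFGS-B, supplying the analytic gradient, then read the alphas off the loss gradient at the optimum. The features, one-hot labels, and regularization strength lam below are toy stand-ins.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, d, c = 200, 64, 5
feats = rng.normal(size=(n, d))                  # toy extracted features
labels = np.eye(c)[rng.integers(0, c, size=n)]   # toy one-hot labels
lam = 1e-3                                       # L2 regularization strength

def objective(w_flat):
    """L2-regularized softmax cross-entropy and its gradient w.r.t. W."""
    W = w_flat.reshape(d, c)
    logits = feats @ W
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_z = np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss = -((labels * (logits - log_z)).sum(axis=1)).mean() + lam * (W ** 2).sum()
    probs = np.exp(logits - log_z)
    grad = feats.T @ (probs - labels) / n + 2 * lam * W  # analytic gradient
    return loss, grad.ravel()

res = minimize(objective, np.zeros(d * c), method='L-BFGS-B', jac=True)
W_star = res.x.reshape(d, c)

# Per-point coefficients from the paper: alpha_i = -1/(2*lam*n) * dL_i/d(logits_i),
# where the per-point cross-entropy gradient w.r.t. the logits is (probs_i - labels_i).
logits = feats @ W_star
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
alpha = -(probs - labels) / (2 * lam * n)
```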

adebayoj commented 5 years ago

Thank you for the quick response and clarifications. I'll follow your suggestions. If you have some time to include the code for large-scale datasets, that would be great, though there is no rush. I'll close this issue since I don't have other comments.