bolunwang / backdoor

Code implementation of the paper "Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks", published at the IEEE Symposium on Security and Privacy 2019.
https://sandlab.cs.uchicago.edu/
MIT License

Information about VGGFace models is missing: can you add it? #6

Closed: mvillarreal14 closed this issue 4 years ago

mvillarreal14 commented 4 years ago

Hello,

I have two requests:

1) Can you provide the code you used to reverse-engineer the VGG-Face models? It would be great if you added this code to this repo.

2) Can you provide the information you used to apply the pruning method to the two models (the GTSRB-based model and the VGG-Face model)? That is, which neurons did you remove, and how did you select them? The results I am getting are lower than the ones reported in the paper.

Shawn-Shan commented 4 years ago

Thanks for your interest in our paper.

Request 1: We are slowly releasing our code for different datasets. In the meantime, you can easily obtain a VGGFace model: many pretrained VGGFace models exist online, and we use PubFig as the transfer-learning dataset.

Request 2: I might be misunderstanding your question, but we prune the neurons that are most related to the reverse-engineered triggers, measuring relatedness by the value of the neuron activations. Also, make sure not to optimize the pruned neurons during fine-tuning; letting them update might be what leads to the poor performance.
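In code, the "keep pruned neurons frozen" constraint looks roughly like this (a sketch only, not our released code; the layer name `"fc2"` and the `pruned_idx` ranking are placeholders):

```python
import tensorflow as tf

class KeepPrunedZero(tf.keras.callbacks.Callback):
    """Re-zero the pruned neurons after every batch so fine-tuning
    cannot revive them ("do not optimize the pruned neurons")."""

    def __init__(self, layer_name, pruned_idx):
        super().__init__()
        self.layer_name = layer_name
        self.pruned_idx = pruned_idx  # indices of trigger-related neurons

    def on_batch_end(self, batch, logs=None):
        layer = self.model.get_layer(self.layer_name)
        kernel, bias = layer.get_weights()
        kernel[:, self.pruned_idx] = 0.0  # zero incoming weights
        bias[self.pruned_idx] = 0.0
        layer.set_weights([kernel, bias])

# Usage: zero the selected neurons once, then keep them zero during
# fine-tuning, e.g.
# model.fit(x, y, callbacks=[KeepPrunedZero("fc2", pruned_idx)])
```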

Let me know if this answers your questions.

mvillarreal14 commented 4 years ago

Thank you Shawn for your prompt answer.

Unfortunately, the answers provided are not very helpful. Let me rephrase my questions so that you can better help me with my inquiries:

Request 1: The authors of [1] (see below) infected a VGG-FACE model via the Trojaning attack with two different triggers (square and watermark), producing two trojaned models. They have made the resulting trojaned models available in their repository [2] (see below). In your paper, you claim you took these two trojaned models and reverse-engineered them following your approach. However, your repository does not include the source code to reverse-engineer these models. Can you share it? The code currently provided in your repo does not work with these two VGG-FACE models.

Request 2: After reverse-engineering a model, you applied two complementary methods: unlearning and pruning. In unlearning, you fine-tuned the model using samples (with the correct labels) that include the reverse-engineered trigger. In pruning, you removed (zeroed) the neurons related to the reverse-engineered triggers. What I am asking for is the list of neurons (per layer) that you pruned from the models (the GTSRB-based model and the two VGG-FACE models) to get the published results. We are interested in this because picking different neurons leads to different results. My current understanding of the unlearning step is sketched below.
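For concreteness, this is how I am currently building the unlearning set (my own sketch of my understanding, not your code; `pattern` and `mask` are the reverse-engineered trigger, and the 20% injection ratio is my assumption):

```python
import numpy as np

def make_unlearning_set(x_clean, y_clean, pattern, mask, ratio=0.2):
    """Stamp the reverse-engineered trigger onto a fraction of clean
    samples while keeping their correct labels, so that fine-tuning
    on the mix makes the model unlearn the trigger."""
    x = np.copy(x_clean)
    n = int(len(x) * ratio)
    idx = np.random.choice(len(x), n, replace=False)
    # Blend the trigger in: mask selects the trigger pixels.
    x[idx] = (1.0 - mask) * x[idx] + mask * pattern
    return x, y_clean  # labels stay correct
```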

To sum up, I am trying to exactly replicate the experiments presented in your paper. So far, my results are much lower than yours.

Thank you in advance for your help.

[1] Yingqi Liu, Shiqing Ma, Yousra Aafer, Wen-Chuan Lee, Juan Zhai, Weihang Wang, and Xiangyu Zhang. 2018. Trojaning Attack on Neural Networks. In 25th Annual Network and Distributed System Security Symposium, NDSS 2018, San Diego, California, USA, February 18-21, 2018.

[2] https://github.com/PurduePAML/TrojanNN

anisaha1 commented 4 years ago

@Shawn-Shan I have the same request. Could you please release a small example pruning script that prunes one of your models?

  1. You mention the "second to last layer", but I am confused about which activations to consider. E.g., for the GTSRB model (the 512-neuron layer), should I use the pre-ReLU or post-ReLU activation values?
  2. For ranking the neurons, how do you average the activations across images? Do you take the mean directly, or the mean of the absolute value? Also, when taking the difference of activations between clean and backdoored data, do you use the absolute difference?

It would be great if you could provide some help.

bolunwang commented 4 years ago

Sorry for the late response.

Our layer definition includes the activation, so it's always after ReLU.

For ranking neurons, we took the mean of the absolute difference.
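Roughly, the ranking looks like this (a sketch; the layer name and helper are placeholders, not the exact code we used):

```python
import numpy as np
import tensorflow as tf

def rank_neurons(model, layer_name, x_clean, x_triggered):
    """Rank neurons in `layer_name` (post-ReLU output) by the mean
    absolute difference in activation between clean inputs and the
    same inputs stamped with the reversed trigger."""
    feat = tf.keras.Model(inputs=model.input,
                          outputs=model.get_layer(layer_name).output)
    act_clean = feat.predict(x_clean)     # (N, num_neurons), post-ReLU
    act_trig = feat.predict(x_triggered)  # same images + reversed trigger
    score = np.mean(np.abs(act_trig - act_clean), axis=0)
    return np.argsort(score)[::-1]        # most trigger-related first
```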