Megum1 / LOTUS

[CVPR'24] LOTUS: Evasive and Resilient Backdoor Attacks through Sub-Partitioning
https://www.cs.purdue.edu/homes/cheng535/
MIT License

Some problems with the defense detection #1

Open RandolphCarter0 opened 1 month ago

RandolphCarter0 commented 1 month ago

Hi, LOTUS is an interesting work, thank you for sharing! I tried to reproduce LOTUS's performance against the defense method Neural Cleanse (settings referenced from their code) according to the configuration in the LOTUS paper, but I found that the norm values on ResNet-18 are NaN for all class labels. Did I do something wrong (maybe I need to change the configuration of resnet18_lotus, or the settings of Neural Cleanse)? May I ask you to release more details of the backdoor defense experiments? (screenshot attached)

Megum1 commented 1 month ago

Thank you for your interest in our work! The NaN norm values are likely due to configuration or environment inconsistencies. To help you better, I've attached our Neural Cleanse code as misc/nc.py. Hope that helps.

RandolphCarter0 commented 1 month ago

Thank you for sharing! I ran the code in misc/nc.py with the ResNet-18 settings from the paper, and LOTUS indeed performs well against detection algorithms like Neural Cleanse. However, I found an interesting phenomenon in the experiments: when I used NC to scan a clean model (ResNet-18 on CIFAR-10, trained in step 1 of main.py), I got an anomaly index of 2.72 across repeated runs. Is this related to NC's early-stop mechanism and its MAD outlier detection, or to some other configuration inconsistency? I'm looking forward to hearing your opinion.

Megum1 commented 4 weeks ago

Hi, thanks for your interest, and this is a good observation!

Originally, NC is designed to detect universal backdoors (the victim can be any sample), and it works well in that setting. If you run python nc.py --model_filepath checkpoint/clean.pt --victim -1, it should output an anomaly score of ~0.8.
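For reference, here is a minimal sketch of how the anomaly score is typically computed from the per-class trigger mask norms via MAD (median absolute deviation) outlier detection, following the original Neural Cleanse paper. The numbers are purely illustrative and the exact implementation in misc/nc.py may differ:

```python
import numpy as np

def anomaly_index(l1_norms):
    """NC-style anomaly index per class, from the L1 norms of the
    reverse-engineered trigger masks (MAD outlier detection)."""
    norms = np.asarray(l1_norms, dtype=np.float64)
    median = np.median(norms)
    # 1.4826 makes MAD a consistent estimator of the standard deviation
    mad = 1.4826 * np.median(np.abs(norms - median))
    return np.abs(norms - median) / mad

# A model is usually flagged as backdoored if some class with a mask norm
# below the median exceeds the threshold of 2.
scores = anomaly_index([55.3, 60.1, 58.7, 12.4, 57.9, 61.0, 59.2, 56.8, 60.5, 58.1])
print(scores.max() > 2.0)  # True: class 3 has an abnormally small mask norm
```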

In our experiment, to enhance NC's ability to detect label-specific backdoors (where victim samples should come from the victim class), we only feed validation samples from the victim class and compute the anomaly score. This can introduce false positives, where a clean model is flagged as backdoored.
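As a rough illustration of that setup, the sketch below restricts the validation set to a single victim class before running the trigger optimization. The victim_class value and the use of torchvision's CIFAR-10 loader are assumptions for illustration, not necessarily how misc/nc.py is written:

```python
import torch
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

victim_class = 3  # hypothetical victim class chosen for the label-specific scan

valset = datasets.CIFAR10(root="./data", train=False, download=True,
                          transform=transforms.ToTensor())

# Keep only samples whose ground-truth label is the victim class
victim_indices = [i for i, y in enumerate(valset.targets) if y == victim_class]
victim_loader = DataLoader(Subset(valset, victim_indices),
                           batch_size=128, shuffle=False)

# victim_loader is then fed to the NC trigger optimization for each candidate target class
```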

We suspect there are several potential reasons for the false positive:

1. The clean model itself may have natural vulnerabilities after training. For example, it can be very easy to flip samples from class dog to class cat with a small adversarial perturbation, since the typical training framework does not include robust adversarial training.
2. The samples we select for NC may not be confidently classified. For example, some dog samples have ~0.5 confidence on class dog and ~0.3 confidence on class cat, so their predictions are easy to flip.
3. As you mentioned, NC's early-stop mechanism may halt the optimization before it is fully converged. For example, patience=5 is a bit small: NC stops when the mask size has not shrunk for 5 steps, even though it might still shrink at the 6th (see the sketch below).
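To make point (3) concrete, here is a sketch of the kind of plateau-based early-stop check described above; the function and variable names are illustrative and may not match the logic in misc/nc.py:

```python
def should_stop(norm_history, patience=5, rel_tol=0.99):
    """Return True if the mask L1 norm has not shrunk noticeably
    over the last `patience` checks. A small patience can end the
    search before the mask is fully minimized."""
    if len(norm_history) <= patience:
        return False
    best_before = min(norm_history[:-patience])
    recent_best = min(norm_history[-patience:])
    # Stop if none of the recent checks improved on the earlier best by rel_tol
    return recent_best > best_before * rel_tol

# Example: the norm plateaus for 5 checks (less than 1% improvement),
# so the search stops here even though it might still shrink later.
norm_history = [120.0, 90.0, 70.0, 69.9, 69.9, 69.8, 69.8, 69.8]
print(should_stop(norm_history, patience=5))  # True
```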

I think the easiest way to mitigate these false positives is to try different random seeds (--seed) and to set a higher patience, so that NC has more steps to refine its output.
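For example, a quick seed sweep could look like the following. Note that the --patience flag is an assumption here; misc/nc.py may expose patience differently (e.g., only as a constant inside the script), while --model_filepath, --victim, and --seed are the options mentioned above:

```python
import subprocess

# Sweep several random seeds; a higher patience gives NC more steps to refine the mask.
for seed in [0, 1, 2, 3, 4]:
    subprocess.run(
        ["python", "nc.py",
         "--model_filepath", "checkpoint/clean.pt",
         "--victim", "3",        # hypothetical victim class
         "--seed", str(seed),
         "--patience", "20"],    # assumed flag; adjust to the actual script
        check=True,
    )
```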

Hope that could help and thanks again for your comments.

RandolphCarter0 commented 2 weeks ago

Thank you for your profound insights! The LOTUS attack now indeed evades the detection method. Inspired by this impressive and interesting attack, we'll further explore backdoor attacks and defenses.

Meanwhile, I noticed that LOTUS also performs excellently against backdoor mitigation methods. May I ask you to release more details of the experimental setup for the mitigation experiments (Table 3 in the paper), such as the source code for FP, ANP, and NAD? I would be extremely grateful for your reply.

Megum1 commented 1 week ago

Hi, thanks again for your continued interest! We are planning to release our mitigation experiment code in the near future. In the meantime, you can check out another repository of ours, OrthogLinearBackdoor, which contains implementations of recent mitigation techniques.