ain-soph / trojanzoo

TrojanZoo provides a universal PyTorch platform for conducting security research (especially backdoor attacks and defenses) on image classification in deep learning.
https://ain-soph.github.io/trojanzoo
GNU General Public License v3.0

Possible bug: target_class not changed when computing ASR for reversed triggers #184

Closed. CassiniHuy closed this issue 1 year ago

CassiniHuy commented 1 year ago

I found that, in the implementation of the ModelInspection class, target_class is not changed when computing the ASR for each reversed trigger.

The code computing the ASR: (from https://github.com/ain-soph/trojanzoo/blob/main/trojanvision/defenses/abstract.py#L366)

...
            mark, loss = self.optimize_mark(label, verbose=verbose, **kwargs)
            if verbose:
                asr, _ = self.attack.validate_fn(indent=4)
                if not self.mark_random_pos:
                    select_num = self.attack.mark.mark_height * self.attack.mark.mark_width
                    overlap = mask_jaccard(self.attack.mark.get_mask(),
                                           self.real_mask,
                                           select_num=select_num)
                    prints(f'Jaccard index: {overlap:.3f}', indent=4)
            else:
                asr, _ = self.model._validate(get_data_fn=self.attack.get_data,
                                              keep_org=False, poison_label=True,
                                              verbose=False)
...

The get_data method only returns data for the default target class (label 0), regardless of which label's trigger was just reversed. This affects defenses such as Neural Cleanse and ABS. For a benign model, the output looks like this:

asr           : [  99.160,    0.320,    0.500,    1.110,    0.540,    0.360,    0.180,    1.020,    0.410,    0.670]
...

asr MAD       : [ 391.443,    0.714,    0.000,    2.420,    0.159,    0.555,    1.270,    2.063,    0.357,    0.674]
...

Only the ASR of the default target class is meaningful, while the ASRs of other classes are almost zero.
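
To see how a single inflated ASR turns into the large "asr MAD" value above, here is a minimal sketch of a MAD-based anomaly score. It rests on assumptions not stated in this thread: that the score is the absolute deviation from the median, divided by 1.4826 times the median absolute deviation, and that the median of an even-length list is the lower of the two middle values (the convention torch.median uses). The function names are hypothetical and are not TrojanZoo's API.

```python
def lower_median(values):
    """Median that picks the lower middle element for even-length input."""
    ordered = sorted(values)
    return ordered[(len(ordered) - 1) // 2]


def normalized_mad(values, scale=1.4826):
    """Absolute deviation from the median, in units of scale * MAD (assumed formula)."""
    med = lower_median(values)
    deviations = [abs(v - med) for v in values]
    mad = lower_median(deviations)
    return [d / (scale * mad) for d in deviations]


# ASR values copied from the log in this issue.
asr = [99.160, 0.320, 0.500, 1.110, 0.540, 0.360, 0.180, 1.020, 0.410, 0.670]
scores = normalized_mad(asr)
print(", ".join(f"{s:.3f}" for s in scores))
```

Under these assumptions the first score comes out at about 391.4 and the third at 0, which matches the "asr MAD" row. Label 0 is therefore flagged as an extreme outlier even though the model is benign.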

ain-soph commented 1 year ago

That seems to be the case, and modifying target_class inside the loop should fix the problem.
I'll check the details later. The experimental results in the paper are not affected because they are based on the older code.
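
The effect of updating target_class inside the loop can be illustrated with a toy sketch. Everything here is a hypothetical stand-in (ToyAttack, toy_predict, asr_per_label) and not TrojanZoo's actual classes or the code of the eventual fix. It assumes that get_data relabels inputs to the attack's current target_class and that a reversed trigger for a label drives every prediction to that label.

```python
class ToyAttack:
    """Hypothetical stand-in for the attack object (not TrojanZoo's API)."""

    def __init__(self, target_class=0):
        self.target_class = target_class

    def get_data(self, inputs):
        # Relabel every sample to the attack's current target class.
        return inputs, [self.target_class] * len(inputs)


def toy_predict(inputs, mark_label):
    # Assumption: the trigger reversed for mark_label always yields mark_label.
    return [mark_label] * len(inputs)


def asr_per_label(attack, num_classes, inputs, update_target):
    """Compute a toy ASR (in percent) for the trigger reversed for each label."""
    original_target = attack.target_class
    results = []
    for label in range(num_classes):
        if update_target:
            attack.target_class = label  # keep get_data in sync with the loop
        _, targets = attack.get_data(inputs)
        preds = toy_predict(inputs, label)
        hits = sum(p == t for p, t in zip(preds, targets))
        results.append(100.0 * hits / len(inputs))
    attack.target_class = original_target  # restore the caller's setting
    return results


samples = [0, 1, 2, 3]
print(asr_per_label(ToyAttack(), 3, samples, update_target=False))
print(asr_per_label(ToyAttack(), 3, samples, update_target=True))
```

Without the update, only label 0 is scored against the right target, which mirrors the pattern in the reported log. With the update, each reversed trigger is scored against its own label.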

ain-soph commented 1 year ago

Fixed by https://github.com/ain-soph/trojanzoo/commit/5483a3fa8fd7ddd6ba835b222d4cb0cf6d69ac1c