Trusted-AI / adversarial-robustness-toolbox

Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
https://adversarial-robustness-toolbox.readthedocs.io/en/latest/
MIT License
4.84k stars, 1.16k forks

Bugs in knockoff_nets depending on the output of victim classifier and thieved classifier #1746

Open KatsunariShishido opened 2 years ago

KatsunariShishido commented 2 years ago

Describe the bug: The program aborts when the reward calculation diverges to infinity, depending on the outputs of the victim classifier and the thieved classifier.

To Reproduce: There are three patterns.

  1. The victim classifier's confidence diverges to infinity when the victim classifier's logit y_output[0] contains a large value, in knockoff_nets.py:L362.
  2. np.log(probs_hat[k]) diverges to negative infinity when probs_hat[k] is zero, in knockoff_nets.py:L372.
  3. (reward - self.reward_avg) / np.sqrt(self.reward_var) diverges to infinity when np.sqrt(self.reward_var) is zero, in knockoff_nets.py:L394.
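The three patterns can be illustrated with plain NumPy, independent of ART. This is a minimal sketch that assumes the reward follows the operations named above (a softmax built from np.exp, a cross-entropy term using np.log, and a z-score normalization); the helper `naive_softmax` and the input values are hypothetical and chosen only to trigger each case, and the code is not copied from knockoff_nets.py.

```python
import numpy as np

def naive_softmax(logits):
    # Hypothetical helper mirroring pattern 1: exponentiate the raw logits directly
    aux_exp = np.exp(logits)
    return aux_exp / np.sum(aux_exp)

# Silence NumPy's RuntimeWarnings so the non-finite results are easy to inspect
with np.errstate(over="ignore", divide="ignore", invalid="ignore"):
    # Pattern 1: np.exp(1000.0) overflows to inf, and inf / inf evaluates to nan
    probs_output = naive_softmax(np.array([1000.0, -1000.0]))  # [nan, 0.0]

    # Pattern 2: a zero probability makes np.log return -inf
    log_probs = np.log(np.array([0.0, 1.0]))  # [-inf, 0.0]

    # Pattern 3: a zero variance turns the normalization into a division by zero
    reward = np.array([1.0, 2.0])
    reward_avg = np.array([0.5, 2.0])
    reward_var = np.zeros(2)
    normalized = (reward - reward_avg) / np.sqrt(reward_var)  # [inf, nan]
```

Note that pattern 1 actually produces nan rather than inf for the overflowing class, because the numerator and the sum both overflow; either value then propagates through the averaged reward.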

Expected behavior: The attack runs.
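One possible way to keep the reward finite is sketched below. It is an assumption-laden illustration, not ART's actual fix: it computes log-probabilities with the max-subtraction (log-sum-exp) trick so np.exp never overflows and np.log(0) is never evaluated, and it adds a small constant `eps` (an assumed value of 1e-8) to the variance before taking the square root. The function names are hypothetical, and they take 1-D logit vectors (for the 2-D arrays in the reproduction script, that would be `logit[0]` and `logit_hat[0]`).

```python
import numpy as np

def stable_log_softmax(logits):
    # Subtracting the max keeps every exponent <= 0, so np.exp cannot overflow
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))

def loss_reward(y_output, y_hat):
    # Cross-entropy between victim and thieved probabilities, computed from
    # log-probabilities so that np.log is never applied to an exact zero
    probs_output = np.exp(stable_log_softmax(y_output))
    return -np.sum(probs_output * stable_log_softmax(y_hat))

def normalize_rewards(reward, reward_avg, reward_var, eps=1e-8):
    # eps (an assumed value) keeps the denominator positive when the variance is zero
    return (reward - reward_avg) / np.sqrt(reward_var + eps)

# Logits taken from the reproduction script in this issue (first row only)
y_output = np.array([3739.1516, -1920.3384])
y_hat = np.array([-3.1932125, 2.8115182])
reward = loss_reward(y_output, y_hat)  # finite, roughly 6.007

# Zero variance no longer produces inf or nan
normalized = normalize_rewards(np.array([1.0, 2.0]), np.array([0.5, 2.0]), np.zeros(2))
```

An alternative for pattern 2 is clipping probabilities away from zero before calling np.log, which is simpler but slightly biases the reward; the log-sum-exp form avoids both the overflow and the log of zero without changing the value for well-behaved inputs.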


beat-buesser commented 2 years ago

Hi @KatsunariShishido Thank you very much for using ART and reporting this issue! Would you have a specific script that reproduces the behaviors that we could use for testing?

KatsunariShishido commented 2 years ago

Hi @beat-buesser, a test case that triggers these bugs is shown below. Calling self._reward_all diverges to infinity when the victim classifier's logit is large.

```python
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from art.attacks.extraction import KnockoffNets
from art.estimators.classification import PyTorchClassifier
from art.utils import load_mnist

# Step 0: Define the neural network model, return logits instead of activation in forward method

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv_1 = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=5, stride=1)
        self.conv_2 = nn.Conv2d(in_channels=4, out_channels=10, kernel_size=5, stride=1)
        self.fc_1 = nn.Linear(in_features=4 * 4 * 10, out_features=100)
        self.fc_2 = nn.Linear(in_features=100, out_features=2)

    def forward(self, x):
        x = F.relu(self.conv_1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv_2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4 * 4 * 10)
        x = F.relu(self.fc_1(x))
        x = self.fc_2(x)
        return x

# Step 1: Load the MNIST dataset

(x_train, y_train), (x_test, y_test), min_pixel_value, max_pixel_value = load_mnist()

# Step 1a: Swap axes to PyTorch's NCHW format

x_train = np.transpose(x_train, (0, 3, 1, 2)).astype(np.float32)
x_test = np.transpose(x_test, (0, 3, 1, 2)).astype(np.float32)

# Step 2: Create the model

model = Net()

# Step 2a: Define the loss function and the optimizer

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Step 3: Create the ART classifier

classifier = PyTorchClassifier(
    model=model,
    clip_values=(min_pixel_value, max_pixel_value),
    loss=criterion,
    optimizer=optimizer,
    input_shape=(1, 28, 28),
    nb_classes=2,
)

# Victim logit with a large first entry (pattern 1) and a moderate thieved logit
logit = np.array([[3739.1516, -1920.3384]])
logit_hat = np.array([[-3.1932125, 2.8115182]])

# Seed the running reward statistics, then call the reward directly
test_knockoff = KnockoffNets(classifier)
test_knockoff.reward_avg = np.array([1.14033569, 0.59332209, 4.3476869])
test_knockoff.reward_var = np.array([0.7843785, 0.20409893, 19.06720789])
test_knockoff.y_avg = 0
print(test_knockoff._reward_all(logit, logit_hat, n=2))
```

beat-buesser commented 2 years ago

Hi @KatsunariShishido Thank you very much. I have been able to reproduce it. We'll aim to fix this with one of the upcoming releases.