Open KatsunariShishido opened 2 years ago
Hi @KatsunariShishido Thank you very much for using ART and reporting this issue! Would you have a specific script that reproduces the behaviors that we could use for testing?
Hi @beat-buesser Here is a test case that triggers these bugs. It calls self._reward_all directly, and the reward diverges to infinity when the victim classifier's logit is a large value.
import art
from art.attacks.extraction import KnockoffNets
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from art.attacks.evasion import FastGradientMethod
from art.estimators.classification import PyTorchClassifier
from art.utils import load_mnist
# Step 0: Define the neural network model, return logits instead of activation in forward method
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv_1 = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=5, stride=1)
        self.conv_2 = nn.Conv2d(in_channels=4, out_channels=10, kernel_size=5, stride=1)
        self.fc_1 = nn.Linear(in_features=4 * 4 * 10, out_features=100)
        self.fc_2 = nn.Linear(in_features=100, out_features=2)

    def forward(self, x):
        x = F.relu(self.conv_1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv_2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4 * 4 * 10)
        x = F.relu(self.fc_1(x))
        x = self.fc_2(x)
        return x
# Step 1: Load the MNIST dataset
(x_train, y_train), (x_test, y_test), min_pixel_value, max_pixel_value = load_mnist()
# Step 1a: Swap axes to PyTorch's NCHW format
x_train = np.transpose(x_train, (0, 3, 1, 2)).astype(np.float32)
x_test = np.transpose(x_test, (0, 3, 1, 2)).astype(np.float32)
# Step 2: Create the model
model = Net()
# Step 2a: Define the loss function and the optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)
# Step 3: Create the ART classifier
classifier = PyTorchClassifier(
    model=model,
    clip_values=(min_pixel_value, max_pixel_value),
    loss=criterion,
    optimizer=optimizer,
    input_shape=(1, 28, 28),
    nb_classes=2,
)
# Step 4: Call _reward_all directly with a very large victim logit
logit = np.array([[3739.1516, -1920.3384]])
logit_hat = np.array([[-3.1932125, 2.8115182]])
test_knockoff = KnockoffNets(classifier)
test_knockoff.reward_avg = np.array([1.14033569, 0.59332209, 4.3476869])
test_knockoff.reward_var = np.array([0.7843785, 0.20409893, 19.06720789])
test_knockoff.y_avg = 0
print(test_knockoff._reward_all(logit, logit_hat, n=2))
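The overflow itself can be seen with NumPy alone, without ART. The sketch below assumes the reward code turns logits into probabilities with a plain exp(x) / sum(exp(x)); that is an assumption about the implementation, not a quote from it, and naive_softmax is a hypothetical helper name used only for illustration.

```python
import numpy as np


def naive_softmax(logits):
    # Hypothetical helper: exponentiate, then normalize by the sum.
    # No max-subtraction, so large logits overflow float64.
    e = np.exp(logits)
    return e / np.sum(e)


# Same victim logits as in the test case above.
logit = np.array([3739.1516, -1920.3384])

# Silence the overflow / invalid-value warnings so only the result is shown.
with np.errstate(over="ignore", invalid="ignore"):
    probs = naive_softmax(logit)

print(probs)
print("all finite:", bool(np.all(np.isfinite(probs))))
```

With float64, np.exp overflows to inf for inputs above roughly 709, so the first entry becomes inf / inf, which is nan, and every reward derived from it is no longer finite.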
Hi @KatsunariShishido Thank you very much. I have been able to reproduce it. We'll aim to fix this with one of the upcoming releases.
Describe the bug
The program aborts when the reward calculation diverges to infinity, which depends on the outputs of the victim classifier and the thieved classifier.

To Reproduce
There are three patterns:
1. y_output[0] is a large value in knockoff_nets.py:L362.
2. np.log(probs_hat[k]) diverges to negative infinity when probs_hat[k] is zero in knockoff_nets.py:L372.
3. (reward - self.reward_avg) / np.sqrt(self.reward_var) diverges to infinity when np.sqrt(self.reward_var) is zero in knockoff_nets.py:L394.

Expected behavior
Attack runs.
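One possible way to guard against the three divergence patterns listed above is sketched here in plain NumPy. It is not the fix adopted in ART. The helper names (stable_softmax, cross_entropy, standardize) and the EPS value are assumptions chosen for illustration: the softmax subtracts the maximum logit before exponentiating, the log argument is clipped away from zero, and a small constant is added to the variance before the square root.

```python
import numpy as np

EPS = 1e-12  # assumed tolerance, not a value taken from ART


def stable_softmax(logits):
    # Pattern 1: shift by the max so the largest exponent is exp(0) = 1.
    shifted = logits - np.max(logits)
    e = np.exp(shifted)
    return e / np.sum(e)


def cross_entropy(probs, probs_hat, eps=EPS):
    # Pattern 2: clip probs_hat so np.log never sees an exact zero.
    return float(-np.sum(probs * np.log(np.clip(probs_hat, eps, 1.0))))


def standardize(reward, reward_avg, reward_var, eps=EPS):
    # Pattern 3: keep the denominator strictly positive when the variance is zero.
    return (reward - reward_avg) / np.sqrt(reward_var + eps)


# Logits from the reproduction script.
probs = stable_softmax(np.array([3739.1516, -1920.3384]))
probs_hat = stable_softmax(np.array([-3.1932125, 2.8115182]))

# Worst cases: a predicted probability of exactly zero, and zero variance.
loss = cross_entropy(probs, np.array([0.0, 1.0]))
scaled = standardize(np.array([1.0, 2.0]), np.array([1.0, 1.0]), np.array([0.0, 0.0]))

print(probs, probs_hat, loss, scaled)
```

The epsilon trades exactness for finiteness: with zero variance the standardized reward becomes very large but stays finite, so the attack can continue instead of aborting.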
System information (please complete the following information):