Trusted-AI / adversarial-robustness-toolbox

Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
https://adversarial-robustness-toolbox.readthedocs.io/en/latest/

NaNs in Wasserstein Attack #2305

Open · billbradley opened this issue 1 year ago

billbradley commented 1 year ago

Describe the bug
The Wasserstein attack produces NaNs in the output.

To Reproduce
I believe I've produced a minimal example of the issue. You can run it on Google's Colab here: https://drive.google.com/file/d/1GoikJzRJAdJjnAb1j2SB8Tu453ZxIAsi/view?usp=sharing

The code includes both a Fast Gradient Method (FGM) attack and the Wasserstein attack; the FGM attack runs fine and hopefully establishes that there aren't any errors in the input processing.

Note that running the code produces the warnings:

/usr/local/lib/python3.10/dist-packages/art/attacks/evasion/wasserstein.py:407: RuntimeWarning: invalid value encountered in log
  alpha[i_nonzero_] = (np.log(self._local_transport(var_k, exp_beta, self.kernel_size)) - np.log(x))[
/usr/local/lib/python3.10/dist-packages/art/attacks/evasion/wasserstein.py:485: RuntimeWarning: invalid value encountered in log
  alpha = np.log(self._local_transport(var_k, exp_beta, self.kernel_size)) - np.log(x_init)

In the current ART code, we have:

Line 406:            x[x == 0.0] = EPS_LOG  # Prevent divide by zero in np.log
Line 484:            x_init[x_init == 0.0] = EPS_LOG  # Prevent divide by zero in np.log

If we replace that with:

Line 406:            x[x <= 0.0] = EPS_LOG  # Prevent divide by zero in np.log
Line 484:            x_init[x_init <= 0.0] = EPS_LOG  # Prevent divide by zero in np.log

then the warnings disappear and the output is finite (i.e., no NaNs). However, I'm not confident about the algorithmic or numerical-analysis implications of that change, so I wasn't comfortable making the switch myself.
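To illustrate the numerical issue, here is a minimal, self-contained sketch (the EPS_LOG value below is an assumption for illustration only; ART defines its own constant): np.log returns NaN and emits the RuntimeWarning above for negative inputs, so clamping only exact zeros leaves slightly negative entries free to produce NaNs, while clamping all non-positive entries keeps every log finite.

import numpy as np

EPS_LOG = 1e-10  # assumed small positive constant, for illustration only

# An array with a slightly negative entry, e.g. from floating-point round-off
x = np.array([0.5, 0.0, -1e-12])

# Current guard: only exact zeros are clamped, so log(-1e-12) -> NaN + RuntimeWarning
x_current = x.copy()
x_current[x_current == 0.0] = EPS_LOG
print(np.log(x_current))  # -> approx [-0.693, -23.026, nan]

# Proposed guard: clamp all non-positive entries, so every log is finite
x_proposed = x.copy()
x_proposed[x_proposed <= 0.0] = EPS_LOG
print(np.log(x_proposed))  # -> approx [-0.693, -23.026, -23.026]

This doesn't say whether negative values should ever reach that point in the first place, only why the <= guard removes the warnings and the NaNs.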

For completeness, I'm also including a Python script version of the Jupyter notebook:

import numpy as np
import torch
from torch import nn
import torchvision

# Download an example image from the PyTorch website
import urllib.request

url, filename = ("https://github.com/pytorch/hub/raw/master/images/dog.jpg", "dog.jpg")
urllib.request.urlretrieve(url, filename)

# Preprocess image appropriately
from PIL import Image
from torchvision import transforms

input_image = Image.open(filename)
preprocess = transforms.Compose(
    [
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ]
)
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0)  # create a mini-batch as expected by the model

print(f"Range of values: {torch.min(input_batch):.3f} - {torch.max(input_batch):.3f}")

# Grab pretrained Resnet34
model = torch.hub.load(
    "pytorch/vision:v0.10.0",
    "resnet34",
    weights=torchvision.models.ResNet34_Weights.DEFAULT,
)

shrink_num_target_classes = False
if shrink_num_target_classes:
    # Swap last layer from a 1000x classifier to a 3x classifier; note, new terminal layer
    # is untrained / random
    num_classes = 3
    model.fc = nn.Linear(in_features=512, out_features=num_classes, bias=True)
else:
    # Or, use a vanilla Resnet
    num_classes = 1000

# Create a ART classifier for PyTorch
from art.estimators.classification import PyTorchClassifier
loss = torch.nn.CrossEntropyLoss()
input_shape = (3, 224, 224)  # channels-first shape of a single preprocessed image
classifier = PyTorchClassifier(
    model=model,
    loss=loss,
    input_shape=input_shape,
    nb_classes=num_classes,
)

# Instantiate ART attacks
from art.attacks.evasion import Wasserstein, FastGradientMethod
attacks = {}
attacks["FGM"] = FastGradientMethod(estimator=classifier)
use_default_wasserstein = False
if use_default_wasserstein:
    # Default Wasserstein attack (which is very, very slow)
    attacks["Wasserstein"] = Wasserstein(estimator=classifier)
else:
    # Shorten loops to speed up debugging
    attacks["Wasserstein"] = Wasserstein(
        estimator=classifier,
        max_iter=4,
        conjugate_sinkhorn_max_iter=4,
        projected_sinkhorn_max_iter=4,
    )

# Run ART attacks
features_packed = input_batch.numpy()
for attack in attacks:
    print(attack)
    adversarial_input = attacks[attack].generate(features_packed)  # , max_iter=100)
    if np.any(np.isnan(adversarial_input)):
        print(f"\n   NaN is present in {attack}!\n")
    else:
        print(f"\n   Healthy output from {attack}.\n")

Expected behavior
Given a non-pathological input image, I would expect the Wasserstein attack to produce non-NaN output.

Screenshots
(None.)

System information
I replicated the problem on Google's Colab, which is presumably running Linux, but here are my own system details:

billbradley commented 1 year ago

PS: I meant to mention that there seem to be some hints of this problem in the past, to wit: https://github.com/Trusted-AI/adversarial-robustness-toolbox/discussions/1339

billbradley commented 1 year ago

PPS: This issue should certainly have the "bug" label, but I didn't see how to add that; if anyone could add it for me, I'd be grateful.

billbradley commented 1 year ago

Are these issues actively monitored? I'd be happy to improve my bug report to make it more helpful, but I'm not sure what to change.

beat-buesser commented 1 year ago

Hi @billbradley, yes, the issues are actively monitored. We have not yet had time to take a closer look at this. Did you see any cause for the negative values in x?

billbradley commented 1 year ago

No, I didn't understand where the negative values came from. Honestly, I found it pretty surprising.
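For what it's worth, here is a small diagnostic sketch (hypothetical, simply mirroring the mean/std from the transforms.Normalize call in the script above): the ImageNet normalization maps pixels from [0, 1] to roughly [-2.1, 2.6], so the array passed to generate() already contains many negative values. Whether those are what eventually shows up as negative x inside the Sinkhorn iterations is exactly the open question.

import numpy as np

# Mirror the normalization constants used in the reproduction script
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

# A pixel value p in [0, 1] becomes (p - mean) / std per channel, so the extremes are:
print("per-channel minimum after Normalize:", (0.0 - mean) / std)  # ~[-2.118, -2.036, -1.804]
print("per-channel maximum after Normalize:", (1.0 - mean) / std)  # ~[ 2.249,  2.429,  2.640]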