bethgelab / foolbox

A Python toolbox to create adversarial examples that fool neural networks in PyTorch, TensorFlow, and JAX
https://foolbox.jonasrauber.de
MIT License

PyTorchModel.gradients() returns different values depending on the batch size #439

Closed samuelemarro closed 4 years ago

samuelemarro commented 4 years ago

OS: Windows 10
Python Version: 3.7.1
Foolbox Version: 2.3.0
Torch Version: 1.4.0 (CUDA 10.1)

When I call .gradient() with a batch of size b, the returned gradient is always the true gradient divided by b.

Code to reproduce:

import torchvision
import numpy as np
import foolbox

batch_size = 1

# Use a pretrained model
torch_model = torchvision.models.resnet50(pretrained=True)
torch_model.eval()

# Prepare the image and the label
image, label = foolbox.utils.imagenet_example()
image = np.moveaxis(image, 2, 0)
image = image / 255
label = np.array(label)

# Create a fake batch
images = np.repeat(image[np.newaxis], batch_size, axis=0)
labels = np.repeat(label[np.newaxis], batch_size, axis=0)

# Create a PyTorchModel
mean = np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1))
stdevs = np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1))
model = foolbox.models.PyTorchModel(torch_model, bounds=(0, 1), num_classes=1000,
                                    preprocessing=(mean, stdevs))

# Compute the gradients
grads = model.gradient(images, labels)

print(grads[0].mean())

Setting batch_size = 10 makes the returned gradients exactly 1/10 of the original ones. The problem also appears with .forward_and_gradient().

A possible cause is the reduction used by nn.CrossEntropyLoss(): by default, CrossEntropyLoss returns the mean of the loss across the whole batch, so each sample's gradient is scaled by 1/batch_size. Using nn.CrossEntropyLoss(reduction='sum') fixes the problem, but I don't know whether that is a legitimate solution or just a workaround.
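
To see the effect in isolation, here is a small standalone PyTorch sketch (a toy linear model, not Foolbox's internal code) that compares the per-sample input gradient under reduction='mean' and reduction='sum':

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 3)
x = torch.randn(1, 4)

for b in (1, 10):
    xb = x.repeat(b, 1).requires_grad_(True)  # same sample repeated b times
    yb = torch.zeros(b, dtype=torch.long)     # dummy target class

    # reduction='mean' (the default) averages the loss over the batch,
    # so the gradient w.r.t. each individual sample is scaled by 1/b
    loss_mean = nn.CrossEntropyLoss(reduction='mean')(model(xb), yb)
    grad_mean, = torch.autograd.grad(loss_mean, xb)

    # reduction='sum' keeps each sample's gradient independent of b
    loss_sum = nn.CrossEntropyLoss(reduction='sum')(model(xb), yb)
    grad_sum, = torch.autograd.grad(loss_sum, xb)

    print(b, grad_mean[0].abs().sum().item(), grad_sum[0].abs().sum().item())

With reduction='mean' the printed value shrinks by 1/b; with reduction='sum' it stays constant.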

jonasrauber commented 4 years ago

Does this lead to an actual problem with an attack? gradient() is not really a user-facing interface; it is intended to be used by the attacks.

samuelemarro commented 4 years ago

In DeepFool, the magnitude of the perturbation is inversely proportional to the norm of the gradient difference. This means that a smaller gradient makes DeepFool seriously overshoot. For example, DeepFool with a batch size of 50-100 returns unrecognizable images on CIFAR-10.
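
For intuition: the linearized DeepFool step is r = (|Δf| / ||w||²) · w, where w is the difference of the two gradients, so if w is scaled down by 1/b, the norm of r grows by a factor of b. A small numeric sketch (illustrative only, not Foolbox's DeepFool implementation):

import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=1000)  # stand-in for the true gradient difference
delta_f = 0.5              # stand-in for the logit difference

def deepfool_step(w, delta_f):
    # linearized DeepFool step towards the decision boundary
    return abs(delta_f) / (np.linalg.norm(w) ** 2) * w

for b in (1, 10, 50):
    r = deepfool_step(w / b, delta_f)  # gradient scaled down by the bug
    print(b, np.linalg.norm(r))        # perturbation norm grows linearly in b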

jonasrauber commented 4 years ago

Thanks for reporting this. I have not yet looked at it in detail, but it might be a problem that was introduced with the batch support in 2.0.

jonasrauber commented 4 years ago

Thanks again, this is indeed a bug and will be fixed in the next release.

jonasrauber commented 4 years ago

Your proposed fix is correct: nn.CrossEntropyLoss(reduction='sum')
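
For reference, a minimal sketch of what the fixed gradient computation amounts to (names and structure are illustrative, not Foolbox's actual internals):

import torch
import torch.nn as nn

def batch_input_gradient(model, images, labels):
    # images: float tensor of shape (b, C, H, W); labels: long tensor of shape (b,)
    images = images.clone().requires_grad_(True)
    logits = model(images)
    # reduction='sum' keeps each sample's gradient independent of the batch size
    loss = nn.CrossEntropyLoss(reduction='sum')(logits, labels)
    loss.backward()
    return images.grad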

jonasrauber commented 4 years ago

released 2.4.0 with the fix