facebookresearch / text-adversarial-attack

Repo for arXiv preprint "Gradient-based Adversarial Attacks against Text Transformers"

Shape mismatch in adv_loss? #6

Closed · puzzler10 closed this 1 year ago

puzzler10 commented 1 year ago

Hi there,

Firstly, thank you for the work and the code; it is very interesting to read through.

I was looking at the calculation of the adversarial loss under the condition "cw" (which I assume stands for Carlini and Wagner, i.e. the margin loss). I am guessing this refers to the following loss function from your paper:

$\ell_{\text{margin}}(\mathbf{x}, y) = \max\big(\phi_h(\mathbf{x})_y - \max_{k \neq y} \phi_h(\mathbf{x})_k + \kappa,\; 0\big)$

The line of code looks like this: `adv_loss = (pred[:, label] - pred.gather(1, indices) + args.kappa).clamp(min=0).mean()`.

I ran the code with a batch size of 10 for the Gumbel-Softmax samples. Here are the shapes of the terms: `pred[:, label]` has shape `(10,)`, while `pred.gather(1, indices)` has shape `(10, 1)`, so their difference broadcasts to `(10, 10)`.

This doesn't seem right (I'd expect a vector of length 10, not a 10x10 matrix), but perhaps I am misunderstanding something.
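Here is a minimal sketch reproducing what I'm seeing. The logits and class count are hypothetical, and `indices` is constructed the way I understand it from the repo (the index of the top non-true-class prediction per sample):

```python
import torch

batch_size, num_classes = 10, 3
pred = torch.randn(batch_size, num_classes)  # logits for the Gumbel-Softmax samples
label = 1                                    # true class index (scalar)

# indices of the highest-scoring wrong class per sample, shape (batch_size, 1)
masked = pred.clone()
masked[:, label] = float('-inf')
indices = masked.argmax(dim=1, keepdim=True)

print(pred[:, label].shape)           # torch.Size([10])
print(pred.gather(1, indices).shape)  # torch.Size([10, 1])

# (10,) minus (10, 1) broadcasts to (10, 10): each sample's true-label logit
# also gets compared against every other sample's top wrong-class logit
diff = pred[:, label] - pred.gather(1, indices)
print(diff.shape)                     # torch.Size([10, 10])
```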

cg563 commented 1 year ago

Sorry for taking so long to respond!

What you found is indeed a bug in the code. The `pred.gather(1, indices)` term should be `pred.gather(1, indices).squeeze()`. I have pushed the bug fix.

The effect of the bug is that the CW loss was not being faithfully optimized: because of the unintended broadcasting, it was also encouraging cross-sample margins to be large. I expect all the results in the paper to improve slightly after this bug fix. Thank you for catching this bug!
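For anyone following along, a minimal sketch of the fixed line using the same assumed tensors as in the reproduction above (`kappa` is a placeholder for `args.kappa`):

```python
kappa = 5.0  # placeholder value for args.kappa

# with .squeeze(), both terms have shape (10,), so each sample's margin is
# computed against its own top wrong-class logit, as the paper's loss intends
adv_loss = (pred[:, label] - pred.gather(1, indices).squeeze()
            + kappa).clamp(min=0).mean()
print(adv_loss.shape)  # torch.Size([]), a scalar loss
```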