Fanke-max closed 2 years ago
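During the attack, we only use the sign of the gradient of the loss. Switching the reduction between "mean" and "sum" only rescales the gradient by a positive constant (the batch size), so the sign is unchanged. Thus, the reduction does not affect the attack performance.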
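A quick way to check this is to compare the input gradients under both reductions. The snippet below is only a minimal sketch: the linear layer, the random inputs, and the helper name `input_grad` are hypothetical stand-ins, not the repository's model or attack code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a classifier: one linear layer, 8 features, 3 classes.
model = nn.Linear(8, 3)
x = torch.randn(4, 8)            # batch of 4 inputs
y = torch.tensor([0, 2, 1, 1])   # arbitrary labels

def input_grad(reduction):
    """Gradient of the cross-entropy loss w.r.t. the input for a given reduction."""
    x_adv = x.clone().requires_grad_(True)
    loss = nn.CrossEntropyLoss(reduction=reduction)(model(x_adv), y)
    (grad,) = torch.autograd.grad(loss, x_adv)
    return grad

g_mean = input_grad("mean")
g_sum = input_grad("sum")

# The sign pattern used by a sign-based step is the same for both reductions.
same_sign = torch.equal(g_mean.sign(), g_sum.sign())
# The "mean" gradient is the "sum" gradient divided by the batch size.
scaled = torch.allclose(g_mean * x.shape[0], g_sum, atol=1e-6)
print(same_sign, scaled)
```

Here each sample's gradient depends only on its own loss term, so the batch size changes the magnitude of the gradient but not its sign.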
Thanks for your explanation! I was still relying on my intuition from the L2 loss and forgot that we only use the sign.
Although this won't be a problem most of the time, the results seem to depend on the batch size, since PyTorch uses "mean" as the default reduction for nn.CrossEntropyLoss(). Should I set the reduction mode of nn.CrossEntropyLoss() to "sum"?