Printing attack adversaries in a verbose mode, for better debugging purpose

stefan-matcovici commented 5 years ago

Full stacktrace:

/pytorch/aten/src/THC/THCTensorScatterGather.cu:188: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = long, Dims = 2]: block: [0,0,0], thread: [64,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:188: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = long, Dims = 2]: block: [0,0,0], thread: [65,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:188: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = long, Dims = 2]: block: [0,0,0], thread: [66,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:188: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = long, Dims = 2]: block: [0,0,0], thread: [67,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:188: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = long, Dims = 2]: block: [0,0,0], thread: [68,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
............................
/pytorch/aten/src/THC/THCTensorScatterGather.cu:188: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = long, Dims = 2]: block: [0,0,0], thread: [30,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:188: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = long, Dims = 2]: block: [0,0,0], thread: [31,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
Traceback (most recent call last):
  File "adversarial_test_model.py", line 56, in <module>
    advdata = adversary.perturb(clndata, target)
  File "/nfshomes/smatcovi/works/pilot-project/ppenv/lib/python3.7/site-packages/advertorch/attacks/carlini_wagner.py", line 209, in perturb
    final_l2distsqs = torch.FloatTensor(final_l2distsqs).to(x.device)
RuntimeError: CUDA error: device-side assert triggered

I am trying to use the CarliniWagnerL2Attack. I am using the code tutorial_train_mnist.py but for checking robust accuracy. When I run locally (GTX1050) everything works. When I try to run on a nvidia K20 or M40 I get this error.

stefan-matcovici commented 5 years ago

After running with CUDA_LAUNCH_BLOCKING=1, I might get a more specific stacktrace:

Traceback (most recent call last):
  File "adversarial_test_model.py", line 56, in <module>
    advdata = adversary.perturb(clndata, target)
  File "/nfshomes/smatcovi/works/pilot-project/ppenv/lib/python3.7/site-packages/advertorch/attacks/carlini_wagner.py", line 207, in perturb
    y_onehot = to_one_hot(y, self.num_classes).float()
  File "/nfshomes/smatcovi/works/pilot-project/ppenv/lib/python3.7/site-packages/advertorch/utils.py", line 83, in to_one_hot
    y_one_hot = y.new_zeros((y.size()[0], num_classes)).scatter_(1, y, 1)

gwding commented 5 years ago

I did some search, and found something like the following https://github.com/fastai/fastai/issues/440 https://forums.fast.ai/t/pytorch-not-working-with-an-old-nvidia-card/14632/17 https://discuss.pytorch.org/t/pytorch-no-longer-supports-this-gpu-because-it-is-too-old/13803 https://en.wikipedia.org/wiki/Nvidia_Tesla

It seems that pre-complied pytorch does not work old nvidia gpus, and compile from scratch might make it work. But I don't have an old gpu in hand so cannot do this test myself.

stefan-matcovici commented 5 years ago

I might have sent you on the wrong track. Sorry about that. I was calling the attack with True as an argument for the num_of_classes. Maybe this is an issue after all because this wasn't caught until the GPU computations. Also, as a future feature, I think it would be a great idea to add some logging about the attack like how many iterations, the current loss, etc.

That being said, you may close the issue and sorry one more time

gwding commented 5 years ago

@stefan-matcovici Thanks for the suggestion. I changed the title of this issue, and labeled it as a todo item.

BorealisAI / advertorch

Printing attack adversaries in a verbose mode, for better debugging purpose #32