Closed by spandanagella 5 years ago
The PyTorch version I'm using is 1.0.1.
Hi @spandanagella, thanks for pointing that out!
Indeed, there was a problem with the embedding layer (more precisely, with the size of the embedding dictionary). I fixed it in this commit: https://github.com/ArdalanM/nlp-benchmarks/commit/393064d8bdf50976706f6d80e1e9163814ecf5d4
master should work now (tested on the yelp_polarity dataset).
Let me know if it does not.
Cheers, Ardalan
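For readers who hit the same problem, the sizing rule behind this kind of fix can be sketched as follows. This is a minimal illustration, not the code from the commit above: the helper name `required_embedding_rows` and the example vocabulary are hypothetical. It assumes the model looks up each character index in an embedding table, so the table needs at least as many rows as the largest index plus one.

```python
def required_embedding_rows(char_to_index):
    """Return the smallest embedding-table size that covers every index.

    An embedding table with N rows accepts indices 0..N-1, so N must be
    at least max(index) + 1, counting padding and unknown tokens too.
    """
    if not char_to_index:
        raise ValueError("vocabulary is empty")
    return max(char_to_index.values()) + 1


# Hypothetical vocabulary: 0 is padding, 1 is the unknown character.
vocab = {"<pad>": 0, "<unk>": 1, "a": 2, "b": 3, "c": 4}
num_embeddings = required_embedding_rows(vocab)
print(num_embeddings)  # 5
```

If the table is built from the number of ordinary characters alone, special entries such as padding can push the largest index past the last row, which is one plausible way to end up with an out-of-range lookup.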
Thanks, Ardalan, for the quick response. This fixed the issue :)
Hi,
I trained VDCNN models on AG News and a few other datasets that I have, and they worked as expected. However, when I run this on binary classification datasets (including yelp polarity), the model fails with the error below. I tested this with multiple binary classification datasets of different sizes.
THCudaCheck FAIL file=/pytorch/aten/src/THCUNN/generic/SpatialDilatedMaxPooling.cu line=120 error=59 : device-side assert triggered
Any idea why this is happening? I would really appreciate any pointers!
Thanks, Spandana
Complete error:
/pytorch/aten/src/THC/THCTensorIndex.cu:362: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [78,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
  File "/code/vdcnn/vdcnn_working_nlp_benchmarks/src/vdcnn/main.py", line 356, in <module>
    train_acc = train(epoch,net, tr_loader, device, msg="training", optimize=True, optimizer=optimizer, scheduler=scheduler, criterion=criterion)
  File "code/vdcnn/vdcnn_working_nlp_benchmarks/src/vdcnn/main.py", line 192, in train
    out = net(data[0])
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "code/vdcnn/vdcnn_working_nlp_benchmarks/src/vdcnn/net.py", line 113, in forward
    out = self.layers(out)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/pooling.py", line 77, in forward
    self.return_indices)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/_jit_internal.py", line 132, in fn
    return if_false(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 394, in _max_pool1d
    input, kernel_size, stride, padding, dilation, ceil_mode)[0]
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 386, in max_pool1d_with_indices
    input, kernel_size, _stride, padding, dilation, ceil_mode)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THCUNN/generic/SpatialDilatedMaxPooling.cu:120
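A note on reading this output: the `srcIndex < srcSelectDimSize` assertion in `indexSelectLargeIndex` fires when an index lookup receives a value at or beyond the size of the dimension being indexed. CUDA reports such errors asynchronously, so the Python traceback can point at a later operation (here, max pooling) rather than the lookup that actually failed. Running with the environment variable `CUDA_LAUNCH_BLOCKING=1`, or on the CPU, usually gives a more accurate location. A quick way to confirm the cause is to scan the input batches for indices outside the table before they reach the model. The sketch below does that on plain nested lists; `find_out_of_range` and the sample data are hypothetical, not part of the repository.

```python
def find_out_of_range(batches, num_embeddings):
    """Collect (batch_number, index) pairs that an embedding table of
    `num_embeddings` rows could not look up."""
    bad = []
    for batch_number, batch in enumerate(batches):
        for sequence in batch:
            for index in sequence:
                # Valid indices are 0 .. num_embeddings - 1.
                if index < 0 or index >= num_embeddings:
                    bad.append((batch_number, index))
    return bad


# Hypothetical data: two batches of two sequences each.
batches = [
    [[1, 2, 3], [4, 0, 0]],
    [[2, 5, 1], [3, 3, 0]],
]
print(find_out_of_range(batches, num_embeddings=5))  # [(1, 5)]
```

An empty result means every index fits the table, in which case the cause lies elsewhere, for example in labels that exceed the number of output classes.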