ArdalanM / nlp-benchmarks


VDCNN is failing with binary classification #11

Closed spandanagella closed 5 years ago

spandanagella commented 5 years ago

Hi,

I trained VDCNN models on AG News and a few other datasets that I have, and they worked as expected. However, when I run the model on binary classification datasets (including Yelp polarity), it fails with the error below. I tested this with multiple binary classification datasets of different sizes.

THCudaCheck FAIL file=/pytorch/aten/src/THCUNN/generic/SpatialDilatedMaxPooling.cu line=120 error=59 : device-side assert triggered

Any idea why this is happening? I would really appreciate any pointers.

Thanks, Spandana

Complete error:

/pytorch/aten/src/THC/THCTensorIndex.cu:362: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [78,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

File "/code/vdcnn/vdcnn_working_nlp_benchmarks/src/vdcnn/main.py", line 356, in <module>
  train_acc = train(epoch,net, tr_loader, device, msg="training", optimize=True, optimizer=optimizer, scheduler=scheduler, criterion=criterion)
File "code/vdcnn/vdcnn_working_nlp_benchmarks/src/vdcnn/main.py", line 192, in train
  out = net(data[0])
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
  result = self.forward(*input, **kwargs)
File "code/vdcnn/vdcnn_working_nlp_benchmarks/src/vdcnn/net.py", line 113, in forward
  out = self.layers(out)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
  result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
  input = module(input)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
  result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/pooling.py", line 77, in forward
  self.return_indices)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/_jit_internal.py", line 132, in fn
  return if_false(*args, **kwargs)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 394, in _max_pool1d
  input, kernel_size, stride, padding, dilation, ceil_mode)[0]
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 386, in max_pool1d_with_indices
  input, kernel_size, _stride, padding, dilation, ceil_mode)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THCUNN/generic/SpatialDilatedMaxPooling.cu:120

spandanagella commented 5 years ago

The PyTorch version I'm using is 1.0.1.

ArdalanM commented 5 years ago

Hi @spandanagella, thanks for pointing that out!

Indeed, there was a problem with the embedding layer (more precisely, with the size of the embedding dictionary). I fixed it in this commit: https://github.com/ArdalanM/nlp-benchmarks/commit/393064d8bdf50976706f6d80e1e9163814ecf5d4
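
For anyone hitting a similar error: the `srcIndex < srcSelectDimSize` assertion in the trace is consistent with an embedding lookup receiving an index at or beyond the size of the embedding table. The sketch below is a rough illustration of that kind of check, not the code from the commit. The helper name `find_out_of_range_indices` and the numbers are hypothetical, and it works on plain Python lists so it runs without a GPU.

```python
def find_out_of_range_indices(batch, num_embeddings):
    """Return (row, col, index) triples for indices outside [0, num_embeddings)."""
    bad = []
    for row, sequence in enumerate(batch):
        for col, index in enumerate(sequence):
            if not 0 <= index < num_embeddings:
                bad.append((row, col, index))
    return bad


# Hypothetical numbers: an embedding table with 69 rows (valid indices 0..68),
# while the text encoder emits index 69 for one character.
num_embeddings = 69
batch = [[1, 5, 68], [0, 69, 2]]
print(find_out_of_range_indices(batch, num_embeddings))  # [(1, 1, 69)]
```

Running a check like this on the encoded batches before the forward pass, rerunning on CPU, or setting the environment variable `CUDA_LAUNCH_BLOCKING=1` usually makes the failing operation easier to locate than the asynchronous CUDA error, which here was reported at the max-pooling layer.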

master should work now (tested on the yelp_polarity dataset). Let me know if it does not.

Cheers, Ardalan

spandanagella commented 5 years ago

Thanks, Ardalan, for the quick response. This fixed the issue :)