Problem Statement: When I run the following command:
th vqatrain.lua -gpuid 1
I get the following message:
loading cache..: /home1/badri/badripatro/VQA/workspace_project/image_qa_dpp/DPPnet-master/004_train_DPPnet_fixed_cnn/cache/vqa_data_cache_major_test-dev2015_54
done
creating a neural network with random initialization
/home/cse/torch/install/bin/luajit: C++ exception
badri@cse-desktop:/DPPnet-master/004_train_DPPnet_fixed_cnn$
Also, I have narrowed it down to line 79 of the file "DPPnet-master_1/model/HashedNets/HasherME.lua", where the call to "libhashnn.mysort()" has a problem:
libhashnn.mysort(self['sortkey' .. WorB],self['sortval'.. WorB])
Then I commented out line 79, compiled again, and ran:
th vqatrain.lua -gpuid 1
I get the following message:
loading cache..: /home1/badri/badripatro/VQA/workspace_project/image_qa_dpp/DPPnet-master/004_train_DPPnet_fixed_cnn/cache/vqa_data_cache_major_test-dev2015_54
done
creating a neural network with random initialization
initialing weights..
[train2014val2014] set batch order option 1 : shuffle __
THCudaCheck FAIL file=/home1/badri/torch/extra/cutorch/lib/THC/generic/THCStorage.c line=147 error=77 : an illegal memory access was encountered
/home1/badri/torch/install/bin/luajit: cuda runtime error (77) : an illegal memory access was encountered at /home1/badri/torch/extra/cutorch/lib/THC/generic/THCStorage.c:147
I have narrowed this problem down to the line 423 of file
On further debugging, I found that line 114 of the file "DPPnet-master_1/model/HashedNets/HasherME.lua" also has a problem, in the following "libhashnn" call:
libhashnn.myreduce(self.sort_key_W,self.gradOBuffer,self.unique_idxW,self.gradInput,self.buffer_W)
The problem always occurs in "libhashnn". Does anyone have any advice on how I can further determine the problem?