HyeonwooNoh / DPPnet

DPPnet: Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction

An illegal memory access was encountered #9

Open badripatro opened 7 years ago

badripatro commented 7 years ago

Problem statement: When I run the following command:

th vqa_train.lua -gpuid 1

I get the following message:


loading cache..: /home1/badri/badripatro/VQA/workspace_project/image_qa_dpp/DPPnet-master/004_train_DPPnet_fixed_cnn/cache/vqa_data_cache_major_test-dev2015_54 done
creating a neural network with random initialization
/home/cse/torch/install/bin/luajit: C++ exception
badri@cse-desktop:/DPPnet-master/004_train_DPPnet_fixed_cnn$


I have narrowed it down to line 79 of the file "DPPnet-master_1/model/HashedNets/HasherME.lua", where the call to "libhashnn.mysort()" fails:

_libhashnn.mysort(self['sortkey' .. WorB],self['sortval'.. WorB])_

Then I commented out line 79, compiled again, and reran the command:

th vqa_train.lua -gpuid 1

This time I get the following message:


loading cache..: /home1/badri/badripatro/VQA/workspace_project/image_qa_dpp/DPPnet-master/004_train_DPPnet_fixed_cnn/cache/vqa_data_cache_major_test-dev2015_54 done
creating a neural network with random initialization
initialing weights..
[train2014val2014] set batch order option 1 : shuffle __
THCudaCheck FAIL file=/home1/badri/torch/extra/cutorch/lib/THC/generic/THCStorage.c line=147 error=77 : an illegal memory access was encountered
/home1/badri/torch/install/bin/luajit: cuda runtime error (77) : an illegal memory access was encountered at /home1/badri/torch/extra/cutorch/lib/THC/generic/THCStorage.c:147


I have narrowed this second error down to line 423 of the file

004_train_DPPnet_fixed_cnn/vqa_train.lua

  **dlinear_out[i] = HasherME:backward(dhashed_out)**

On further debugging, I traced it to line 114 of the file "DPPnet-master_1/model/HashedNets/HasherME.lua", where the call to "libhashnn.myreduce()" fails:

libhashnn.myreduce(self.sort_key_W,self.gradOBuffer,self.unique_idxW,self.gradInput,self.buffer_W)

The problem always seems to be inside "libhashnn". Does anyone have advice on how I can narrow it down further?