Closed: guoyan1991 closed this issue 6 years ago
I set opt.conditional = false, so the model is
model = nn.gModule({input}, {reconstruction, mean, log_var})
and the error occurs on this line:
reconstruction,mean,log_var,predClassSores=unpack(model:forward(droppedInputs))
I tried to debug the code and found that the failure comes from the call model:forward(droppedInputs).
droppedInputs is a 1×20×224×224 torch.CudaTensor.
I don't know what the correct input for this nn.gModule is.
Thanks for your help.
@guoyan1991 I'm not sure what you mean by the .mat files. We did not use any .mat files for this work except when we were processing the NYUD data set to get the objects (chairs) out. Could you elaborate on this?
Regarding the error you're getting: I'm afraid you have modified the code, because I do not see any code at lines 300 or 406. Based on your second post, I guess the problem is that you need to feed more than one sample at a time to the network. The BatchNormalization layers expect 4D tensors [N x C x R x R], where N is the number of 3D shapes, C is the number of channels (20 here) and R is the resolution. So you need to input at least two samples to the network, such that N >= 2. Let me know if this resolves the issue.
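One practical consequence of the N >= 2 advice: if the number of training shapes is not a multiple of the batch size, the final batch can end up holding a single sample. The following is a minimal sketch of one way to avoid that, written in Python rather than the repository's Lua and not taken from the repository. The function name plan_batches and the choice to fold a too-small remainder into the previous batch are assumptions for illustration only.

```python
def plan_batches(n_samples, batch_size, min_batch=2):
    """Split sample indices into batches, never leaving a batch
    with fewer than min_batch samples (hypothetical helper)."""
    if n_samples < min_batch:
        raise ValueError("need at least %d samples, got %d" % (min_batch, n_samples))
    if batch_size < min_batch:
        raise ValueError("batch_size must be at least %d" % min_batch)

    indices = list(range(n_samples))
    batches = [indices[i:i + batch_size] for i in range(0, n_samples, batch_size)]

    # If the remainder batch is too small, merge it into the previous batch
    # so that every forward pass still sees N >= min_batch samples.
    if len(batches) > 1 and len(batches[-1]) < min_batch:
        tail = batches.pop()
        batches[-1].extend(tail)
    return batches


if __name__ == "__main__":
    # 9 samples with batch size 4 would normally leave a final batch of 1.
    for batch in plan_batches(9, 4):
        print(len(batch), batch)
```

Merging the remainder makes one batch slightly larger than the configured size; dropping the remainder instead is an equally reasonable choice when GPU memory is tight.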
First of all, thank you very much for your reply. I downloaded the ShapeNet Core dataset from https://www.shapenet.org/. I guess the PASCAL 3D release 1.0 dataset can be used to reproduce your experiments. However, the CAD models in the compressed files are all .mat files, which can only be opened with MATLAB. I can't use /renderDepth/runRendering.bat to get the 20 depth maps from these .mat files, because they are not object files. Did I download the right dataset? This question confuses me, and I look forward to your answer.
As for the error, thank you for your suggestion. I will keep trying and reply soon.
@guoyan1991 We did not use the PASCAL 3D data set so I am not sure how you can use that. To use the rendering tool we have provided, you need to have access to the .ply files of the 3D meshes. If you want to use ShapeNet Core you can download the pre-processed data set through the links provided in the repository so that you can skip the rendering part (unless you want to render from views different than what we used).
Thank you very much. I have used two models from the pre-processed data set you provided for training. The two earlier problems have been solved, but there is a new problem, as follows:
cudnnFindConvolutionBackwardDataAlgorithm failed: 2 convDesc=[mode : CUDNN_CROSS_CORRELATION datatype : CUDNN_DATA_FLOAT] hash=-dimA4,420,56,56 -filtA420,280,4,4 4,280,112,112 -padA1,1 -convStrideA2,2 CUDNN_DATA_FLOAT
/install/torch/install/bin/luajit: /install/torch/install/share/lua/5.1/nn/Container.lua:67:
In 12 module of nn.Sequential:
/install/torch/install/share/lua/5.1/cudnn/find.lua:483: cudnnFindConvolutionBackwardDataAlgorithm failed, sizes: convDesc=[mode : CUDNN_CROSS_CORRELATION datatype : CUDNN_DATA_FLOAT] hash=-dimA4,420,56,56 -filtA420,280,4,4 4,280,112,112 -padA1,1 -convStrideA2,2 CUDNN_DATA_FLOAT
stack traceback:
[C]: in function 'error'
/install/torch/install/share/lua/5.1/cudnn/find.lua:483: in function 'backwardDataAlgorithm'
...h/install/share/lua/5.1/cudnn/SpatialFullConvolution.lua:88: in function <...h/install/share/lua/5.1/cudnn/SpatialFullConvolution.lua:83>
[C]: in function 'xpcall'
/install/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/install/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'func'
/install/torch/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
/install/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'forward'
2_train.lua:289: in function 'opfunc'
/install/torch/install/share/lua/5.1/optim/adam.lua:37: in function 'adam'
2_train.lua:377: in main chunk
[C]: in function 'dofile'
main.lua:130: in main chunk
[C]: in function 'dofile'
...tall/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405e90
WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
/install/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
/install/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'func'
/install/torch/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
/install/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'forward'
2_train.lua:289: in function 'opfunc'
/install/torch/install/share/lua/5.1/optim/adam.lua:37: in function 'adam'
2_train.lua:377: in main chunk
[C]: in function 'dofile'
main.lua:130: in main chunk
[C]: in function 'dofile'
...tall/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405e90
I used the original version of the code this time. Line 289 of 2_train.lua is:
reconstruction,mean,log_var,predClassSores=unpack(model:forward(droppedInputs))
I could not find the cause of this problem, or a solution to it, on the Internet. Thanks for your help!
@guoyan1991 I'm not sure why you are getting this error but it seems that cuDNN is complaining. What's the cuDNN version you are using? I just started re-training a model with cuDNN 7.05 and CUDA 8.0 and it works fine.
Thank you very much for your help. The problem was caused by GPU memory: the GPU does not have enough of it.
@guoyan1991 You can set the argument opt.nCh to a lower value or reduce opt.batchSize. If I remember correctly, you need about 6-7 GB of GPU memory with the default parameters.
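To get a feel for why lowering the batch size or the channel count helps, here is a rough back-of-the-envelope sketch in Python. It uses the tensor sizes printed in the cuDNN message earlier in this thread (input 4 x 420 x 56 x 56, filter 420 x 280 x 4 x 4, output 4 x 280 x 112 x 112), and assumes those are in N x C x H x W order and stored as 4-byte floats. It covers only that one layer, ignoring gradients, optimizer state and cuDNN workspace, so it is not an estimate of the total 6-7 GB. The helper names, and the assumption that a lower opt.nCh would halve both channel counts of this layer, are hypothetical.

```python
BYTES_PER_FLOAT32 = 4


def tensor_mib(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Size of a dense tensor with the given shape, in MiB."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * bytes_per_element / (1024.0 ** 2)


def layer_footprint_mib(batch, in_ch, out_ch, in_res, out_res, kernel):
    """Input, filter and output sizes (MiB) for one full-convolution layer."""
    return {
        "input": tensor_mib((batch, in_ch, in_res, in_res)),
        "filter": tensor_mib((in_ch, out_ch, kernel, kernel)),
        "output": tensor_mib((batch, out_ch, out_res, out_res)),
    }


# Sizes as reported in the cuDNN error message.
reported = layer_footprint_mib(4, 420, 280, 56, 112, 4)
# Same layer with half the batch size.
half_batch = layer_footprint_mib(2, 420, 280, 56, 112, 4)
# Same layer with half the channels (assumed effect of a lower opt.nCh).
half_channels = layer_footprint_mib(4, 210, 140, 56, 112, 4)

if __name__ == "__main__":
    for name, fp in (("reported", reported),
                     ("half batch", half_batch),
                     ("half channels", half_channels)):
        total = sum(fp.values())
        print("%-14s total %.2f MiB" % (name, total), fp)
```

Halving the batch size halves the activation tensors but leaves the filter untouched, while halving the channels halves the activations and shrinks the filter to a quarter, which is why both knobs reduce memory use.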
I have two questions and I really hope to get your help.
First, I have a big problem with the training data. I want to use the ShapeNet Core dataset to reproduce your experiments, so I am going to convert the .mat files into .ply files. I've found that I can copy the vertex array and the face array directly from the .mat file into the .ply file, but this approach is too cumbersome. Do you know of a simpler way to do it?
Second, I get the following error when I train the AllVP network with a single 3D shape (only the depth maps from 20 views). I don't know if it's due to incorrect input on my part.
/install/torch/install/bin/luajit: /install/torch/install/share/lua/5.1/nn/Container.lua:67:
In 3 module of nn.Sequential:
...torch/install/share/lua/5.1/cudnn/BatchNormalization.lua:44: assertion failed!
stack traceback:
[C]: in function 'assert'
...torch/install/share/lua/5.1/cudnn/BatchNormalization.lua:44: in function 'createIODescriptors'
...torch/install/share/lua/5.1/cudnn/BatchNormalization.lua:60: in function <...torch/install/share/lua/5.1/cudnn/BatchNormalization.lua:59>
[C]: in function 'xpcall'
/install/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/install/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'func'
/install/torch/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
/install/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'forward'
2_train.lua:300: in function 'opfunc'
/install/torch/install/share/lua/5.1/optim/adam.lua:37: in function 'adam'
2_train.lua:406: in main chunk
[C]: in function 'dofile'
main.lua:130: in main chunk
[C]: in function 'dofile'
...tall/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405e90
Thanks for your help.
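On the first question (copying the vertex and face arrays from a .mat file into a .ply file), the copy can be scripted rather than done by hand. Below is a minimal Python sketch of an ASCII PLY writer. It is not part of the repository. The function name write_ascii_ply is hypothetical, and so is the assumption that the mesh is a list of (x, y, z) vertices plus a list of triangular faces with MATLAB-style 1-based indices. Reading the .mat file itself is left as a comment, because the field names inside the file are not given in this thread.

```python
import io


def write_ascii_ply(stream, vertices, faces, one_based=True):
    """Write a triangle mesh to `stream` in ASCII PLY format.

    vertices: sequence of (x, y, z) numbers.
    faces: sequence of vertex-index tuples; set one_based=True when the
           indices come from MATLAB, which counts from 1.
    """
    offset = 1 if one_based else 0
    n_vertices = len(vertices)

    stream.write("ply\n")
    stream.write("format ascii 1.0\n")
    stream.write("element vertex %d\n" % n_vertices)
    stream.write("property float x\n")
    stream.write("property float y\n")
    stream.write("property float z\n")
    stream.write("element face %d\n" % len(faces))
    stream.write("property list uchar int vertex_indices\n")
    stream.write("end_header\n")

    for x, y, z in vertices:
        stream.write("%.9g %.9g %.9g\n" % (x, y, z))

    for face in faces:
        # PLY indices are 0-based, so shift MATLAB indices down by one.
        idx = [int(i) - offset for i in face]
        if min(idx) < 0 or max(idx) >= n_vertices:
            raise ValueError("face index out of range: %r" % (face,))
        stream.write("%d %s\n" % (len(idx), " ".join(str(i) for i in idx)))


if __name__ == "__main__":
    # The arrays could come from scipy.io.loadmat("model.mat"); the keys to
    # read (for example "vertices" and "faces") are hypothetical and depend
    # on how the .mat file was saved.
    buf = io.StringIO()
    write_ascii_ply(buf, [(0, 0, 0), (1, 0, 0), (0, 1, 0)], [(1, 2, 3)])
    print(buf.getvalue())
```

To write to disk, pass an open text file instead of the StringIO buffer, for example `with open("model.ply", "w") as f: write_ascii_ply(f, vertices, faces)`.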