Open Matrixio opened 6 years ago
@Matrixio , this is expected, since SCNN propagates information sequentially. To speed up training, the simplest option is to use multiple GPUs. An alternative is to use a more efficient, lightweight backbone (e.g., replace VGG-16 with ResNet-18).
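To illustrate why the sequential propagation is slow, here is a minimal pure-Python sketch of a single top-to-bottom message-passing pass. It is a simplification, not the repository's implementation: it assumes a single channel, and the helper names (`conv1d_same`, `scnn_downward`) and the kernel values are hypothetical. The point is that row `i` cannot be computed until row `i - 1` has been updated, so the loop over rows cannot be parallelized the way an ordinary convolution layer can.

```python
def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]


def conv1d_same(row, kernel):
    """1-D cross-correlation with zero padding; output has the same width as `row`."""
    half = len(kernel) // 2
    out = []
    for j in range(len(row)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = j + k - half
            if 0 <= idx < len(row):
                acc += weight * row[idx]
        out.append(acc)
    return out


def scnn_downward(feature_map, kernel):
    """One downward pass: each row receives a message from the updated row above it."""
    rows = [list(r) for r in feature_map]  # copy so the input is left unchanged
    for i in range(1, len(rows)):
        # Sequential dependency: rows[i - 1] must already be updated here.
        message = relu(conv1d_same(rows[i - 1], kernel))
        rows[i] = [x + m for x, m in zip(rows[i], message)]
    return rows


# Hypothetical 3x3 single-channel feature map and kernel, for illustration only.
fmap = [[1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0]]
out = scnn_downward(fmap, kernel=[0.5, 1.0, 0.5])
for row in out:
    print(row)
```

The number of sequential steps grows with the feature-map height (and with the width for the left/right passes), which is consistent with the slowdown reported in this thread.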
@Matrixio , you can refer to Codes-for-Lane-Detection, where I will post my implementations of lane detection models.
@cardwing @XingangPan Hello, I get the following error during both testing and training. Could this be a cuDNN version problem? My CUDA version is 8.0 and my cuDNN version is v4.
data created
data loaded
data loaded
1
bad
/home/kb457/torch/install/bin/luajit: /home/kb457/torch/install/share/lua/5.1/nn/Container.lua:67:
In 2 module of nn.Sequential:
...torch/install/share/lua/5.1/cudnn/BatchNormalization.lua:69: bad argument #1 to 'resizeAs' (torch.CudaTensor expected, got userdata)
stack traceback:
[C]: in function 'resizeAs'
...torch/install/share/lua/5.1/cudnn/BatchNormalization.lua:69: in function <...torch/install/share/lua/5.1/cudnn/BatchNormalization.lua:63>
[C]: in function 'xpcall'
/home/kb457/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/kb457/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
testLane.lua:72: in main chunk
[C]: in function 'dofile'
...b457/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405de0
WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
/home/kb457/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
/home/kb457/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
testLane.lua:72: in main chunk
[C]: in function 'dofile'
...b457/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405de0
Thank you very much. I have been stuck on this problem for several days.
@planckztd , please refer to this issue. The error should be caused by the version of cudnn. You just need to upgrade cudnn from 4.0 to 5.0.
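To confirm which cuDNN version is installed before and after upgrading, one option is to read the version macros from the cuDNN header. The sketch below parses `CUDNN_MAJOR`, `CUDNN_MINOR`, and `CUDNN_PATCHLEVEL` from header text. The function name `parse_cudnn_version` and the sample header contents are hypothetical. The header location is also an assumption: on many CUDA 8.0 installs it is `/usr/local/cuda/include/cudnn.h`, but it may differ on your machine.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from cudnn.h contents, or None if a macro is missing."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"^\s*#define\s+" + name + r"\s+(\d+)",
                          header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


# Hypothetical header excerpt, for illustration only.
sample = (
    "#define CUDNN_MAJOR 4\n"
    "#define CUDNN_MINOR 0\n"
    "#define CUDNN_PATCHLEVEL 7\n"
)
version = parse_cudnn_version(sample)
print(version)
```

On a real machine you would pass the contents of the header file instead of `sample`, e.g. `parse_cudnn_version(open(path).read())` with `path` set to wherever `cudnn.h` is installed.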
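After adding SCNN, training becomes very slow, possibly because SCNN behaves like an RNN. Previously I could process 20,000 images in 1 hour, which completed one epoch; now it is at least 6 to 7 times slower.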