Training is very slow. Any good suggestions?

Matrixio commented 6 years ago

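After adding SCNN, training became very slow, possibly because SCNN works in a way similar to an RNN. Previously, one epoch over 20,000 images took about 1 hour; now it is at least 6 to 7 times slower.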
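
To illustrate why SCNN behaves like an RNN, here is a simplified NumPy sketch of a single top-to-bottom pass of slice-by-slice message passing. The function names, the `(C, H, W)` layout, and the "same" zero padding along the width are assumptions for illustration. This is not code from the SCNN repository.

```python
import numpy as np


def conv_along_width(slice_, kernel):
    """1-D convolution of one row slice across its width (hypothetical helper).

    slice_: (C_in, W) feature row; kernel: (C_out, C_in, w) with odd w.
    Zero "same" padding keeps the output width equal to W.
    """
    c_out, c_in, w = kernel.shape
    pad = w // 2
    width = slice_.shape[1]
    padded = np.pad(slice_, ((0, 0), (pad, pad)))  # zero-pad the width axis only
    out = np.zeros((c_out, width))
    for n in range(w):
        # Kernel tap n mixes channels: (C_out, C_in) @ (C_in, W) -> (C_out, W)
        out += kernel[:, :, n] @ padded[:, n:n + width]
    return out


def scnn_down_pass(feat, kernel):
    """One downward pass over a (C, H, W) feature map (hypothetical helper).

    Row i is updated with a ReLU'd message computed from the ALREADY UPDATED
    row i - 1, so the H - 1 steps must run one after another, like an RNN.
    """
    out = np.array(feat, dtype=float)  # copy, so the input is left untouched
    for i in range(1, out.shape[1]):
        message = conv_along_width(out[:, i - 1, :], kernel)
        out[:, i, :] += np.maximum(message, 0.0)
    return out


if __name__ == "__main__":
    x = np.zeros((1, 3, 3))
    x[0, 0, 1] = 1.0                      # a single active pixel in the top row
    k = np.ones((1, 1, 3))                # spreads each value to its neighbours
    print(scnn_down_pass(x, k)[0])
```

Each loop iteration waits for the previous one, and SCNN repeats this kind of pass in four directions (down, up, right, left), so the extra cost grows with the feature-map height and width rather than being absorbed by GPU parallelism.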

cardwing commented 6 years ago

@Matrixio, this is expected, since SCNN propagates information sequentially. To speed up training, a straightforward option is to use multiple GPUs. Alternatively, use a more efficient, lightweight backbone (e.g., replace VGG-16 with ResNet-18).
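
A rough back-of-the-envelope estimate of what multiple GPUs could buy, using the numbers reported in the question (20,000 images per epoch, about 1 hour before adding SCNN, 6 to 7 times slower after). The 4-GPU setting and the `scaling_efficiency` parameter are hypothetical; the default of 1.0 assumes perfectly linear data-parallel scaling, which real setups rarely reach.

```python
def epoch_hours(num_images, base_images_per_hour, slowdown,
                num_gpus=1, scaling_efficiency=1.0):
    """Estimated hours per epoch (all inputs are assumptions, not measurements).

    Throughput drops by `slowdown` after adding SCNN and is multiplied by
    `num_gpus * scaling_efficiency` when training data-parallel.
    """
    images_per_hour = base_images_per_hour / slowdown * num_gpus * scaling_efficiency
    return num_images / images_per_hour


if __name__ == "__main__":
    for slowdown in (6, 7):
        single = epoch_hours(20000, 20000, slowdown)
        quad = epoch_hours(20000, 20000, slowdown, num_gpus=4)
        print(f"{slowdown}x slower: {single:.2f} h on 1 GPU, "
              f"{quad:.2f} h on 4 GPUs (ideal scaling)")
```

Under these assumptions an epoch goes from about 6 to 7 hours on one GPU to about 1.5 to 1.75 hours on four; a lighter backbone attacks the per-image cost instead, and the two approaches can be combined.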

cardwing commented 6 years ago

@Matrixio, you can refer to Codes-for-Lane-Detection, where I will put my implementations of lane detection models.

planckztd commented 6 years ago

@cardwing @XingangPan Hello, I get the following error during both testing and training. Is it a cuDNN version problem? My CUDA is 8.0 and my cuDNN is v4.

data created
data loaded
data loaded 1
bad
/home/kb457/torch/install/bin/luajit: /home/kb457/torch/install/share/lua/5.1/nn/Container.lua:67: In 2 module of nn.Sequential: ...torch/install/share/lua/5.1/cudnn/BatchNormalization.lua:69: bad argument #1 to 'resizeAs' (torch.CudaTensor expected, got userdata)
stack traceback:
[C]: in function 'resizeAs'
...torch/install/share/lua/5.1/cudnn/BatchNormalization.lua:69: in function <...torch/install/share/lua/5.1/cudnn/BatchNormalization.lua:63>
[C]: in function 'xpcall'
/home/kb457/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/kb457/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
testLane.lua:72: in main chunk
[C]: in function 'dofile'
...b457/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405de0

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
/home/kb457/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
/home/kb457/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
testLane.lua:72: in main chunk
[C]: in function 'dofile'
...b457/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405de0

Thanks a lot; I have been stuck on this problem for several days.

cardwing commented 6 years ago

@planckztd, please refer to this issue. The error is most likely caused by the cuDNN version; upgrading cuDNN from 4.0 to 5.0 should fix it.
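
Before upgrading, it can help to confirm which cuDNN version is actually installed. A minimal Python sketch, assuming the version is read from the `#define CUDNN_MAJOR` / `CUDNN_MINOR` / `CUDNN_PATCHLEVEL` lines of the cuDNN header (in `cudnn.h` for older releases, `cudnn_version.h` from cuDNN 8 on). The function names are hypothetical and the default header path is only a typical location, not necessarily the one on your machine.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) parsed from cuDNN header text."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+" + name + r"\s+(\d+)", header_text)
        if match is None:
            raise ValueError(f"{name} not found; is this the right header?")
        parts.append(int(match.group(1)))
    return tuple(parts)


def read_cudnn_version(path="/usr/local/cuda/include/cudnn.h"):
    """Read and parse a header file (the default path is an assumption)."""
    with open(path) as f:
        return parse_cudnn_version(f.read())


def is_at_least(version, minimum=(5, 0, 0)):
    """Tuple comparison, e.g. (4, 0, 7) < (5, 0, 0) <= (5, 1, 10)."""
    return tuple(version) >= tuple(minimum)


if __name__ == "__main__":
    # Sample header snippet; on a real system use read_cudnn_version(...).
    sample = "#define CUDNN_MAJOR 4\n#define CUDNN_MINOR 0\n#define CUDNN_PATCHLEVEL 7\n"
    v = parse_cudnn_version(sample)
    print(v, is_at_least(v))  # (4, 0, 7) False
```

If the check reports a 4.x version, the installed cuDNN is older than the 5.0 suggested above.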

cardwing commented 6 years ago

@planckztd, you can also refer to this repo, which is a bit faster.

XingangPan / SCNN

Issue #52