facebookresearch / multipathnet

A Torch implementation of the object detection network from "A MultiPath Network for Object Detection" (https://arxiv.org/abs/1604.02135)

How to resume from a checkpoint? #13

Closed northeastsquare closed 8 years ago

northeastsquare commented 8 years ago

Hello, I want to resume training from a checkpoint. I tried setting opt.checkpoint=true, but then I got this error:

/root/torch/install/bin/luajit: train.lua:227: attempt to index global 'checkpoint' (a nil value)
stack traceback:
    train.lua:227: in function 'hooks'
    ./engines/fboptimengine.lua:50: in function 'train'
    train.lua:363: in main chunk
    [C]: in function 'dofile'
    /root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x004064f0

szagoruyko commented 8 years ago

I'm not sure checkpointing works correctly right now. Please use the retrain option while we fix it.

northeastsquare commented 8 years ago

@szagoruyko Thank you. I noticed that the logs directory contains transformer.t7, optimState_500.t7, and model_500.t7.

So I set: retrain=model_500.t7 transformer=transformer.t7

But where do I set optimState_500.t7?

szagoruyko commented 8 years ago

@northeastsquare There is no option for that; the momentum will be reset.
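
To illustrate what a momentum reset means in practice, here is a minimal plain-Lua sketch. It is not the multipathnet training code: the toy objective, the sgd_step helper, and the state.velocity field are all assumptions made for illustration. It compares resuming with the saved optimizer state against resuming with a fresh one, which is the situation when only the model weights are restored.

```lua
-- Toy SGD with momentum on f(x) = 0.5 * x^2 (so the gradient is x).
-- `state.velocity` stands in for the momentum buffer that an optimizer-state
-- file would hold. All names here are hypothetical, not taken from train.lua.
local function sgd_step(x, state, lr, momentum)
  local grad = x                                   -- d/dx (0.5 * x^2)
  state.velocity = momentum * (state.velocity or 0) + grad
  return x - lr * state.velocity
end

local lr, momentum = 0.1, 0.9

-- Run 5 steps, standing in for the training done before the checkpoint.
local x, state = 1.0, {}
for _ = 1, 5 do
  x = sgd_step(x, state, lr, momentum)
end

-- Resume A: weights and optimizer state are both restored.
local state_kept = { velocity = state.velocity }
local x_kept = sgd_step(x, state_kept, lr, momentum)

-- Resume B: weights restored, optimizer state starts empty (momentum reset).
local state_fresh = {}
local x_fresh = sgd_step(x, state_fresh, lr, momentum)

print(string.format("velocity with saved state: %.4f", state_kept.velocity))
print(string.format("velocity with fresh state: %.4f", state_fresh.velocity))
```

With a fresh state, the first update after resuming uses only the current gradient, and the velocity has to build up again over the following iterations. The weights themselves are unaffected, so the reset typically costs a short warm-up rather than the progress already made.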

northeastsquare commented 8 years ago

@szagoruyko OK