Element-Research / rnn

Recurrent Neural Network library for Torch7's nn
BSD 3-Clause "New" or "Revised" License

Save and retrain a RAM model #360

Open matsaragas opened 7 years ago

matsaragas commented 7 years ago

Hi all,

I train a RAM model and save it after some epochs. When I load the saved model and resume training, I get the following error during back-propagation:

/home/py13/torch/install/bin/luajit: /home/py13/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
In 1 module of nn.Sequential:
In 1 module of nn.ParallelTable:
In 1 module of nn.Sequential:
In 2 module of nn.ConcatTable:
In 1 module of nn.Sequential:
/home/py13/torch/install/share/lua/5.1/dpnn/DontCast.lua:12: bad argument #1 to 'getmetatable' (string expected, got nil)
stack traceback:
    [C]: in function 'getmetatable'
    /home/py13/torch/install/share/lua/5.1/dpnn/DontCast.lua:12: in function 'recursiveTypeCopy'
    /home/py13/torch/install/share/lua/5.1/dpnn/DontCast.lua:9: in function 'recursiveTypeCopy'
    /home/py13/torch/install/share/lua/5.1/dpnn/DontCast.lua:86: in function </home/py13/torch/install/share/lua/5.1/dpnn/DontCast.lua:74>
    [C]: in function 'xpcall'
    /home/py13/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
    /home/py13/torch/install/share/lua/5.1/nn/Sequential.lua:58: in function </home/py13/torch/install/share/lua/5.1/nn/Sequential.lua:50>
    [C]: in function 'xpcall'
    /home/py13/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
    /home/py13/torch/install/share/lua/5.1/nn/ConcatTable.lua:35: in function </home/py13/torch/install/share/lua/5.1/nn/ConcatTable.lua:30>
    ...
    /home/py13/torch/install/share/lua/5.1/nn/Sequential.lua:58: in function 'updateGradInput'
    /home/py13/torch/install/share/lua/5.1/dpnn/Decorator.lua:16: in function 'updateGradInput'
    /home/py13/torch/install/share/lua/5.1/nn/Module.lua:31: in function 'backward'
    train.lua:91: in function 'opfunc'
    /home/py13/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'optimMethod'
    train.lua:115: in function 'trainOptim'
    main.lua:209: in main chunk
    [C]: in function 'dofile'
    ...py13/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406670

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
    [C]: in function 'error'
    /home/py13/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
    /home/py13/torch/install/share/lua/5.1/nn/Sequential.lua:58: in function 'updateGradInput'
    /home/py13/torch/install/share/lua/5.1/dpnn/Decorator.lua:16: in function 'updateGradInput'
    /home/py13/torch/install/share/lua/5.1/nn/Module.lua:31: in function 'backward'
    train.lua:91: in function 'opfunc'
    /home/py13/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'optimMethod'
    train.lua:115: in function 'trainOptim'
    main.lua:209: in main chunk
    [C]: in function 'dofile'
    ...py13/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406670

I serialize the model because it is quite large:

    agent = nn.Serial(agent)
    agent:mediumSerial()
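
For anyone trying to reproduce this, the save-and-reload workflow described above can be sketched roughly as follows. This is only a minimal illustration, not the actual training script: the tiny `nn.Linear` model stands in for the real RAM agent, and the filename `agent.t7` is a hypothetical placeholder.

```lua
require 'torch'
require 'nn'
require 'dpnn'

-- Stand-in model (assumption): the real agent is a RAM model,
-- a small Sequential is used here only to keep the sketch short.
local agent = nn.Sequential()
agent:add(nn.Linear(10, 5))

-- Wrap the model in nn.Serial and request medium serialization,
-- as in the snippet above, so that less state is written to disk.
agent = nn.Serial(agent)
agent:mediumSerial()

-- Save the wrapped model ('agent.t7' is a hypothetical filename).
torch.save('agent.t7', agent)

-- Reload it later, before resuming training.
local loaded = torch.load('agent.t7')
print(torch.type(loaded))
```

In my case the error only shows up after this reload step, once `backward` is called on the loaded model.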

Any hints on how to solve this problem?

Thanks, Petros.