I modified the recurrent-language-model example in rnn/examples/recurrent-language-model.lua to handle video data. In the example, the FastLSTM module is decorated with a Sequencer module. The input in my experiments is seqLength x batchSize x featureSize, where seqLength is 16 frames, batchSize is 5, and featureSize = channels x height x width of the frames.
All code runs in CPU mode.
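To make the input layout concrete, here is a minimal sketch (the 3 x 192 x 208 frame size is the one used in the full script further below) of the tensor the model receives and how it is split into per-time-step batches:
-- minimal sketch of the input layout (values taken from the script below)
require 'nn'
local seqLength, batchSize = 16, 5
local channels, height, width = 3, 192, 208
local featureSize = channels * height * width -- 119808
local input = torch.rand(seqLength, batchSize, featureSize)
-- nn.SplitTable(1) turns this into a table of seqLength tensors,
-- each of size batchSize x featureSize, one per time step for the Sequencer
local steps = nn.SplitTable(1):forward(input)
print(#steps, steps[1]:size()) -- 16, 5 x 119808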
In every experiment, the program ran for part of the first epoch and then failed near some batch of data. The error is:
null
Stack trace:
0 file:/home/fanxiang/workspace/myLSTMActionRecog/src/Linear.lua [67]
The locations where the error occurred were not always the same, but they were very close to each other. Almost every time, the error occurred in the nn.Linear module. I marked the locations where the error occurred in
nn/Linear.lua:
function Linear:updateOutput(input)
   if input:dim() == 1 then
      self.output:resize(self.weight:size(1))
      if self.bias then self.output:copy(self.bias) else self.output:zero() end
      self.output:addmv(1, self.weight, input)
   elseif input:dim() == 2 then
      local nframe = input:size(1)
      local nElement = self.output:nElement()
      self.output:resize(nframe, self.weight:size(1))
      if self.output:nElement() ~= nElement then
         self.output:zero()
      end
      updateAddBuffer(self, input)
      self.output:addmm(0, self.output, 1, input, self.weight:t())
      if self.bias then self.output:addr(1, self.addBuffer, self.bias) end --Sometimes Here
   else
      error('input must be vector or matrix')
   end
   return self.output
end

function Linear:updateGradInput(input, gradOutput)
   if self.gradInput then
      local nElement = self.gradInput:nElement()
      self.gradInput:resizeAs(input)
      if self.gradInput:nElement() ~= nElement then
         self.gradInput:zero()
      end
      if input:dim() == 1 then
         self.gradInput:addmv(0, 1, self.weight:t(), gradOutput)
      elseif input:dim() == 2 then
         self.gradInput:addmm(0, 1, gradOutput, self.weight) --sometimes here
      end
      return self.gradInput
   end
end

function Linear:accGradParameters(input, gradOutput, scale)
   scale = scale or 1
   if input:dim() == 1 then
      self.gradWeight:addr(scale, gradOutput, input)
      if self.bias then self.gradBias:add(scale, gradOutput) end
   elseif input:dim() == 2 then
      self.gradWeight:addmm(scale, gradOutput:t(), input)
      if self.bias then -- and sometimes here
         -- update the size of addBuffer if the input is not the same size as the one we had in last updateGradInput
         updateAddBuffer(self, input)
         self.gradBias:addmv(scale, gradOutput:t(), self.addBuffer)
      end
   end
end
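To narrow down which call fails, something like the following sketch can be used; it simply wraps the stock nn.Linear:updateOutput shown above and prints what the module receives, so the failing batch and layer can be identified when the error occurs:
-- sketch: log the shapes and contiguity of what each Linear receives
require 'nn'
local origUpdateOutput = nn.Linear.updateOutput
function nn.Linear:updateOutput(input)
   if input:dim() == 2 then
      print(string.format('Linear %d -> %d : input %d x %d, contiguous = %s',
         self.weight:size(2), self.weight:size(1),
         input:size(1), input:size(2), tostring(input:isContiguous())))
   end
   return origUpdateOutput(self, input)
end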
I have often met nil errors, but I rarely see this null error. To find the cause, I wrote some code that simulates loading the video frame data, and I got the same error as with the real video data. Here is the full code:
--testing the lstm model based on nn.FastLSTM but using simulated data
require 'paths'
require 'rnn'
version = 2
--[[ command line arguments ]]--
cmd = torch.CmdLine()
cmd:text()
cmd:text('Train a Language Model on PennTreeBank dataset using RNN or LSTM or GRU')
cmd:text('Example:')
cmd:text('th recurrent-language-model.lua --cuda --device 2 --progress --cutoff 4 --seqlen 10')
cmd:text("th recurrent-language-model.lua --progress --cuda --lstm --seqlen 20 --hiddensize '{200,200}' --batchsize 20 --startlr 1 --cutoff 5 --maxepoch 13 --schedule '{[5]=0.5,[6]=0.25,[7]=0.125,[8]=0.0625,[9]=0.03125,[10]=0.015625,[11]=0.0078125,[12]=0.00390625}'")
cmd:text("th recurrent-language-model.lua --progress --cuda --lstm --seqlen 35 --uniform 0.04 --hiddensize '{1500,1500}' --batchsize 20 --startlr 1 --cutoff 10 --maxepoch 50 --schedule '{[15]=0.87,[16]=0.76,[17]=0.66,[18]=0.54,[19]=0.43,[20]=0.32,[21]=0.21,[22]=0.10}' -dropout 0.65")
cmd:text('Options:')
--dataloader
cmd:option('--numClasses', 3) -- necessary
cmd:option('--scaledHeight', 192) -- video frame height
cmd:option('--scaledWidth', 208) -- video frame width
cmd:option('--numChannels', 3) --num of channels
-- training
cmd:option('--startlr', 0.0001, 'learning rate at t=0')
cmd:option('--minlr', 0.000001, 'minimum learning rate')
cmd:option('--saturate', 50, 'epoch at which linear decayed LR will reach minlr')
cmd:option('--schedule', '', 'learning rate schedule. e.g. {[5] = 0.004, [6] = 0.001}')
cmd:option('--momentum', 0.9, 'momentum')
cmd:option('--maxnormout', -1, 'max l2-norm of each layer\'s output neuron weights')
cmd:option('--cutoff', -1, 'max l2-norm of concatenation of all gradParam tensors')
cmd:option('--batchSize', 5, 'number of examples per batch')
cmd:option('--cuda', false, 'use CUDA')
cmd:option('--device', 1, 'sets the device (GPU) to use')
cmd:option('--maxepoch', 10, 'maximum number of epochs to run') ---1000 by default
cmd:option('--earlystop', 50, 'maximum number of epochs to wait to find a better local minima for early-stopping')
cmd:option('--progress', false, 'print progress bar')
cmd:option('--silent', false, 'don\'t print anything to stdout')
cmd:option('--uniform', 0.1, 'initialize parameters using uniform distribution between -uniform and uniform. -1 means default initialization')
-- rnn layer
cmd:option('--lstm', true, 'use Long Short Term Memory (nn.LSTM instead of nn.Recurrent)')
cmd:option('--bn', false, 'use batch normalization. Only supported with --lstm')
cmd:option('--gru', false, 'use Gated Recurrent Units (nn.GRU instead of nn.Recurrent)')
cmd:option('--seqLength', 16, 'sequence length : back-propagate through time (BPTT) for this many time-steps')
cmd:option('--inputsize', -1, 'size of lookup table embeddings. -1 defaults to hiddensize[1]')
cmd:option('--hiddensize', '{256}', 'number of hidden units used at output of each recurrent layer. When more than one is specified, RNN/LSTMs/GRUs are stacked')
cmd:option('--dropout', 0, 'apply dropout with this probability after each rnn layer. dropout <= 0 disables it.')
-- data
cmd:option('--batchsize', 5, 'number of examples per batch')
cmd:option('--trainsize', -1, 'number of train examples seen between each epoch')
cmd:option('--validsize', -1, 'number of valid examples used for early stopping and cross-validation')
cmd:option('--savepath', paths.concat('/home/path-to-save-the-model', 'rnnlm'), 'path to directory where experiment log (includes model) will be saved')
cmd:option('--id', 'simulateVideoProcessing', 'id string of this experiment (used to name output file) (defaults to a unique id)')
cmd:text()
local opt = cmd:parse(arg or {})
opt.hiddensize = loadstring(" return "..opt.hiddensize)()
opt.schedule = loadstring(" return "..opt.schedule)()
opt.inputsize = opt.numChannels*opt.scaledHeight*opt.scaledWidth --opt.inputsize == -1 and opt.hiddensize[1] or opt.inputsize
if not opt.silent then
print(opt)
end
if opt.cuda then
require 'cunn'
cutorch.setDevice(opt.device)
end
--[[lstm model based on FastLSTM and Sequencer--]]
local lm = nn.Sequential()
lm:add(nn.View(-1,opt.batchSize,opt.numChannels*opt.scaledHeight*opt.scaledWidth))
lm:add(nn.SplitTable(1)) -- tensor to table of tensors
-- rnn layers
local stepmodule = nn.Sequential() -- applied at each time-step
local inputsize = opt.inputsize
for i,hiddensize in ipairs(opt.hiddensize) do
local rnn
if opt.gru then -- Gated Recurrent Units
rnn = nn.GRU(inputsize, hiddensize, nil, opt.dropout/2)
elseif opt.lstm then -- Long Short Term Memory units
require 'nngraph'
nn.FastLSTM.usenngraph = true -- faster
nn.FastLSTM.bn = opt.bn
rnn = nn.FastLSTM(inputsize, hiddensize)
else -- simple recurrent neural network
local rm = nn.Sequential() -- input is {x[t], h[t-1]}
:add(nn.ParallelTable()
:add(i==1 and nn.Identity() or nn.Linear(inputsize, hiddensize)) -- input layer
:add(nn.Linear(hiddensize, hiddensize))) -- recurrent layer
:add(nn.CAddTable()) -- merge
:add(nn.Sigmoid()) -- transfer
rnn = nn.Recurrence(rm, hiddensize, 1)
end
stepmodule:add(rnn)
if opt.dropout > 0 then
stepmodule:add(nn.Dropout(opt.dropout))
end
inputsize = hiddensize
end
-- output layer
stepmodule:add(nn.Linear(inputsize, opt.numClasses))
stepmodule:add(nn.LogSoftMax())
-- encapsulate stepmodule into a Sequencer
lm:add(nn.Sequencer(stepmodule))
-- remember previous state between batches
lm:remember((opt.lstm or opt.gru) and 'both' or 'eval')
if not opt.silent then
print"Language Model:"
print(lm)
end
if opt.uniform > 0 then
for k,param in ipairs(lm:parameters()) do
param:uniform(-opt.uniform, opt.uniform)
end
end
--[[simulated data--]]
local simulated_Data = {}
if paths.filep('tdata.t7') then
simulated_Data.data = torch.load('tdata.t7')
else
simulated_Data.data = 20*torch.rand(opt.seqLength,opt.batchSize,
opt.numChannels,opt.scaledHeight,opt.scaledWidth)
torch.save('tdata.t7',simulated_Data.data)
end
simulated_Data.tindex = 1
local maxBatchTrain=175
--simulate training data
function getTrainData(opt)
local batch = {}
if simulated_Data.tindex <= maxBatchTrain then
batch.data = simulated_Data.data + (simulated_Data.tindex % 20)*0.01
local labels = torch.Tensor(opt.seqLength*opt.batchSize):fill(0)
for i=1,labels:size(1) do
labels[i]= (simulated_Data.tindex+i*i) % opt.numClasses +1
end
batch.labels = labels:clone():view(opt.seqLength,opt.batchSize)
simulated_Data.tindex = simulated_Data.tindex+1
return batch
else
simulated_Data.tindex = 1
return nil
end
end
local maxBatchVal=30
simulated_Data.vindex = 1
--simulate validate data
function getValData(opt)
local batch = {}
if simulated_Data.vindex <= maxBatchVal then
batch.data = simulated_Data.data + (simulated_Data.vindex % 10)*0.01
local labels = torch.Tensor(opt.seqLength*opt.batchSize):fill(0)
for i=1,labels:size(1) do
labels[i]= (simulated_Data.vindex+i*i) % opt.numClasses +1
end
batch.labels = labels:clone():view(opt.seqLength,opt.batchSize)
simulated_Data.vindex = simulated_Data.vindex+1
return batch
else
simulated_Data.vindex = 1
return nil
end
end
--[[simulate data end--]]
--[[ loss function ]]--
local crit = nn.ClassNLLCriterion()
-- target is also seqlen x batchsize.
local targetmodule = nn.SplitTable(1)
if opt.cuda then
targetmodule = nn.Sequential()
:add(nn.Convert())
:add(targetmodule)
end
local criterion = nn.SequencerCriterion(crit)
--[[ CUDA ]]--
if opt.cuda then
lm:cuda()
criterion:cuda()
targetmodule:cuda()
end
--[[ experiment log ]]--
-- is saved to file every time a new validation minima is found
local xplog = {}
xplog.opt = opt -- save all hyper-parameters and such
xplog.dataset = 'simulated'
xplog.model = nn.Serial(lm)
xplog.model:mediumSerial()
xplog.criterion = criterion
xplog.targetmodule = targetmodule
-- keep a log of NLL for each epoch
xplog.trainppl = {}
xplog.valppl = {}
-- will be used for early-stopping
xplog.minvalppl = 99999999
xplog.epoch = 0
local ntrial = 0
paths.mkdir(opt.savepath)
local epoch = 1
opt.lr = opt.startlr
opt.trainsize = 875 --simulate 175 batches with batch size 5
opt.validsize = 150 --simulate 30 batches with size 5
local params, gradParams = lm:getParameters()
while opt.maxepoch <= 0 or epoch <= opt.maxepoch do
print("")
print("Epoch #"..epoch.." :")
-- 1. training
local a = torch.Timer()
lm:training()
local sumErr = 0
local batch = getTrainData(opt)
local i = 1
while batch ~= nil do
print(string.format('%d\'th batch data min is %.08f max is %.08f',i,torch.min(batch.data),torch.max(batch.data)))
local targets = targetmodule:forward(batch.labels)
local inputs = batch.data
-- forward
local outputs = lm:forward(inputs)--size is
--print(outputs:dim())
local err = criterion:forward(outputs, targets)
sumErr = sumErr + err
-- backward
local gradOutputs = criterion:backward(outputs, targets)
lm:zeroGradParameters()
lm:backward(inputs, gradOutputs)
-- update
if opt.cutoff > 0 then
local norm = lm:gradParamClip(opt.cutoff) -- affects gradParams
opt.meanNorm = opt.meanNorm and (opt.meanNorm*0.9 + norm*0.1) or norm
end
lm:updateGradParameters(opt.momentum) -- affects gradParams --from dpnn
lm:updateParameters(opt.lr) -- affects params
lm:maxParamNorm(opt.maxnormout) -- affects params
if opt.progress then
xlua.progress(math.min(i + opt.seqLength, opt.trainsize), opt.trainsize)
end
--debug
print(string.format('max of params is %.10f, min of params is %.10f,max grad is %.10f min grad is %.10f',
torch.max(params),torch.min(params),torch.max(gradParams),torch.min(gradParams)))
if i % 100 == 0 then
collectgarbage()
end
batch = getTrainData(opt)
i = i+1
end
-- learning rate decay
if opt.schedule then
opt.lr = opt.schedule[epoch] or opt.lr
else
opt.lr = opt.lr + (opt.minlr - opt.startlr)/opt.saturate
end
opt.lr = math.max(opt.minlr, opt.lr)
if not opt.silent then
print("learning rate", opt.lr)
if opt.meanNorm then
print("mean gradParam norm", opt.meanNorm)
end
end
if cutorch then cutorch.synchronize() end
local speed = a:time().real/opt.trainsize
print(string.format("Speed : %f sec/batch ", speed))
-- 2. cross-validation
lm:evaluate()
local sumErr = 0
local vBatch = getValData(opt)
while vBatch~=nil do
local targets = targetmodule:forward(vBatch.labels)
local inputs = vBatch.data
local outputs = lm:forward(inputs)
local err = criterion:forward(outputs, targets)
sumErr = sumErr + err
vBatch = getValData(opt)
end
local ppl = torch.exp(sumErr/opt.validsize)
-- Note :
-- Perplexity = exp( sum ( NLL ) / #w)
-- Bits Per Word = log2(Perplexity)
-- Bits per Char = BPW * (#w / #c)
print("Validation PPL : "..ppl)
xplog.valppl[epoch] = ppl
ntrial = ntrial + 1
-- early-stopping
if ppl < xplog.minvalppl then
-- save best version of model
xplog.minvalppl = ppl
xplog.epoch = epoch
local filename = paths.concat(opt.savepath, opt.id..'.t7')
print("Found new minima. Saving to "..filename)
torch.save(filename, xplog)
ntrial = 0
elseif ntrial >= opt.earlystop then
print("No new minima found after "..ntrial.." epochs.")
print("Stopping experiment.")
break
end
collectgarbage()
epoch = epoch + 1
end
print("Evaluate model using : ")
print("th scripts/evaluate-rnnlm.lua --xplogpath "..paths.concat(opt.savepath, opt.id..'.t7')..(opt.cuda and ' --cuda' or ''))
I have spent quite a long time trying to fix this but still failed. Can someone help me figure out this problem?
Thanks a lot.