hsheil opened this issue 8 years ago
Hi @hsheil. I like where this is going. I am reproducing your code here :
require 'rnn'
function build_network(inputSize, hiddenSize, outputSize)
-- I1: add in a dropout layer
rnn = nn.Sequential()
:add(nn.Sequencer(nn.Linear(inputSize, hiddenSize)))
:add(nn.Sequencer(nn.LSTM(hiddenSize, hiddenSize)))
:add(nn.Sequencer(nn.LSTM(hiddenSize, hiddenSize)))
:add(nn.Sequencer(nn.Linear(hiddenSize, outputSize)))
:add(nn.Sequencer(nn.LogSoftMax()))
-- I1: Adding this line makes the loss oscillate a lot more during training, when according to
-- http://arxiv.org/abs/1409.2329 this should *help* model performance
-- A1: initialization often depends on each dataset.
--rnn:getParameters():uniform(-0.1, 0.1)
return rnn
end
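-- (Added note, not part of the original gist) One hedged way to tackle the I1
-- TODO above is to wrap nn.Dropout in a Sequencer between the stacked LSTMs
-- inside build_network, e.g.:
--   :add(nn.Sequencer(nn.LSTM(hiddenSize, hiddenSize)))
--   :add(nn.Sequencer(nn.Dropout(0.5)))
--   :add(nn.Sequencer(nn.LSTM(hiddenSize, hiddenSize)))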
-- Keep the input layer small so the model trains / converges quickly while training
local inputSize = 10
-- Most models seem to use 512 LSTM units in the hidden layers, so let's stick with this
local hiddenSize = 512
-- We want the network to classify the inputs using a one-hot representation of the outputs
local outputSize = 3
local rnn = build_network(inputSize, hiddenSize, outputSize)
--artificially small batchSize again for easy training
-- this can be the number of sequences to train on
local batchSize=5
-- the dataset size is the length of each of the batchSize sequences.
local dsSize=20
-- number of classes
local nClass = 10
inputs = {}
targets = {}
-- Build up our inputs and targets
-- I2, add code so that if --cuda supplied, these become CudaTensors
-- using the opt.XXX and 'require cunn'
-- I3 - replace this random data set with something more meaningful / learnable
-- and with a realistic testing and validation set
for i = 1, dsSize do
table.insert(inputs, torch.randn(batchSize,inputSize))
table.insert(targets, torch.LongTensor(batchSize):random(1,nClass))
end
-- Decorate the regular nn Criterion with a SequencerCriterion as this simplifies training quite a bit
seqC = nn.SequencerCriterion(nn.ClassNLLCriterion())
local count = 0
local numEpochs=100
local start = torch.tic()
--Now let's train our network on the small, fake dataset we generated earlier
while numEpochs ~= 0 do
rnn:training()
count = count + 1
out = rnn:forward(inputs) -- you are feeding batchSize sequences, each of length dsSize steps
err = seqC:forward(out, targets)
gradOut = seqC:backward(out, targets)
rnn:backward(inputs, gradOut)
local currT = torch.toc(start)
print('loss', err .. ' in ', currT .. ' s')
--TODO, make this configurable / reduce over time as the model converges
rnn:updateParameters(0.05)
-- I5: Are these steps necessary? Seem to make no difference to convergence if called or not
-- Perhaps they are being called by
rnn:zeroGradParameters()
-- rnn:forget() -- not needed, as Sequencer handles it directly.
start = torch.tic()
-- I6: Make this configurable based on the convergence, so we keep going for bigger, more complex models until they are trained
-- to an acceptable accuracy
-- Also add in code to save out the model file to disk for evaluation / usage externally periodically
numEpochs = numEpochs - 1
end
So I modified a couple of things. In its current (above) form, the rnn sees the entire dataset each epoch. For a real dataset, you would need to add another inner loop where you split the batchSize x dsSize data into chunks of smaller batches of sequences: batchSize x seqLength, where seqLength << dsSize.
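A minimal sketch of that inner loop, reusing the variables from the script above (seqLength here is an assumed value, not something defined in the original):
local seqLength = 5 -- assumed chunk length, with seqLength << dsSize
for epoch = 1, numEpochs do
  for offset = 1, dsSize - seqLength + 1, seqLength do
    -- carve a seqLength-long chunk of batchSize x inputSize steps out of the full table
    local batchInputs, batchTargets = {}, {}
    for step = 0, seqLength - 1 do
      table.insert(batchInputs, inputs[offset + step])
      table.insert(batchTargets, targets[offset + step])
    end
    rnn:zeroGradParameters()
    local out = rnn:forward(batchInputs)
    local err = seqC:forward(out, batchTargets)
    local gradOut = seqC:backward(out, batchTargets)
    rnn:backward(batchInputs, gradOut)
    rnn:updateParameters(0.05)
  end
end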
Thanks for presenting a good RNN example. One thing is not clear to me: what is the form of the input data to RNNs? I think it should be seqLength x batchSize x inputSize, is that right?
This is great, thank you... To make it complete, it would be nice to see a validation and test section that rounds out this simple example.
Hi @nicholas-leonard
Thanks for the reply and code. Apologies for the late reply - too much Christmas turkey and pudding!
I ran your example but it failed with "invalid arguments: DoubleTensor number DoubleTensor LongTensor expected arguments: DoubleTensor~1D [DoubleTensor~1D] [double] DoubleTensor~2D DoubleTensor~1D | DoubleTensor~1D double [DoubleTensor~1D] double DoubleTensor~2D DoubleTensor~1D stack traceback:"
Looking at line 45, I think you meant to populate the targets table instead of the inputs twice? I changed it to targets but then I get a different error: "lua/5.1/nn/ClassNLLCriterion.lua:46: Assertion `cur_target >= 0 && cur_target < n_classes' failed. " Printing out the targets table, the ranges look fine but I remember getting this error before when there was a mismatch between my target classes and output cells so when I change line 33 so that nClass == outputSize == 3, it all works and trains nicely!
I document these two issues here for other readers, as the biggest thing I find slowing me down is (probably obvious) errors from not wiring things together correctly. I think a Validator class would be a big help: you would pass models + desired data into it, and it would pass judgement on whether they will work or not (and, over time, give helpful hints on how to get them to work). Let me know if you agree with my changes.
For the interested reader, the following network works (make sure you get the latest torch and "luarocks install" all the latest rocks so you don't get weird behaviour from mismatches - torch changes fast).
I'll keep working on the tutorial and update progress here..
require 'rnn'
function build_network(inputSize, hiddenSize, outputSize)
-- I1: add in a dropout layer
rnn = nn.Sequential()
:add(nn.Sequencer(nn.Linear(inputSize, hiddenSize)))
:add(nn.Sequencer(nn.LSTM(hiddenSize, hiddenSize)))
:add(nn.Sequencer(nn.LSTM(hiddenSize, hiddenSize)))
:add(nn.Sequencer(nn.Linear(hiddenSize, outputSize)))
:add(nn.Sequencer(nn.LogSoftMax()))
-- I1: Adding this line makes the loss oscillate a lot more during training, when according to
-- http://arxiv.org/abs/1409.2329 this should *help* model performance
-- A1: initialization often depends on each dataset.
--rnn:getParameters():uniform(-0.1, 0.1)
return rnn
end
-- Keep the input layer small so the model trains / converges quickly while training
local inputSize = 10
-- Most models seem to use 512 LSTM units in the hidden layers, so let's stick with this
local hiddenSize = 512
-- We want the network to classify the inputs using a one-hot representation of the outputs
local outputSize = 3
local rnn = build_network(inputSize, hiddenSize, outputSize)
--artificially small batchSize again for easy training
-- this can be the number of sequences to train on
local batchSize=5
-- the dataset size is the length of each of the batchSize sequences.
local dsSize=20
-- number of classes, needs to be the same as outputSize above
-- or we get the dreaded "ClassNLLCriterion.lua:46: Assertion `cur_target >= 0 && cur_target < n_classes' failed. "
local nClass = 3
inputs = {}
targets = {}
-- Build up our inputs and targets
-- I2, add code so that if --cuda supplied, these become CudaTensors
-- using the opt.XXX and 'require cunn'
-- I3 - replace this random data set with something more meaningful / learnable
-- and with a realistic testing and validation set
for i = 1, dsSize do
table.insert(inputs, torch.randn(batchSize,inputSize))
-- populate both tables to get ready for training
table.insert(targets, torch.LongTensor(batchSize):random(1,nClass))
end
for key,value in pairs(targets) do print(value) end
-- Decorate the regular nn Criterion with a SequencerCriterion as this simplifies training quite a bit
seqC = nn.SequencerCriterion(nn.ClassNLLCriterion())
local count = 0
local numEpochs=100
local start = torch.tic()
--Now let's train our network on the small, fake dataset we generated earlier
while numEpochs ~= 0 do
rnn:training()
count = count + 1
out = rnn:forward(inputs) -- you are feeding batchSize sequences, each of length dsSize steps
err = seqC:forward(out, targets)
gradOut = seqC:backward(out, targets)
rnn:backward(inputs, gradOut)
local currT = torch.toc(start)
print('loss', err .. ' in ', currT .. ' s')
--TODO, make this configurable / reduce over time as the model converges
rnn:updateParameters(0.05)
-- I5: Are these steps necessary? Seem to make no difference to convergence if called or not
-- Perhaps they are being called by
rnn:zeroGradParameters()
-- rnn:forget() -- not needed, as Sequencer handles it directly.
start = torch.tic()
-- I6: Make this configurable based on the convergence, so we keep going for bigger, more complex models until they are trained
-- to an acceptable accuracy
-- Also add in code to save out the model file to disk for evaluation / usage externally periodically
numEpochs = numEpochs - 1
end
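For the I2 TODO in the comments above, a hedged sketch of what the CUDA switch might look like, assuming a --cuda flag has been parsed into an opt.cuda boolean somewhere (this is not code from the gist):
if opt.cuda then
  require 'cunn'
  -- move the model, the criterion and the data to the GPU
  rnn:cuda()
  seqC:cuda()
  for i = 1, dsSize do
    inputs[i] = inputs[i]:cuda()
    targets[i] = targets[i]:cuda()
  end
end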
I used Humphrey's and Nick's code above with my EEG data to see what would happen. Here is the slightly modified version of the code:
I'm trying to get it to break out of training based on the validation data set. With an MLP the validation error starts high, but I see that the LSTM's validation error is very low at the beginning and ramps up as the generalization error increases. I have included two graphs, one going up to epoch 300 and one that stops around epoch 188 due to validation. The moving-mean calculation looks for a change in the average of the last 20 epochs compared to a threshold of the average of the last 120 epochs times 1.1.
I also removed one of the LSTM layers, since I had large spikes every 50-60 epochs.
Do I have something wrong here? Why is the evaluation error so low in the beginning?
Any comments are appreciated! - Thanks.. John
print(" - Training Classifier") while epoch < maxEpochs do
rnn:remember(both)
rnn:training()
epoch = epoch + 1
if(opt.debug) then
print(' Epoch: ',epoch)
end
local start = torch.tic()
--rnn:dropout(0.5)
out = rnn:forward(inputTrn) -- feeding batchSize sequences each of
length dsSize steps errTrn = seqC:forward(out, targetTrn) gradOut = seqC:backward(out, targetTrn) rnn:backward(inputTrn, gradOut) local trnDuration = torch.toc(start) --TODO, make this configurable / reduce over time as the model converges
rnn:updateParameters(learningRate) -- BPTT occurs
-- I5: Are these steps necessary? Seem to make no difference to
convergence if called or not -- Perhaps they are being called by rnn:zeroGradParameters() --rnn:forget()
-- Do evaluation
rnn:evaluate()
out2 = rnn:forward(inputVal) -- feeding batchSize sequences each of
length dsSize steps errVal = seqC:forward(out2, targetVal)
if(opt.useValidation) then
avg120,mvavg120 = mv_avg(errVal,mvavg120)
avg20,mvavg20 = mv_avg(errVal,mvavg20)
if(epoch > minEpochs) then
if(avg20 > avg120*opt.delta20to120) then
print(" - Moving Avg Validation Break at Epoch:",
epoch,'',avg20,'',avg120*opt.delta20to120) break end end end
if(errTrn < 10 and errTrn > 8) then
print( epoch, 'TrnErr:', errTrn, ' ValErr:', errVal, '
TrnTime:', trnDuration ) end
-- I6: Make this configurable based on the convergence, so we keep
going for bigger, more complex models until they are trained -- to an acceptable accuracy -- Also add in code to save out the model file to disk for evaluation / usage externally periodically epoc_cnt[epoch] = epoch errorTrn[epoch] = errTrn errorVal[epoch] = errVal if(epoch % 10 == 0) then print(' - epoch:', epoch) gfile:write(logfile,',',epoch,',',errTrn,',',errVal,'\n') rfile:write(logfile,',',epoch,',',errTrn,',',errVal,'\n') end end print(' - Last Epoch:', epoch)
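As an aside for readers: mv_avg and the two window variables are not defined in the snippet above. A hedged sketch of what such a moving-average helper might look like (the real implementation presumably differs; here the window table carries its own capacity so the same two-argument call works for both the 20- and 120-epoch averages):
local function new_window(maxLen)
  return {maxLen = maxLen, values = {}}
end

local function mv_avg(value, window)
  -- push the newest value into a fixed-length window and return its mean plus the updated window
  table.insert(window.values, value)
  if #window.values > window.maxLen then
    table.remove(window.values, 1)
  end
  local sum = 0
  for _, v in ipairs(window.values) do sum = sum + v end
  return sum / #window.values, window
end

-- e.g. created once before the epoch loop:
-- local mvavg20, mvavg120 = new_window(20), new_window(120)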
@jundeng86 yup: seqLength x batchSize x inputSize. The first dimension indexes a table, the remainder a tensor, such that there are seqLength tensors of size batchSize x inputSize.
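To make the format concrete, a minimal sketch of building such an input, with illustrative sizes:
local seqLength, batchSize, inputSize = 20, 5, 10 -- illustrative sizes
local inputs = {}
for t = 1, seqLength do
  -- one tensor of size batchSize x inputSize per time-step
  table.insert(inputs, torch.randn(batchSize, inputSize))
end
-- inputs is now a table of seqLength tensors, i.e. seqLength x batchSize x inputSize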
@hsheil You fixed my example: the second inputs should indeed have been targets, and outputSize = nClass. Your tutorial is looking really good. I like that it doesn't have any dp.Experiment and such. It is easy to understand.
@jrich9999 Your code is copy pasted here to provide better syntax highlighting :
print(" - Training Classifier")
while epoch < maxEpochs do
rnn:remember(both)
rnn:training()
epoch = epoch + 1
if(opt.debug) then
print(' Epoch: ',epoch)
end
local start = torch.tic()
--rnn:dropout(0.5)
out = rnn:forward(inputTrn) -- feeding batchSize sequences each of
length dsSize steps
errTrn = seqC:forward(out, targetTrn)
gradOut = seqC:backward(out, targetTrn)
rnn:backward(inputTrn, gradOut)
local trnDuration = torch.toc(start)
--TODO, make this configurable / reduce over time as the model
converges
rnn:updateParameters(learningRate) -- BPTT occurs
-- I5: Are these steps necessary? Seem to make no difference to
convergence if called or not
-- Perhaps they are being called by
rnn:zeroGradParameters()
--rnn:forget()
-- Do evaluation
rnn:evaluate()
out2 = rnn:forward(inputVal) -- feeding batchSize sequences each of
length dsSize steps
errVal = seqC:forward(out2, targetVal)
if(opt.useValidation) then
avg120,mvavg120 = mv_avg(errVal,mvavg120)
avg20,mvavg20 = mv_avg(errVal,mvavg20)
if(epoch > minEpochs) then
if(avg20 > avg120*opt.delta20to120) then
print(" - Moving Avg Validation Break at Epoch:",
epoch,'',avg20,'',avg120*opt.delta20to120)
break
end
end
end
if(errTrn < 10 and errTrn > 8) then
print( epoch, 'TrnErr:', errTrn, ' ValErr:', errVal, '
TrnTime:', trnDuration )
end
-- I6: Make this configurable based on the convergence, so we keep
going for bigger, more complex models until they are trained
-- to an acceptable accuracy
-- Also add in code to save out the model file to disk for
evaluation / usage externally periodically
epoc_cnt[epoch] = epoch
errorTrn[epoch] = errTrn
errorVal[epoch] = errVal
if(epoch % 10 == 0) then
print(' - epoch:', epoch)
gfile:write(logfile,',',epoch,',',errTrn,',',errVal,'\n')
rfile:write(logfile,',',epoch,',',errTrn,',',errVal,'\n')
end
end
print(' - Last Epoch:', epoch)
Usually, for cross-validation (early-stopping), we train the model on the entire training set and then evaluate it on the entire validation set. Because the dataset of this example has only one batch, this is also what is happening here. However, I think we should modify the example so that the dataset has more than one batch per epoch. In this way, the training and validation loops can be added within the loop over epochs.
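A hedged sketch of what that modified epoch loop could look like. Here trainInputs/trainTargets and validInputs/validTargets are assumed to be tables of batches in the seqLength x batchSize x inputSize form discussed above; none of these names come from the example itself:
for epoch = 1, numEpochs do
  -- training loop over all training batches
  rnn:training()
  local trainErr = 0
  for b = 1, #trainInputs do
    rnn:zeroGradParameters()
    local out = rnn:forward(trainInputs[b])
    trainErr = trainErr + seqC:forward(out, trainTargets[b])
    local gradOut = seqC:backward(out, trainTargets[b])
    rnn:backward(trainInputs[b], gradOut)
    rnn:updateParameters(0.05)
  end
  -- validation loop over all validation batches (no parameter updates)
  rnn:evaluate()
  local validErr = 0
  for b = 1, #validInputs do
    local out = rnn:forward(validInputs[b])
    validErr = validErr + seqC:forward(out, validTargets[b])
  end
  print(epoch, trainErr / #trainInputs, validErr / #validInputs)
end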
@hsheil @jundeng86 @nicholas-leonard I am attempting to use real data and it seems I'm confused about how to get it into the correct batch form for Humphrey and Nick's LSTM example. I think it is important for others to understand how the data is fed into the example. I am rather thick-headed, so bear with me, please... I'll walk through my assumptions below; please comment where I am off base. Thank you so much!
So your comment: seqLength x batchSize x inputSize. The first dimension indexes a table, the remainder, a tensor. Such that there are seqLength tensors of size batchSize x inputSize.
In my case, the EEG data I have is as follows: I want the LSTM to learn on 1 sensor "S" for all patients "numPatients", each with their own class label "CL".
I want to set up a Network with: local rnn = build_network(inputSize, hiddenSize, outputSize) -- inputSize=6, hiddenLayerSize=42, outputSize=2 (nClassLabels)
If I want my LSTM to use Batch Learning:
I am guessing that:
-- Setting up the data in Torch for your LSTM example for "7 BATCHES of 6 SAMPLES EACH"
dataLength = batchSize * inputSize
for i = 1, numPatients do
  local inp_tmp = torch.DoubleTensor(batchSize, inputSize)
  local s = inp_tmp:storage()
  for j = 1, dataLength do
    s[j] = sdata2[trnIndex[i]][j]
  end
  table.insert(inputTrn, inp_tmp) -- Appends Table "numPatients" times with each Tensor "batchSize x inputSize"
  local tar_tmp = torch.LongTensor(batchSize)
  tar_tmp:fill(sclass[trnIndex[i]])
  table.insert(targetTrn, tar_tmp) -- Appends Table "numPatients" times with a batchSize (row) by 1 (col) Tensor full of the class label "CL"
end
@jrich9999 Nice concrete example with the EEG data.
How would I build the seqLength x batchSize x inputSize batch?
Because your sample has 42 time-steps, your seqLength = 42. So your input table will have 42 elements.
The batchSize is arbitrary. Something like 8, 16 or 32 should work well. A larger batchSize means more parallelization on the GPU, but slower per-example convergence. There is a sweet spot. Don't look too hard for it.
The inputSize is the dimensionality of your input sensor data. So if you are using one input sensor, and that sensor outputs a vector of 6 dimensions at each time-step, then inputSize = 6. On the other hand, if you have 19 EEG sensors, each outputting a scalar value at each time-step, then inputSize = 19.
As for the hiddenLayerSize, this determines how much modeling capacity you allocate to the network. Higher means you can model more complex functions, but it is also more prone to overfitting the training data. This is a hyper-parameter you will need to play with. Trying values of 32, 64, 128, 256, ..., you should choose the hiddenLayerSize that gives the best performance on the validation set.
Your outputSize is good.
Okay, so you could organize your data as an input tensor of size seqLength x numPatients x inputSize and an output tensor of size numPatients. To get a batch, you can use input:narrow(2, n, batchSize) and target:narrow(1, n, batchSize).
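A minimal sketch of that layout and of taking one batch with narrow (the sizes are hypothetical placeholders):
-- hypothetical sizes: 42 time-steps, 100 patients, 6-dimensional sensor vector, batches of 16
local seqLength, numPatients, inputSize, batchSize = 42, 100, 6, 16
local input  = torch.randn(seqLength, numPatients, inputSize)
local target = torch.LongTensor(numPatients):random(1, 2)

-- the batch starting at patient n
local n = 1
local batchInput  = input:narrow(2, n, batchSize)  -- seqLength x batchSize x inputSize
local batchTarget = target:narrow(1, n, batchSize) -- batchSize

-- if the Sequencer-based model expects a table of seqLength tensors (batchSize x inputSize),
-- the narrowed tensor can be split along its first dimension:
local inputTable = {}
for t = 1, seqLength do
  table.insert(inputTable, batchInput[t])
end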
Thank you for a great example. One issue is a bit confusing - what should the output tensor ideally look like? I changed the dataset to three sequences: [0.1, 0.2 ... 1.0], [1.0, 1.1 ... 2.0], [-1, -0.9 ... -0.1] and labeled them as [1,2,3]. The output tensor after 100 epochs looks something like:
-0.0830 -2.8453 -3.8407
-3.7361 -0.0313 -4.9712
-3.2264 -3.5956 -0.0695
-0.0597 -3.2954 -3.8717
-3.3373 -0.0452 -4.7577
So it is near zero at the target label and negative otherwise. Does it look correct? Thank you.
@rracinskij: remember the output is the log of the 'real' output. If you take the exp of your values, e.g. of '-0.0830 -2.8453 -3.8407', you get:
0.92 0.06 0.02
... which I imagine is more in line with your expectations?
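A quick way to check this in the interpreter, using just the first row of numbers above:
-- the last layer is LogSoftMax, so the outputs are log-probabilities;
-- exponentiating recovers the probabilities themselves
local logProbs = torch.Tensor{-0.0830, -2.8453, -3.8407}
print(torch.exp(logProbs)) -- roughly 0.92, 0.06, 0.02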
@hughperkins: Indeed it is :) Thanks a lot!
With respect to the above examples, I have a question.
Let's say I have a dataset of size seq_length x data_size x feature_size, where data_size is my number of training examples and data_size >> batch_size.
For clarification, let's say:
seq_length = 10
data_size = 50,000
feature_size = 200
batch_size = 32
train_data = torch.randn(seq_length, data_size, feature_size)
train_target = torch.randn(seq_length, data_size, feature_size)
Now, is the following code correct for training the LSTM model on this training set ?
for i = 1, num_epochs do
rnn:training()
inputs = torch.Tensor(seq_length, batch_size, feature_size)
targets = torch.Tensor(seq_length, batch_size, feature_size)
for j = 1, seq_length do
for k= 1, data_size, batch_size do
inputs[{{ j }}] = train_data[{{ j }, { k, k+batch_size - 1 }}]
targets[{{ j }}] = train_target[{{ j }, { k, k+batch_size - 1 }}]
end
end
out = rnn:forward(inputs)
err = seqC:forward(out, targets)
print('loss is ', err)
gradOut = seqC:backward(out, targets)
rnn:backward(inputs, gradOut)
rnn:updateParameters(0.05)
rnn:zeroGradParameters()
end
If yes, then can someone point out where exactly we need to use rnn:backwardThroughTime() and rnn:forget()?
Thanks in advance.
@hsheil Depends what your rnn looks like. Can you print it here?
Hi @nicholas-leonard I think that request came from another guy in the thread; it wasn't me.
Hi @nicholas-leonard Quick update: I'm making reasonable progress on the tutorial + code. It's going to be a three-parter now:
Post 1 should be ready tomorrow evening some time. Would be great if you could review it for technical accuracy. I'm using the RecSys 2015 challenge data set - it will be interesting to compare LSTM vs Vowpal Wabbit performance (very early / un-tuned VW performance is documented here: http://humphreysheil.com/blog/a-quick-run-through-vowpal-wabbit).
Let me know what you think.
Hi @nicholas-leonard
Part one (code and commentary) is now ready for review:
https://github.com/hsheil/rnn-examples https://github.com/hsheil/rnn-examples/blob/master/example_part1.lua https://github.com/hsheil/rnn-examples/blob/master/part1.md
I've left the code here for now until it's signed off.
I'm happy-ish with the code now, in that I think it is naive but correct. It is well-behaved in minimising loss even when I scale up dsSize to 20,000 or so.
I still don't think I fully understand the seqLength so feel free to critique how I'm using seqLength+batchSize to chunk / index the full epoch - this will make more sense when the example moves to a real data set.
My intuition is that there is no point in me constructing a validation / test set using the torch.randn(batchSize,inputSize) trick as performance will be bad, so I'm putting that code into part two with the real data set.
All feedback appreciated.
@hsheil This looks awesome! I like the detailed analysis in part1.md. I submitted a PR with small fixes : https://github.com/hsheil/rnn-examples/pull/1 . Can't wait to see it evolve to the real dataset you want to use. Let me know if you need more help. Once your post is ready I will definitely link it prominently on our README.md. Also, make sure you update dpnn and rnn as a major bug was fixed.
@hsheil Nice!!! Exactly what I needed to move forward. I've been away for a bit, delayed holidays, work, etc. I haven't had much time to resolve getting my validation/test set to work with my data set. But I'm back now, Thanks, this really is good stuff!
@hsheil Thank you for a helpful example. Could you please explain why you create batchInputs and batchTargets of size 8 (batchSize+seqLength-2)?
@nicholas-leonard Cool will do. Thanks for the PR, digesting it now :) CUDA code is done now and am seeing a 10x speed-up over CPU on the fake dataset which is a nice illustration. If you like, the posts and code can all go into this repo as a tutorial of sorts - I set up rnn-examples so I could push and pull easily between my dev and CUDA machine while I'm coding.
@rracinskij the batchSize and seqLength are currently set in proportion to dsSize - this code will be tightened up in the next iteration to not require that precondition (and to add in the separate validation and testing sets). That for loop condition ensures that we present all of dsSize to the network for training - you can verify that by adding in a print() on line 99 to see that each loop gets the right chunk of dsSize for the range [offset, offset+i].
@hsheil No need to include in rnn. I like that this is its own separate repository that I don't need to maintain! I hope you still intend to add your real-world e-commerce dataset to the example. That would be awesome.
@nicholas-leonard Hi, working on it. I was just tuning the parallel Vowpal Wabbit impl as I need to compare LSTM to a good baseline (VW) as part of my research path. ETA on the next instalment is Sunday night UK time :)
Going through @hsheil 's example, I have a few questions.
1. There are dsSize examples, each made up of seqLength number of events, and each event is of length inputSize (i.e. my feature length). Yet while preparing the toy dataset, my inputs size is dsSize x batchSize x inputSize (here). Shouldn't it be dsSize x seqLength x inputSize?
2. Why batchSize + seqLength at each step? From what I understand, inputs[i] will give me the i-th batchSize x inputSize data (since in the example my inputs is dsSize x batchSize x inputSize). Then why a stepwidth of batchSize + seqLength?
3. With inputs[offset+i] (here), where offset is initialized to 1 and i is initialized to 2, it always starts from inputs[3]. Thus inputs[1] and inputs[2] never get selected. Is that right?
4. Why are batchInputs and batchTargets of size 8 (batchSize + seqLength - 2)?
Overall, I am still finding it a bit difficult to grasp how the input should be presented for batchwise training on a GPU. Maybe I just have some flaw in my basic understanding.
My understanding is that, for batch training, the input should be presented as TimeStep x BatchLength x FeatureLength, i.e. my first batch should contain my first timestep data. Going through that, my LSTM will update its states, and then in the second batch my second timestep data will be presented. Likewise, after my LSTM has seen all the TimeStep data for BatchSize, it will output BatchSize number of outputs (for simplicity, let's consider it a many-to-one LSTM, i.e. each sequence has only one associated label).
But it is hard to verify this from the example proposed above.
Any help regarding what the input format should be like will be highly appreciated.
Thanks in advance.
Hi @nicholas-leonard and the other folks who posted code on this issue: part two is ready (a real-world dataset) - it addresses a faux-pas in part one whereby I was session-oriented and not sequence step-oriented in creating batches for the network. All feedback appreciated!
Write-up: https://github.com/hsheil/rnn-examples/blob/master/lstm-2.md
Code: https://github.com/hsheil/rnn-examples/tree/master/part2
Hi Humphrey @hsheil ,
Great work! Really looks good. Love what you are researching. I am guessing that I may not have the latest Torch files to run your latest Example 2. Since I am still learning about Torch, is there an easy way to update it? I run xxx/torch/update.sh, which git pulls from master, but I'm not sure that is all I need to do. Did you pull in specific Torch fixes that are outside of master? Do I need to recompile anything, since I will attempt to use CUDA? Sorry for some of these basic questions...
Whew, example 2 is using a lot of data.. :)
Thanks, John
/home/john-1404-64/torch/install/bin/luajit: .../john-1404-64/torch/install/share/lua/5.1/nn/Sigmoid.lua:4: attempt to call field 'Sigmoid_updateOutput' (a nil value)
stack traceback:
.../john-1404-64/torch/install/share/lua/5.1/nn/Sigmoid.lua:4: in function 'updateOutput'
...hn-1404-64/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'updateOutput'
...n-1404-64/torch/install/share/lua/5.1/nn/ConcatTable.lua:11: in function 'updateOutput'
...hn-1404-64/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'updateOutput'
...n-1404-64/torch/install/share/lua/5.1/nn/ConcatTable.lua:11: in function 'updateOutput'
...hn-1404-64/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'updateOutput'
...n-1404-64/torch/install/share/lua/5.1/nn/ConcatTable.lua:11: in function 'updateOutput'
...hn-1404-64/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'updateOutput'
/home/john-1404-64/torch/install/share/lua/5.1/rnn/LSTM.lua:162: in function 'updateOutput'
...hn-1404-64/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'updateOutput'
...ohn-1404-64/torch/install/share/lua/5.1/rnn/Recursor.lua:24: in function 'updateOutput'
...hn-1404-64/torch/install/share/lua/5.1/rnn/Sequencer.lua:47: in function 'forward'
main.lua:155: in main chunk
[C]: in function 'dofile'
...4-64/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
@hsheil Not sure I understand what the inputs and outputs are for this section : https://github.com/hsheil/rnn-examples/blob/master/lstm-2.md#feature-design . Maybe you could clarify where you get 194 from that table.
Hi @nicholas-leonard sure thing. The 194 is made up of 12 (months)+ 31 (days)+ 7 (day of week) + 24 (hour of day) + 60 (minute of hour) + 60 (second of minute) = 194, all encoded in OneHot format. A couple of comments on this:
Hope this all makes sense. I'm going to push evaluator.lua soon which will give a like for like comparison between LSTM and Vowpal Wabbit which will be interesting.
Let me know your thoughts on the first point!
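To make that breakdown concrete, here is a hedged sketch of how such a 194-dimensional encoding might be assembled - the helper name and signature are hypothetical, not taken from the rnn-examples repo:
-- concatenated one-hot encoding of a timestamp: 12 + 31 + 7 + 24 + 60 + 60 = 194
local function encodeTimestamp(month, day, dayOfWeek, hour, minute, second)
  local sizes  = {12, 31, 7, 24, 60, 60}
  -- month/day/dayOfWeek are 1-based; hour/minute/second are 0-based, so shift them
  local values = {month, day, dayOfWeek, hour + 1, minute + 1, second + 1}
  local encoded = torch.zeros(194)
  local offset = 0
  for i = 1, #sizes do
    encoded[offset + values[i]] = 1
    offset = offset + sizes[i]
  end
  return encoded
end

print(encodeTimestamp(12, 25, 6, 14, 30, 1):sum()) -- 6, one bit set per component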
@hsheil Sounds good. Could you add that breakdown of the 194 to the docs for clarity? Thanks.
Also, would love to see more images of stuff in tutorial : learning curves, etc. Maybe some concrete examples as what gets predicted, e.g. given A, B, and C, users are most likely to purchase/click on D where A,B,C and D are interesting and human-relatable. Of course, you will need to build your evaluate script first.
Hi @nicholas-leonard ok, will do. The code has evolved a lot since the original approach - implementing MaskZero was a really important step in improving the training phase and I need to update the docs to reflect this. I've been heads-down on another project but plan to work on this code and docs again this weekend so will ping you when it's ready for another look.
@nicholas-leonard PS the evaluate script is done and pushed (evaluator.lua in the part2 sub-directory) - it exposed some problems in the original impl that I've been fixing, hence the flurry of related commits.
@hsheil Yeah its looking good. Can't wait to see the final tutorial and code.
@nicholas-leonard Apologies for the delay in this Nicholas. I've been working on the code on and off and my supervisor also wants me to build a side-by-side impl using TensorFlow :)
The last push I did (https://github.com/hsheil/rnn-examples/commit/cb30a484d4b5577c346b9908811452beaa2bfd97) has a lot of improvements, resulting in an F1 score of 0.990 when tuned using Spearmint (and on the validation set to boot). It turns out that calling model:remember('both') resulted in a very significant perf improvement. I think there's a glitch in the docs on this that I'll submit a small PR for.
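For readers following along, a hedged note on what that call does in rnn terms (not a repo excerpt): a Sequencer normally forgets its hidden state between calls to forward, while remember('both') keeps it across calls in both training and evaluation mode:
-- keep hidden state across forward calls for both training and evaluation,
-- instead of resetting it between batches
model:remember('both')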
Next I'm going to revamp the part2 MD file to reflect all of the changes and then move onto documenting part3 (effect of using Spearmint to tune hyper params - already coded just not documented) - probably on the flight to GTC!
Once that's done, I'd actually like to plug dp back in and see what benefit it brings - if the reader comes on the journey through parts 1, 2 and 3, then dp should be pretty accessible at the end of part 3.
H
@hsheil Let me know what you think about TensorFlow w.r.t. Torch. Good to see the project is advancing. I know these things can take time :) I just merged your PR. Was a valid point. I will see you at GTC. Also, I recommend not using dp as I am trying to move away from it, focusing instead on rnn, dpnn, dataload. Can't wait to see the final readme with all the parts listed.
@nicholas-leonard Ok, good to know RE: dp, it was dp.Experiment that looked most interesting to try and leverage / use. Look forward to catching up!
@hsheil I stumbled upon this thread while looking for help on a similar problem. However, I see that your writeup repository is no longer visible. Any chance you'd make it available, or point to a new location?
Hi @beldaz - I moved over to PyTorch quite a while ago, so I stopped working on that repo. The code I wrote is very specific to the dataset I was using, while I think Torch just needed (at the time) some simpler LSTM examples to go with the docs - I believe those were later added in the examples repo, where @nicholas-leonard et al. did quite a bit of work. I thought about closing this issue but figured the thread might be useful for some.
Hi
I'm trying to document the simplest LSTM example possible. By "simplest", I mean the intersection of least lines of code combined with minimal use of libraries that hide away details. Once this simple case is working well, then the plan is to add in libraries like dp to show what that library provides, i.e. dp.Experiment. And to do this in a progressive way so that the reader can roll forwards / backwards between versions of the code to build understanding. So:
The code in the gist below works (loss reduces as the model trains), but with some caveats which I've documented as inline comments, numbered I0 through I6. I4 is the biggest issue I'd like to resolve.
I'd appreciate any feedback on the specific items in the code (esp. I4) and also any other comments on the code! When this code is good enough, the plan is to publish it as a tutorial of sorts on the rnn package with worked examples that progressively become more complex / advanced - feeding into other issue requests in this repo for more examples. Thanks in advance!
Here's the gist: https://gist.github.com/hsheil/54c0e83d4666db5081df