Might be better to try to support https://github.com/Element-Research/rnn/blob/master/FastLSTM.lua instead, since it has batched GEMMs. This LSTM implementation has batched GEMMs as well, but uses nngraph: https://github.com/karpathy/char-rnn/blob/master/model/LSTM.lua.
You'd need a way to apply `:init()` to a submatrix. In the `rnn.FastLSTM` case, it'd be something like: for i from 0 to 3, applying the initializations to the square (assuming LSTM input size = state/output size) submatrix given by `i2g.weight:narrow(1, 1 + i*outputSize, outputSize)`. Note that the output dimension for weight matrices in `nn.Linear` is the 1st dim.
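For illustration, a rough sketch of that loop using plain tensor operations (assuming `lstm` is an `nn.FastLSTM` with input size equal to the state/output size; the uniform fill is just a placeholder for whatever initialisation is wanted):

```lua
require 'rnn'

local outputSize = 128
local lstm = nn.FastLSTM(outputSize, outputSize) -- input size == state/output size

-- i2g.weight is (4 * outputSize) x outputSize; treat each
-- outputSize x outputSize gate block separately
for i = 0, 3 do
  local block = lstm.i2g.weight:narrow(1, 1 + i * outputSize, outputSize)
  block:uniform(-0.08, 0.08) -- placeholder per-block initialisation
end
```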
@Kaixhin So I guess we could just pluck them out from the respective sub-components of the LSTM or FastLSTM modules. For the first part, we could use the internal references to `self.inputGate` (https://github.com/Element-Research/rnn/blob/master/LSTM.lua#L53) and such.
@bshillingford is right about how to do this for FastLSTM.
If you make this extension of the rnn package, I will make nninit a default dependency of the rnn package.
@nicholas-leonard I realised that the internal references were sufficient for `nn.LSTM`, but suspected something more powerful was needed in general. As @bshillingford pointed out, that something is indexing. I've introduced this for the basic functions in a new branch with 24cf519, along with an `nn.FastLSTM` example. Since this is general, I don't see any need to introduce `nninit` as a dependency for `rnn`, but if you want to introduce it as a full or optional dependency that's fine too.
The main issue with `nninit` as a dependency is that the API is still in flux, so it would be good to settle on something (even if breaking changes are introduced along with incrementing the luarocks version number). Points to consider and give feedback on:

- `wInit` and `bInit`, alongside initialisation schemes passed as strings, mean that the `nn.Module` namespace is not polluted.
- Do `addConstant` and `mulConstant` make sense, or should they be integrated as modifiers to the `constant` function?
- `gain` needs to be changed to accept one argument - the current idea to support `lrelu` would be to use a table that contains the string `'lrelu'` and the `leakiness` parameter.

@Kaixhin Maybe there's a way to more strongly decouple the modified tensors from the nn Modules.
Re: passing initialisation schemes as strings, what do you think about putting the schemes into a namespace then accepting them directly as arguments? E.g. it might look like:

```lua
local nninit = require 'nninit'
local m = nn.Linear(5, 5):wInit(nninit.orthogonal)
```

where that's either the function itself or a wrapper object. Subjectively cleaner.
You've done some decoupling already, but as a private API. Publicising the private API (maybe in a sub-namespace?) for initialisation primitives might look something like:

```lua
local nninit = require 'nninit'

local lstm_input = nn.Linear(input_size, rnn_size * 4) -- input weights
for _, wt in ipairs(lstm_input.weight:split(rnn_size, 1)) do -- one block per gate
  nninit.inits.xavier(wt)
end

local lstm_recur = nn.Linear(rnn_size, rnn_size * 4) -- recurrent weights
for _, wt in ipairs(lstm_recur.weight:split(rnn_size, 1)) do
  nninit.inits.orthogonal(wt)
end
```
Less concise, but more flexible... there might be a way to find a good middle ground here. Perhaps with a function as the arg you can generalise more:

```lua
local lstm_recur = nn.Linear(rnn_size, rnn_size * 4)
  :init(function(m) return m.weight:narrow(1, 1, rnn_size) end, ...)
  :init(function(m) return m.weight:narrow(1, rnn_size + 1, rnn_size) end, ...)
  :init(function(m) return m.weight:narrow(1, 2 * rnn_size + 1, rnn_size) end, ...)
  -- and so on
```
where the first arg to `init` is something to initialise (let's call it `target`). If `target` is a string like `bias` or `weight`, you'd operate on `self[target]`, and if `target` is a function, the function takes `self` and returns the tensor to modify. I guess that could be shortened by letting `target` return multiple things:
```lua
local lstm_recur = nn.Linear(rnn_size, rnn_size * 4)
  :init(function(m) return unpack(m.weight:split(rnn_size, 1)) end, ...)
```
Having a single `init` might clean up the code a bit (in the sense of duplication and coupling) while improving flexibility. This might be making things too general for a small number of use cases, since almost all nn modules only have `weight` and `bias`. Also not sure how you'd address `addConstant` and `mulConstant` in this case.
Thoughts on these random ideas?
Would like to have `init` instead of `wInit` and `bInit`, and the first argument could be `w` or `b`. The second argument can be the namespaced function, e.g. `nninit.orthogonal()`. All optional arguments within this should be passed through a table, such that (in the case of `xavier`) `dist` does not have to be specified in order to specify `gain`.
Functions for the target are powerful but lengthy - is the [indexing operator](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor--dim1dim2--or--dim1sdim1e-dim2sdim2e-) not general enough? In this case the indices (either in table or binary mask form) would be the third and final argument to `init`. Concretely:
```lua
lstm.i2g
  :init('w', nninit.kaiming({dist = 'uniform', gain = {gain = 'lrelu', leakiness = 0.3}}))
  :init('b', nninit.constant(1), {{3*hiddenSize-1, 3*hiddenSize}})
```
`eye` is an example of where exposing initialisation primitives would fail, as it depends on the module type. Another potential example would be initialising `nn.SpatialConvolution` with filters used in image processing (such as a Gaussian filter) - these initialisations would not make sense for linear layers (and many already do not make sense for biases). A further argument for this would be that the concept of abstraction in Torch is the module/layer, where a conscious decision has been made to tie weights and biases to modules rather than have them act as standalone tensors.
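As a concrete, hand-rolled (non-`nninit`) sketch of the Gaussian-filter idea, assuming the `image` package is available:

```lua
require 'nn'
local image = require 'image'

local conv = nn.SpatialConvolution(3, 16, 5, 5) -- nInputPlane, nOutputPlane, kW, kH
local kernel = image.gaussian(5)                -- 5x5 Gaussian kernel

-- conv.weight is nOutputPlane x nInputPlane x kH x kW; copy the same kernel
-- into every filter plane (purely illustrative)
for o = 1, conv.nOutputPlane do
  for i = 1, conv.nInputPlane do
    conv.weight[o][i]:copy(kernel)
  end
end
```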
Good points. Indexing should be general enough in most cases, but I'm thinking of other obscure cases: suppose I write a layer for learned biases of some arbitrary shape, and its parameters don't have the name 'weight' or 'bias', or lie inside a table inside the module. It would be useful to expose primitives if you want to do things like initialize conv filters to multiple scales of Gaussian filters too, but again this is an uncommon case and I suppose indexing solves this anyway (I think?).
I agree re: Torch's philosophy, but I see your library as a way to address that philosophy's shortcomings. That all said, I agree with your comments - thanks for sharing the library.
I would like to be convinced that this could let weights be treated more generally and hence overcome a shortcoming with Torch, but this seems tricky. For example, `nninit.inits.xavier(wt)` depends on the fan in and fan out of the module - it could be worked out using the dimensions, but then `eye` will fail on `nn.Linear` vs. `nn.TemporalConvolution`.
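To make the dimension problem concrete (a sketch; `nn.TemporalConvolution` folds the kernel width into the weight's second dimension, so fan in/out cannot be read off the tensor sizes in a module-agnostic way):

```lua
require 'nn'

local lin = nn.Linear(10, 20)
-- nn.Linear weight: outputSize x inputSize
print(lin.weight:size(1), lin.weight:size(2)) -- 20 (fan out), 10 (fan in)

local conv = nn.TemporalConvolution(10, 20, 3) -- inputFrameSize, outputFrameSize, kW
-- nn.TemporalConvolution weight: outputFrameSize x (kW * inputFrameSize)
print(conv.weight:size(1), conv.weight:size(2)) -- 20, 30 (kernel width folded in)
```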
I do see the case for more arbitrary parameters and am happy to go ahead with `target` being a string or function. I think that, in the more general case, if multiple tensors need to be returned then they should be placed in a table (rather than being unpacked), but in what situation would you provide a sequence of tensors when they would all be initialised by the same method?
One trivial way to do more complicated initialisations is to introduce a `tensor` function that just calls `copy` on a supplied tensor that matches the target size. Rather than operating directly on the weights (for example), using this within `:init()` would allow the convenience of chaining.
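Hypothetically, such a `tensor` initialiser might chain like the constructor-style calls above (the exact signature is an assumption, not the settled API):

```lua
require 'nn'
local nninit = require 'nninit'

-- hypothetical signature: copy the supplied tensor into the target
local m = nn.Linear(4, 4)
  :init('w', nninit.tensor(torch.eye(4))) -- identity weights
  :init('b', nninit.constant(0))          -- zero biases
```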
OK, the new API is almost ready on the indexing branch. The `accessor` argument is either a string, a string and set of indices, or a function. The "primitives" are somewhat exposed, and could potentially be used with `nil` as the first argument (a bit messy, but non-trivial initialisations in this case seem tied to the layer type/properties). A `copy` method has been added for custom initialisations.
`eye` and `orthogonal` need fixing as they are the most tied to the module. The example will also need revising once those two methods are fixed.
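A sketch of the three `accessor` forms under discussion, following the constructor-style calls used earlier in the thread (illustrative only, not necessarily the final published API):

```lua
require 'nn'
local nninit = require 'nninit'
local hiddenSize = 64

local m = nn.Linear(hiddenSize, 4 * hiddenSize)
  -- 1. plain string accessor
  :init('w', nninit.orthogonal())
  -- 2. string plus indices, restricting the initialisation to a sub-range of the bias
  :init('b', nninit.constant(1), {{2 * hiddenSize + 1, 3 * hiddenSize}})
  -- 3. function accessor returning the tensor to modify
  :init(function(mod) return mod.weight:narrow(1, 1, hiddenSize) end, nninit.xavier())
```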
Example now updated, with `eye` and `orthogonal` simply processing only module weights. Will merge into master tomorrow unless there is any further API scoping to be done.
Although this library works fine with `nngraph`, it would be good to also support `rnn` - specifically the LSTM module. Given the new API introduced with https://github.com/Kaixhin/nninit/issues/2, how can the elements of the cell be initialised individually? Any feedback @nicholas-leonard?

A notable reason to support this would be to implement the large forget gate bias introduced in:
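For reference, a minimal sketch of that trick with plain tensor operations (the block index for the forget gate is an assumption and should be verified against FastLSTM's gate ordering):

```lua
require 'rnn'

local hiddenSize = 128
local lstm = nn.FastLSTM(hiddenSize, hiddenSize)

-- i2g.bias stacks the biases of all four gates in blocks of hiddenSize
local forgetBlock = 3 -- assumption: check FastLSTM.lua for the actual gate order
local forgetBias = lstm.i2g.bias:narrow(1, (forgetBlock - 1) * hiddenSize + 1, hiddenSize)
forgetBias:fill(1) -- large forget gate bias
```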
The idea of `nninit` is to allow experimentation with initialisations and to free maintainers from implementing "best practices".