JonathanRaiman / theano_lstm

:microscope: Nano size Theano LSTM module

added license, made initialization easier to modify and replicate #10

Closed mheilman closed 9 years ago

mheilman commented 9 years ago

The main (fairly simple) change here is that I refactored the weight initialization a bit so that it is easier to modify in other code that uses this library. To change the initialization, one can redefine the function theano_lstm.random_initialization to do uniform initialization or something else.

One can also modify the random seed by changing theano_lstm.np_rng.
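As a rough sketch of what such an override could look like (the module-level names np_rng and random_initialization follow the comments above; the uniform scheme and its scale parameter are illustrative assumptions, not the library's actual defaults):

```python
import numpy as np

# Stand-in for theano_lstm.np_rng; the seed value here is arbitrary.
np_rng = np.random.RandomState(1234)

def uniform_initialization(size, scale=0.05):
    # A drop-in replacement for random_initialization that draws
    # weights uniformly from [-scale, scale] instead of a Gaussian.
    return np_rng.uniform(low=-scale, high=scale, size=size).astype(np.float32)

# In user code one would then monkey-patch the module, e.g.:
#   import theano_lstm
#   theano_lstm.random_initialization = uniform_initialization
```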

mheilman commented 9 years ago

With the license, this addresses #7.

JonathanRaiman commented 9 years ago

Michael, absolutely; those are great changes. Unfortunately, 'initial_hidden_state' is used / convenient when creating the 'outputs_info' for Theano's scan (see the tutorial). I don't object to making the initial hidden state not a default parameter, but then the layers should have a method that returns some 'T.zeros(hidden_size)' value when an initial hidden state is needed. Perhaps, instead of removing it entirely, you could make it an @property of the layer so we get the best of both worlds?
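A minimal sketch of that suggestion, with plain numpy standing in for Theano's T.zeros (the Layer class and attribute names follow the discussion, not the library's actual code):

```python
import numpy as np

class Layer(object):
    # Sketch: expose the initial hidden state as a read-only property
    # yielding zeros, rather than a separately trained parameter.
    def __init__(self, hidden_size):
        self.hidden_size = hidden_size

    @property
    def initial_hidden_state(self):
        # With Theano this would be T.zeros(self.hidden_size); numpy
        # keeps the sketch self-contained and runnable.
        return np.zeros(self.hidden_size, dtype=np.float32)
```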

mheilman commented 9 years ago

Hi, I don't feel like I have a great handle on different possible use cases, but I think the current code should work OK for most cases.

The changes here just remove initial_hidden_state from what params returns. This means that the initial state will be removed from a) the optimization updates, and b) what a user would save as parameters for the model. It doesn't seem like the initial hidden state would be a parameter one would want to optimize or save for most uses of RNNs/LSTMs.

I didn't remove the member variable entirely, so using it for outputs_info should still be fine.

In my models, I've just been initializing outputs_info to zeros, as you suggested, rather than using the randomly initialized initial_hidden_state variable. Would it be better to do that by default? As you suggest, initial_hidden_state could just be replaced by a property that returns a vector or matrix of zeros. I can make that change, but I just wanted to make sure I'm understanding this. Apologies if I'm missing something.
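For concreteness, the params change described above might look roughly like this (object() instances stand in for Theano shared variables; all names here are assumptions for the sketch):

```python
class Layer(object):
    # Sketch: initial_hidden_state stays a member variable (so it can
    # still feed outputs_info), but it is excluded from params, and
    # therefore from gradient updates and from what a user would save.
    def __init__(self, hidden_size):
        self.hidden_size = hidden_size
        self.linear_matrix = object()         # stand-in for a shared weight
        self.bias_matrix = object()           # stand-in for a shared bias
        self.initial_hidden_state = object()  # kept, but not trained

    @property
    def params(self):
        # Only the variables to optimize / serialize.
        return [self.linear_matrix, self.bias_matrix]
```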

JonathanRaiman commented 9 years ago

Great -- if it's still a member variable then we're covered.

Sometimes training an initial hidden state can lead to interesting behaviour (e.g., if you're predicting things, what might be captured in the "bias" vector of your logistic regression can end up living in this pre-state); I like to think of it as training the steady state of the LSTM. But you're right, it's definitely not what seems to be reported in most of the literature, so we're better off with the change you made, defaulting to having this be opt-in instead of opt-out.