JonathanRaiman / theano_lstm

:microscope: Nano size Theano LSTM module

Simple example #8

Closed allchemist closed 9 years ago

allchemist commented 9 years ago

Hello!

Can you please provide a simple example of usage?

Thanks!

siemanko commented 9 years ago

Yes


mheilman commented 9 years ago

Here's a very, very simple example that might help. It doesn't do anything useful, but it expands on the examples in the README to make something that will run.

#!/usr/bin/env python3

import numpy as np
import theano
import theano.tensor as T

from theano_lstm import (create_optimization_updates, Layer, LSTM, StackedCells)

def main():
    # Make a dataset where the network should learn whether the sum of the absolute values of
    # the inputs seen so far exceeds 5.  This probably isn't really a good example use case
    # for an LSTM, but it's simple.
    rng = np.random.RandomState(123456789)
    input_size = 1
    input_length = 10
    sample_size = 500
    num_iterations = 3
    examples = rng.choice([-2, -1, 0, 1, 2], (sample_size, input_length)).astype(theano.config.floatX)
    labels = np.array([[1 if np.sum(np.abs(x[:y + 1])) > 5 else 0 for y in range(len(x))]
                       for x in examples],
                      dtype=theano.config.floatX)

    hidden_layer_size = 10
    num_hidden_layers = 2

    model = StackedCells(input_size,
                         layers=[hidden_layer_size] * num_hidden_layers,
                         activation=T.tanh,
                         celltype=LSTM)

    # Make the connections from the input to the first layer have linear activations.
    model.layers[0].in_gate2.activation = lambda x: x

    # Add an output layer to predict the labels for each time step.
    output_layer = Layer(hidden_layer_size, 1, T.nnet.sigmoid)
    model.layers.append(output_layer)

    def step(x, *prev_hiddens):
        activations = model.forward(x, prev_hiddens=prev_hiddens)
        return activations

    input_vec = T.vector('input_vec')
    input_mat = input_vec.dimshuffle((0, 'x'))

    result, _ = theano.scan(fn=step,
                            sequences=[input_mat],
                            outputs_info=([dict(initial=hidden_layer.initial_hidden_state, taps=[-1])
                                           for hidden_layer in model.layers[:-1]] +
                                          [dict(initial=T.zeros_like(model.layers[-1].bias_matrix), taps=[-1])]))

    target = T.vector('target')
    prediction = result[-1].T[0]

    cost = T.nnet.binary_crossentropy(prediction, target).mean()

    updates, _, _, _, _ = create_optimization_updates(cost, model.params)

    update_func = theano.function([input_vec, target], cost, updates=updates, allow_input_downcast=True)
    predict_func = theano.function([input_vec], prediction, allow_input_downcast=True)

    for cur_iter in range(num_iterations):
        for i, (example, label) in enumerate(zip(examples, labels)):
            c = update_func(example, label)
            if i % 100 == 0:
                print(".", end="")
        print()

    test_cases = [np.array([-1, 1, 0, 1, -2, 0, 1, 0, 2, 0], dtype=theano.config.floatX),
                  np.array([2, 2, 2, 0, 0, 0], dtype=theano.config.floatX),
                  np.array([-2, -2, -2, 0, 0, 0], dtype=theano.config.floatX),
                  np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 0], dtype=theano.config.floatX),
                  np.array([2, 0, 0, 0, 2, 0, 0, 0, 0, -2, 0, 0, 0, 0, 0], dtype=theano.config.floatX),
                  np.array([2, 2, 2, 0, 0, 0, 2, 2, 2, 0], dtype=theano.config.floatX)]

    for example in test_cases:
        print("input", "output", sep="\t")
        for x, pred in zip(example, predict_func(example)):
            print(x, "{:.3f}".format(pred), sep="\t")
        print()

if __name__ == "__main__":
    main()

allchemist commented 9 years ago

Big thanks!

JonathanRaiman commented 9 years ago

Thanks !!!

allchemist commented 9 years ago

Sorry to bother you, but could you help me understand training data preparation?

As I understand it, an LSTM block receives one value and returns one value. So to make it learn a sequence, we should prepare a known sequence 'seq' as pairs [seq[1], seq[2]], [seq[2], seq[3]], [seq[3], seq[4]], etc. (the first element is the training input, the second the training target).
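
For illustration, such shifted input/target pairs could be built like this (a minimal numpy sketch; the sequence values are made up):

import numpy as np

seq = np.array([0.1, 0.3, 0.2, 0.5, 0.4], dtype=np.float32)
# inputs are seq[t], targets are seq[t + 1]
train_inputs = seq[:-1]    # [0.1, 0.3, 0.2, 0.5]
train_targets = seq[1:]    # [0.3, 0.2, 0.5, 0.4]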

I tried different training data, but didn't manage to achieve the recurrence effect: 'predict_func' returns the same result on several repeated calls with the same input. It looks like I failed at data preparation. Or does it just reset its state after each 'predict_func' call?

If it isn't much trouble, could you show an example of sequence forecasting? Currently I use LSTM in PyBrain, but I'm very interested in a Theano implementation.

stephenjia commented 9 years ago

@mheilman Thanks for your code. But can I ask how to extend it to the minibatch case? I tried to modify your code for minibatches, but I ran into problems with the input. Can you give me some help? Thanks.

...
input_mat = T.matrix('input_mat')
input_tensor = input_mat.dimshuffle((1, 'x', 0))
...

mheilman commented 9 years ago

Sorry @stephenjia, but I don't have a good enough understanding of the best way to do that at the moment. One idea is to make another scan operation that iterates over the examples in a minibatch and makes updates for each (using the existing scan op). I'm not sure that's a good approach, though.
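
For what it's worth, a simpler (if slower) alternative is to skip the nested scan and loop over the minibatch in plain Python, reusing the per-example update function from the example above (a minimal sketch; update_func is the compiled function from that example):

def update_on_minibatch(update_func, batch_examples, batch_labels):
    # Run the per-example update on each sequence in the minibatch
    # and report the mean cost; this avoids nesting scan operations.
    costs = [update_func(example, label)
             for example, label in zip(batch_examples, batch_labels)]
    return sum(costs) / len(costs)

Note that this applies parameter updates per example rather than accumulating gradients over the whole batch, so it is not a true minibatch update.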

stephenjia commented 9 years ago

@mheilman Thanks all the same. I also tried the method in the README file, but I still got errors. I will try to find where I went wrong.

JonathanRaiman commented 9 years ago

@stephenjia @mheilman To get minibatches working with LSTMs in a sequence forecasting setting, here is one approach:

Suppose you have n = 30 symbols to predict:

 n = 30

Build some layers out of LSTMs:

import numpy as np
import theano
import theano.tensor as T
from theano_lstm import LSTM, StackedCells, Layer, masked_loss, create_optimization_updates
hidden = 20
model = StackedCells(n, layers=[hidden, hidden], activation=T.tanh, celltype=LSTM)

Add a classifier:

model.layers.append(Layer(hidden, n, lambda x: T.nnet.softmax(x)[0]))

Construct the recurrent prediction:

def step(x, *prev_hiddens):
    new_states = model.forward(x, prev_hiddens[:-1], 0.0)
    return new_states

Store your minibatch as one big matrix with one row per example (if the examples don't all have the same length, pad them with 0s or something else; see the padding sketch after the scan below):

observations = T.matrix()

result, updates = theano.scan(step,
                              sequences=[observations[:, :-1]],
                              outputs_info=[dict(initial=layer.initial_hidden_state, taps=[-1])
                                            for layer in model.layers
                                            if hasattr(layer, 'initial_hidden_state')] + [None])
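
As an aside, the padding step might look like this (a minimal numpy sketch; pad_sequences is a hypothetical helper, not part of theano_lstm):

def pad_sequences(seqs, pad_value=0):
    # Pad each sequence with pad_value up to the longest length,
    # producing one matrix with one row per example.
    max_len = max(len(s) for s in seqs)
    batch = np.full((len(seqs), max_len), pad_value, dtype=np.float32)
    for i, s in enumerate(seqs):
        batch[i, :len(s)] = s
    return batch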

Now the error: we apply a KL divergence at each timestep between a prediction and the observation at the next timestep. (Replace targets with whatever you want, but it must have the same dimensions as result[-1], e.g. the softmaxes from the LSTM stack above.) Also create an array with the length of each observation (if the lengths are not all equal, this says "sequence one is 1 long, sequence two is 2 long, sequence three is 5 long, etc."): np.array([1, 2, 5, 2, 1, 3])

and tell the error where the sequences start:

observation_starts = np.zeros(6, dtype=np.int32)

Tell the system what sequence needs to be forecasted (here we're just forecasting the input itself 1 step ahead):

targets = observations[:, 1:]
observation_lengths = np.array([1, 2, 5, 2, 1, 3])  # etc...
observation_starts = np.zeros(6, dtype=np.int32)

error = masked_loss(result[-1],
                    targets,
                    observation_lengths,
                    observation_starts)

error = error.sum()

Set up the gradient descent updates:

updates, _, _, _, _ = create_optimization_updates(error, model.params)

update_func = theano.function([observations],
                              error,
                              updates=updates,
                              allow_input_downcast=True)

stephenjia commented 9 years ago

@JonathanRaiman Thanks for your detailed reply. I modified @mheilman's example based on your suggestion; for now I am only playing with the forward propagation part. However, there is a problem even before the scan operation: 'IndexError: tuple index out of range'.

Here's a very, very simple example that might help. It doesn't do anything useful, but it expands on the examples in the README to make something that will run.

from __future__ import print_function

import numpy as np
import theano
import theano.tensor as T

from theano_lstm import (create_optimization_updates, Layer, LSTM, StackedCells, masked_loss)

import random

def get_minibatches_idx(n, minibatch_size, shuffle=False):
    """ Used to shuffle the dataset at each iteration. """
    idx_list = np.arange(n, dtype="int32")

    if shuffle:
        random.shuffle(idx_list)

    minibatches = []
    minibatch_start = 0
    for i in range(n // minibatch_size):
        minibatches.append(idx_list[minibatch_start:
                                    minibatch_start + minibatch_size])
        minibatch_start += minibatch_size

    if minibatch_start != n:
        # Make a minibatch out of what is left
        minibatches.append(idx_list[minibatch_start:])

    return zip(range(len(minibatches)), minibatches)

def main():
    # Make a dataset where the network should learn whether the sum of the absolute values of
    # the inputs seen so far exceeds 5.  This probably isn't really a good example use case
    # for an LSTM, but it's simple.
    import pdb; pdb.set_trace()
    rng = np.random.RandomState(123456789)
    input_size = 1
    input_length = 10
    sample_size = 500
    num_iterations = 3
    examples = rng.choice([-2, -1, 0, 1, 2], (sample_size, input_length)).astype(theano.config.floatX)
    labels = np.array([[1 if np.sum(np.abs(x[:y + 1])) > 5 else 0 for y in range(len(x))]
                       for x in examples],
                      dtype=theano.config.floatX)
    hidden_layer_size = 10
    num_hidden_layers = 2

    model = StackedCells(input_size,
                         layers=[hidden_layer_size, hidden_layer_size],
                         activation=T.tanh,
                         celltype=LSTM)

    # Add an output layer to predict the labels for each time step.
    model.layers.append(Layer(hidden_layer_size, input_length, lambda x: T.nnet.sigmoid(x)[0]))

    def step(x, *prev_hiddens):
        activations = model.forward(x, prev_hiddens[:-1])
        return activations

    initial_obs = T.matrix('')
    # timesteps = T.iscalar('timesteps')

    result, _ = theano.scan(step,
                            initial_obs[:, :-1],
                            outputs_info=[dict(initial=layer.initial_hidden_state, taps=[-1])
                                          for layer in model.layers
                                          if hasattr(layer, 'initial_hidden_state')] + [None])

    prediction = result[-1]

    predict_func = theano.function([initial_obs], prediction, allow_input_downcast=True)

    # get minibatches
    batches_idx = get_minibatches_idx(examples.shape[0], 5, shuffle=False)

    for cur_iter in range(num_iterations):
        for _, batch_idx in batches_idx:
            batch_example = examples[batch_idx, :]
            batch_label = labels[batch_idx, :]
            output_all = predict_func(batch_example)

if __name__ == "__main__":
    main()

JonathanRaiman commented 9 years ago

@stephenjia Sounds like I made a typo somewhere. I'll have a look this weekend and send you a revised version

stephenjia commented 9 years ago

@JonathanRaiman Thanks a lot.

stephenjia commented 9 years ago

@JonathanRaiman, @mheilman I know what my problem is. I should give the initial state the same batched shape as each timestep's input; that is, I should build the initial state by repeating layer.initial_hidden_state n_samples times (the ndim of the initial state should be 2 instead of 1).
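
A minimal sketch of that fix (assuming n_samples is the minibatch size and layer is one of the LSTM layers):

import theano.tensor as T

# Broadcast the 1-d initial hidden state (hidden_size,) to one row per example,
# giving the 2-d state (n_samples, hidden_size) that scan carries for minibatches.
batched_initial_state = T.repeat(layer.initial_hidden_state.dimshuffle('x', 0),
                                 n_samples, axis=0)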

JonathanRaiman commented 9 years ago

@mheilman @stephenjia Here's a better example for sequence forecasting that runs (no typos this time) with some comments on what everything does.

mheilman commented 9 years ago

nice!

stephenjia commented 9 years ago

@JonathanRaiman @mheilman thx!