AtheMathmo / rusty-machine

Machine Learning library for Rust
https://crates.io/crates/rusty-machine/
MIT License

Recurrent neural networks #185

Open tzaeru opened 6 years ago

tzaeru commented 6 years ago

Heya,

As I understand it, RNNs are not yet in the project. If there's no current work on them, I'd like to look a bit into implementing them with rusty-machine. I'm still fairly new to both machine learning and Rust - I've only done FNNs and linear regressions so far - but I have a specific project in mind that I'd like to try RNNs out on.

A start would be to set up an architecture for creating an RNN topology and forward-propagating through it. Then I'd start working on backpropagation through time (BPTT) and later LSTM, which, if I've understood correctly, seems to be central to making RNNs work in practice, at least with data that spans more than a handful of timesteps.

As implementation references, I currently have this paper: https://arxiv.org/pdf/1610.02583.pdf and this tutorial with some actual example code in Python: http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
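Concretely, the recurrence from that tutorial is simple enough to sketch. This is roughly the single-timestep forward pass I have in mind, in plain Rust with no rusty-machine types (the `Rnn` struct, `step`, and the field names are just mine, for illustration):

    // Plain Elman-style RNN step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h),
    // y_t = W_hy h_t. Vec-based, just to pin down the math before worrying
    // about rusty-machine's Matrix types.
    struct Rnn {
        w_xh: Vec<Vec<f64>>, // hidden_dim x input_dim
        w_hh: Vec<Vec<f64>>, // hidden_dim x hidden_dim
        w_hy: Vec<Vec<f64>>, // output_dim x hidden_dim
        b_h: Vec<f64>,       // hidden_dim
    }

    fn mat_vec(m: &[Vec<f64>], v: &[f64]) -> Vec<f64> {
        m.iter()
            .map(|row| row.iter().zip(v).map(|(a, b)| a * b).sum())
            .collect()
    }

    impl Rnn {
        // One timestep: takes the current input and the previous hidden state,
        // returns (output, new hidden state). The caller carries `h` forward.
        fn step(&self, x: &[f64], h_prev: &[f64]) -> (Vec<f64>, Vec<f64>) {
            let wx = mat_vec(&self.w_xh, x);
            let wh = mat_vec(&self.w_hh, h_prev);
            let h: Vec<f64> = wx.iter()
                .zip(&wh)
                .zip(&self.b_h)
                .map(|((a, b), c)| (a + b + c).tanh())
                .collect();
            let y = mat_vec(&self.w_hy, &h);
            (y, h)
        }
    }

Running `step` over a whole sequence and keeping every intermediate `h` around should be exactly what BPTT needs later on.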

If there's already code in place for this, then that's all the better, of course!

NivenT commented 6 years ago

I've worked on the current neural network code, so I think I have some comments that could be helpful for you when working with this codebase. (Disclaimer: I haven't read through the paper/Python code you linked, so they might already point to solutions for some of the potential issues I mention. Also, I'm thinking through this as I type, so it's likely to contain errors; please be skeptical of any issues/requirements I claim.)

Unfortunately, I suspect there will be some annoyances with implementing RNNs (although it shouldn't be impossible), arising from the fact that RNNs have an internal state that gets updated as they process data. Generally speaking, the SupModel trait requires models not to be mutated when predicting, so the most obvious way to add RNNs probably won't fly. That isn't a death sentence, though; (I believe) you could still implement RNNs with a little creativity.
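For reference, this is roughly the shape of the SupModel trait as I remember it (double-check against the actual source; I may be off on details), which is why a predict that mutates hidden state doesn't fit directly:

    // Approximate shape of rusty-machine's SupModel trait, from memory -- the
    // important part is that predict only gets &self, so a model can't update
    // an internal recurrent state while predicting.
    use rusty_machine::learning::LearningResult;

    pub trait SupModel<T, U> {
        /// Train the model using inputs and targets.
        fn train(&mut self, inputs: &T, targets: &U) -> LearningResult<()>;

        /// Predict output from inputs.
        fn predict(&self, inputs: &T) -> LearningResult<U>;
    }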

More specifically about working with the neural network code: I don't know if you've taken a look, but it's set up around the idea of stacking together multiple "NetLayers", so to implement RNNs you essentially need to write a recurrent NetLayer. One important thing to know is that individual layers don't actually store their own parameters; all parameters for an entire network are stored in one contiguous array, and each layer receives a slice of that array to compute its function (you need the parameters all in one place like this to optimize by gradient descent). So the million-dollar question is how to represent the state of a recurrent layer. You can't add the state to the entire Network's weights, since that would require updating the weights whenever you call predict as you step through the data (IMO this makes sense, since the state of the layer isn't something learned by the network). You also can't store the state directly in your recurrent layer struct, since you would then have to update the layer when you call predict, which also violates immutability (IMO this makes sense too, since the state of the layer isn't one of its defining properties). But you do have options.
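To make the "layers don't own their parameters" point concrete, a recurrent layer would end up holding little more than its dimensions; the weights live in the Network's flat weight vector and, per the above, the state has to live somewhere else entirely. A rough sketch (these names are hypothetical, not existing rusty-machine API):

    // Hypothetical recurrent layer: it describes the shape of its block in the
    // shared weight vector (NetLayer-style bookkeeping), but owns neither the
    // weights themselves nor the hidden state.
    #[derive(Debug, Clone, Copy)]
    struct Recurrent {
        input_dim: usize,
        hidden_dim: usize,
    }

    impl Recurrent {
        /// Shape of this layer's block of the flat weight vector: input-to-hidden
        /// and hidden-to-hidden weights stacked, i.e. (input_dim + hidden_dim) rows
        /// by hidden_dim columns.
        fn param_shape(&self) -> (usize, usize) {
            (self.input_dim + self.hidden_dim, self.hidden_dim)
        }

        /// How many f64 parameters this layer consumes from the flat weight vector.
        fn num_params(&self) -> usize {
            let (rows, cols) = self.param_shape();
            rows * cols
        }
    }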

I could be wrong, but I think you will end up having to (minimally) modify the existing neural network code to allow for a concept of state in layers. One rough idea that might work is to have your recurrent layer output its state as well as the function it actually computes, packed into one matrix. The layer's forward call would then have to receive the output of the previous layer as well as the layer's current state (although the state would not be included in the Network's weights variable). Doing this would require some annoying code for splitting and recombining MatrixSlices, but that's a small price to pay if it works. I'm imagining modifying the existing forward-propagation loop to something like the following:

        let mut index = 0;
        for (i, layer) in self.layers.iter().enumerate() {
            let shape = layer.param_shape();

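            // View this layer's parameters as a slice of the network's flat weight vector.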
            let slice = unsafe {
                MatrixSlice::from_raw_parts(weights.as_ptr().offset(index as isize),
                                            shape.0,
                                            shape.1,
                                            shape.1)
            };

            let (output, state) = if i == 0 {
                // You might be wondering why we append the state to `slice` instead of
                // making it a separate parameter. This is just to keep modifications to
                // the NetLayer interface minimal, since most layers shouldn't have any state.
                // (`append` and `output_cutoff` are hypothetical helpers that would need
                // to be written.)
                let both = layer.forward(inputs, slice.append(states.last())).unwrap();
                // split `both` into the true output and the new state
                // (`Axes` here is rulinalg::matrix::Axes)
                let (out, st) = both.split_at(layer.output_cutoff(), Axes::Row);
                (out.into_matrix(), st.into_matrix())
            } else {
                let both = layer.forward(activations.last().unwrap(), slice.append(states.last())).unwrap();
                let (out, st) = both.split_at(layer.output_cutoff(), Axes::Row);
                (out.into_matrix(), st.into_matrix())
            };

            activations.push(output);
            params.push(slice);
            states.push(state);
            index += layer.num_params();
        }
        let output = activations.last().unwrap();

The actual changes would probably look different from the above, but the idea is basically to keep track of states as you forward-propagate, without directly modifying the Network or its layers; then you'll have them stored for computing gradients via BPTT.
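To spell out why the stored states are what BPTT needs (notation mine, not from the linked paper): for a simple recurrence h_t = tanh(W_x x_t + W_h h_{t-1} + b), the gradient of the total loss with respect to the recurrent weights unrolls over time as

    \frac{\partial L}{\partial W_h}
      = \sum_{t} \sum_{k=1}^{t}
        \frac{\partial L_t}{\partial h_t}
        \left( \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}} \right)
        \frac{\partial^{+} h_k}{\partial W_h}

where ∂⁺ is the "immediate" derivative that treats h_{k-1} as a constant. Every factor is computable from the per-timestep states stashed during the forward pass, and the product of Jacobians in the middle is the term that vanishes or explodes over long sequences, which is why LSTM came up earlier in the thread.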

Finally, I'm not claiming something like what I've sketched is the only or best way to go about adding RNNs to this codebase, but hopefully some of this comment has been informative.

usamec commented 6 years ago

A couple of comments from a regular user of RNNs:

a) It is very hard to write a proper library for neural nets, since the structure you want really varies (do you want to predict one output per sequence or per step? bidirectional? seq2seq? semi-supervised pretraining? ...).
b) You need to change optimizers / types of cell a lot. These need to be swappable, not baked in.
c) A lot of the time you want to mix convolutional, regular and recurrent nets.
d) GPU implementation is super hard, but mostly needed (not every time, though).

TLDR: You cannot get it right. I cannot get it right. Most of us cannot get it right.

Based on that, I would really suggest leaving the neural net implementation to specialized libs (like wrappers around tensorflow, cntk, or whatever ...) and focusing on the rest of the API.