AtheMathmo / rusty-machine

Machine Learning library for Rust
https://crates.io/crates/rusty-machine/
MIT License

Improvements to Neural Networks #91

Open AtheMathmo opened 8 years ago

AtheMathmo commented 8 years ago

There are some issues with the current Neural Network implementation. Though rusty-machine will not be able to beat existing deep learning frameworks, it would be nice if we could provide a simple but powerful API.

Here are some issues with the current implementation that we would like to address:

There are other issues, but if we can find a solution to address these alone it would be a good step in the right direction.

One obvious improvement would be to adjust how the layers are defined in the model. Currently we simply give each layer a size. If instead each layer could be defined individually based on some enum, we would gain more control (it could contain a simple ActivationFunc for feed-forward layers, Pool, etc.).
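Something roughly like this - just a hypothetical sketch, the variant names and fields are only illustrative:

#[allow(dead_code)]
enum ActivationKind {
    Sigmoid,
    Relu,
}

#[allow(dead_code)]
enum Layer {
    // Fully-connected layer with explicit input/output sizes.
    Linear { inputs: usize, outputs: usize },
    // Elementwise activation applied to the previous layer's output.
    Activation(ActivationKind),
    // Pooling over fixed-size windows.
    Pool { window: usize },
}

fn main() {
    // The network would then be described as a list of layer definitions.
    let _net = vec![
        Layer::Linear { inputs: 64, outputs: 32 },
        Layer::Activation(ActivationKind::Sigmoid),
        Layer::Pool { window: 2 },
    ];
}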

NivenT commented 8 years ago

I'm not sure that having an enum for different types of layers is the best decision. I feel like that would lead to unnecessarily long match blocks that could be hard to understand. Something like:

match layer {
    Linear{..} => {
         // blah
    },
    Conv{..} => {
         // blah
    },
    Pool{..} => {
         // blah
    },
    Softmax{..} => {
         // blah
    },
    LogSoftmax{..} => {
         // blah
    },
    Reshape{..} => {
         // blah
    },
    // etc.
}

I think a more standard approach would be to define a layer trait.

pub trait NetLayer {
    type Input;
    type Output;

    fn forward(&self, input: &Self::Input) -> Self::Output;
    fn backward(&self, out_grad: &Self::Output) -> Self::Input;
}

A network could then contain a Vec<Box<NetLayer>> and compute outputs/gradients by repeatedly calling forward/backward on its layers. One issue with this would be gathering all the parameters in the network for optimization. One possible resolution would be for NetLayer to also require some kind of get_parameters and/or set_parameters functions the network could use to gather all the parameters for optimization. Alternatively, the network could hold all the parameters itself, and then NetLayers would require that the relevant parameters be passed in to the forward and backward functions.

pub trait NetLayer {
    type Input;
    type Output;

    fn forward(&self, input: &Self::Input, params: &[f64]) -> Self::Output;
    fn backward(&self, out_grad: &Self::Output, params: &[f64]) -> Self::Input;
    fn default_params(&self) -> Vec<f64>;
    fn num_params(&self) -> usize;
}

struct NeuralNet {
    layers: Vec<Box<NetLayer>>,
    params: Vec<f64>,
}

impl NeuralNet {
    fn add_layer(&mut self, layer: Box<NetLayer>) {
        // Seed the network's parameter buffer with the layer's defaults.
        self.params.extend(layer.default_params());
        self.layers.push(layer);
    }

    fn predict(&self, inputs: &Matrix<f64>) -> Matrix<f64> {
        let mut index = 0;
        let mut output = inputs.clone();
        for layer in &self.layers {
            // Hand each layer its own slice of the shared parameter buffer.
            output = layer.forward(&output, &self.params[index..index + layer.num_params()]);
            index += layer.num_params();
        }
        output
    }
}
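To make that concrete, here is a minimal, self-contained sketch of two layers implementing a trimmed-down version of the trait (backward and default_params omitted for brevity). Vec<f64> stands in for Matrix so the snippet runs on its own; the layer names and parameter layout are just illustrative.

// Trimmed-down version of the trait above, using Vec<f64> as a stand-in
// for Matrix so the example is self-contained.
trait NetLayer {
    type Input;
    type Output;

    fn forward(&self, input: &Self::Input, params: &[f64]) -> Self::Output;
    fn num_params(&self) -> usize;
}

// An elementwise sigmoid activation: it owns no parameters.
struct Sigmoid;

impl NetLayer for Sigmoid {
    type Input = Vec<f64>;
    type Output = Vec<f64>;

    fn forward(&self, input: &Vec<f64>, _params: &[f64]) -> Vec<f64> {
        input.iter().map(|x| 1.0 / (1.0 + (-x).exp())).collect()
    }

    fn num_params(&self) -> usize {
        0
    }
}

// A fully-connected layer: its weights live in the slice handed to it.
struct Linear {
    inputs: usize,
    outputs: usize,
}

impl NetLayer for Linear {
    type Input = Vec<f64>;
    type Output = Vec<f64>;

    fn forward(&self, input: &Vec<f64>, params: &[f64]) -> Vec<f64> {
        // params is laid out row-major: `outputs` rows of `inputs` weights each.
        (0..self.outputs)
            .map(|o| {
                let row = &params[o * self.inputs..(o + 1) * self.inputs];
                row.iter().zip(input.iter()).map(|(w, x)| w * x).sum()
            })
            .collect()
    }

    fn num_params(&self) -> usize {
        self.inputs * self.outputs
    }
}

fn main() {
    // One output neuron with two weights; values chosen arbitrarily.
    let dense = Linear { inputs: 2, outputs: 1 };
    let params = vec![0.5, -0.25];
    let activation = Sigmoid;

    let hidden = dense.forward(&vec![1.0, 2.0], &params);
    let out = activation.forward(&hidden, &[]);
    println!("{:?}", out);
}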

Hopefully the above conveys the idea I am getting at. What do you think?

AtheMathmo commented 8 years ago

Thanks for commenting!

I agree with you totally that using an enum will lead to pretty clunky code. The trait version is definitely a lot cleaner.

I also agree that figuring out how the parameters work is a little tricky - it would work nicely if we could have fields on traits! I think that taking the weights as inputs to the layer traits probably makes sense. Though I would argue that we should use MatrixSlice instead of &[T] to make it easier to reason about the dimensions. This would fit fairly well with the existing code as the weights are stored in one contiguous Vec.
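Roughly the kind of bookkeeping I mean - WeightsView below is just a stand-in for illustration, not an actual rusty-machine type; the point is that each layer's chunk of the contiguous parameter buffer carries its dimensions with it, which is what a MatrixSlice view would give us:

// Stand-in for a dimensioned view into the shared parameter buffer.
struct WeightsView<'a> {
    rows: usize,
    cols: usize,
    data: &'a [f64],
}

fn main() {
    // One contiguous buffer holding the parameters of every layer.
    let params = vec![0.0f64; 6 + 12];

    // The first layer's 2x3 weight matrix is the first 6 entries...
    let w1 = WeightsView { rows: 2, cols: 3, data: &params[0..6] };
    // ...and the next layer's 3x4 matrix is the following 12.
    let w2 = WeightsView { rows: 3, cols: 4, data: &params[6..18] };

    assert_eq!(w1.rows * w1.cols, w1.data.len());
    assert_eq!(w2.rows * w2.cols, w2.data.len());
}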

I have never implemented convolution/pooling layers myself in a neural net so I'm not sure how well this captures that. But it certainly looks a lot better than the existing network code!

NivenT commented 8 years ago

I think you're right, MatrixSlice would be better than &[T]. I have little experience with convolution/pooling layers as well, but I think this approach should be fine. It's based on what I've seen from other libraries (torch, leaf, etc.).

NivenT commented 8 years ago

I was thinking about this more, and I realized that this probably wouldn't be a good place to use associated types. If you had some Activation layer that computes an activation function on every element of its input, you wouldn't want it to be restricted to working only on Vector or Matrix. Also, with a Reshape layer, you would want it to be able to turn a Matrix into a Vector (e.g. flattening images) or vice versa (e.g. outputting an image using an autoencoder).

AtheMathmo commented 8 years ago

I think it might be acceptable to restrict the actions to only work on Matrix. Vector only really exists to optimize some operations on a column of data (it could probably be made redundant with some effort).

Though it would be cool if we could use type magic to only allow the correct layers to be chained (i.e. Layer1 -> Layer2 iff Layer1::Output == Layer2::Input). I'm not sure that's possible though.

It's possible I am misunderstanding your intention though.

NivenT commented 8 years ago

Only working on Matrix would work at first, but we might hit a bump when it comes to conv layers. Convolutions usually evaluate multiple feature maps and output 3D matrices (Vec<Matrix>). We could stack the outputted matrices on top of each other, but then we lose information about the depth of the output. Alternatively, we could work only on Vec<Matrix>.

My worry was that using associated types would restrict the expressiveness of layers too much. Like, we would need a ReshapeVectorMatrix and a ReshapeMatrixVector instead of just a Reshape layer. It's not a big issue, just annoying. Working on only one type like you suggest would resolve the issue.

For checking that layers are chained correctly, could something like this work?

fn are_compatible<T, U, V>(_: &NetLayer<Input=T, Output=U>, _: &NetLayer<Input=U, Output=V>) {
}

impl NeuralNet {
    fn add_layer(&mut self, a: blah) {
        // The previous layer's Output must line up with the new layer's Input.
        if let Some(b) = self.layers.last() {
            are_compatible(b, &a);
        }
        // blah
    }
}

This should check at compile time that things are as they should be, right?

AtheMathmo commented 8 years ago

I see what you mean - we do indeed want to work with other types. Though I think this is a valuable enough update that we could first make the change with the existing structure and later expand it to include conv/pooling layers.

That function looks like it would check that they are compatible at compile time! That's pretty neat and looks like it would have no effect on run time (though maybe some interesting effects on compile time?).

AtheMathmo commented 8 years ago

I wanted to try playing with this idea a little and ran into an issue. We can't create a trait object without specifying the associated types.

This is also indicative of a larger problem we will have if we try to store a Vec of different types.
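A minimal sketch of what goes wrong - the failing line is left as a comment so the rest compiles:

// A trait with associated types can't be made into a trait object
// without naming those types.
trait NetLayer {
    type Input;
    type Output;

    fn forward(&self, input: &Self::Input) -> Self::Output;
}

fn main() {
    // This line is rejected with "the value of the associated types
    // `Input` and `Output` must be specified":
    //
    //     let _layers: Vec<Box<NetLayer>> = Vec::new();
    //
    // Naming the types makes it compile, but then every layer in the Vec
    // is forced onto the same Input/Output anyway:
    let _fixed: Vec<Box<NetLayer<Input = Vec<f64>, Output = Vec<f64>>>> = Vec::new();
}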

NivenT commented 8 years ago

That's unfortunate. I tried finding a (messy) workaround, but to no avail. I think you're right that this is something we can't avoid.

So we can't have any associated (or generic) types on the trait, which forces it to operate on and return a single type, right? Do you want to make that type Matrix for now, and then worry about convolutions when we get there, or use Vec<Matrix> from the start?

AtheMathmo commented 8 years ago

I think it is best to stick with Matrix and tackle convolutions later? Also once #117 lands we can make it operate on T: BaseMatrix - that will give us a little more freedom at least.
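Roughly, I imagine the trait collapsing to something like this - sketch only, with a placeholder Matrix type, and the parameter handling still to be settled:

// No associated types and a single concrete input/output type, so a
// Vec<Box<NetLayer>> works again. `Matrix` here is just a stand-in for
// rusty-machine's Matrix<f64> (or, once #117 lands, something bounded
// by BaseMatrix).
struct Matrix;

trait NetLayer {
    fn forward(&self, input: &Matrix, params: &[f64]) -> Matrix;
    fn backward(&self, out_grad: &Matrix, params: &[f64]) -> Matrix;
    fn num_params(&self) -> usize;
}

fn main() {
    // With no associated types the trait object is fine to box and store.
    let _layers: Vec<Box<NetLayer>> = Vec::new();
}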

NivenT commented 8 years ago

Ok. I think I'll start trying to implement this, and then submit a PR when I'm far enough along to be critiqued, but before it's ready to merge. Does that sound alright, or would you rather work on it?

AtheMathmo commented 8 years ago

Please do!

I'd be more than happy to give input when you're ready. It may be worth being aware of the ongoing discussion in #117, #120 and #124, just so that you don't invest a lot of time in a strategy which may not be viable later.