ddbourgin / numpy-ml

Machine learning, in numpy
https://numpy-ml.readthedocs.io/
GNU General Public License v3.0

activations.py optimizations #33

Closed jaymody closed 5 years ago

jaymody commented 5 years ago

Taking a quick look, some of the grad and grad2 functions might benefit from optimization. Here's one example:

https://github.com/ddbourgin/numpy-ml/blob/fce2acfd7c370f55373bdc6dff1761a8258bfe27/numpy_ml/neural_nets/activations/activations.py#L38-L39

Here, the function could be changed so that fn(x) is only computed once:

def grad(self, x):
    # evaluate the sigmoid once and reuse it for the derivative
    fn_x = self.fn(x)
    return fn_x * (1 - fn_x)

The extra memory used to store the intermediate result should be freed as soon as the function returns, so that shouldn't be a problem. Would love a second opinion @ddbourgin before making a PR with the necessary changes.
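For reference, here's a minimal standalone sketch of how the optimized grad would read in context (the class below is illustrative only, not the library's actual class hierarchy):

import numpy as np

class Sigmoid:
    def fn(self, x):
        # logistic sigmoid: 1 / (1 + exp(-x))
        return 1 / (1 + np.exp(-x))

    def grad(self, x):
        # compute fn(x) once and reuse it
        fn_x = self.fn(x)
        return fn_x * (1 - fn_x)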

ddbourgin commented 5 years ago

Thanks @JayMody! You're definitely right - there are a ton of inefficient / memory-intensive methods in the neural network module. My general feeling is that optimizations are fine so long as they don't obscure the actual math behind the calculation. Clearly I have erred very heavily on the side of inefficient and explicit (as opposed to fast and obscure), though I think in many cases this is probably a false dichotomy ;)

For the example you included, I think the optimization is totally justified. If you'd like to work on this, I'd say use your discretion + do atomic commits so we can review on a case-by-case basis.

jaymody commented 5 years ago

I want to keep the explicit nature of the code, since it seems like one of the attributes that sets this library apart from others. It's hard to strike a balance between efficiency and clarity, though. For example, in the sigmoid activation:

def grad2(self, x):
    return self.grad(x) * (1 - 2 * self.fn(x))

The above code has lots of clarity and is easy to digest, versus:

Method 1

def grad2(self, x):
    fn_x = self.fn(x)
    return fn_x * (1 - fn_x) * (1 - 2 * fn_x)

Here, it's no longer clear that the gradient is hidden inside the expression. I could do something like this:

Method 2

def grad2(self, x):
    fn_x = self.fn(x)
    grad_x = fn_x * (1 - fn_x)
    return grad_x * (1 - 2 * fn_x)

This is clearer to read, but duplicates code that already exists in grad.
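For reference, both versions compute the same quantity. Writing f for the logistic sigmoid (so fn is f and grad is f'), the identity is:

f'(x)  = f(x) (1 - f(x))
f''(x) = d/dx [ f(x) (1 - f(x)) ] = f'(x) (1 - 2 f(x)) = f(x) (1 - f(x)) (1 - 2 f(x))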

What would you suggest?

ddbourgin commented 5 years ago

Good analysis! To me, Method 2 strikes the best balance between conceptual clarity and memory usage. I think it's possible to further mitigate conceptual confusion by adding a docstring for the grad2 function (oops...) which clearly states that the second derivative of the logistic sigmoid is f'(x) * (1 - 2 * f(x)). That way, everything is crystal clear :)
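For example, one possible sketch of grad2 with such a docstring (the exact wording is just a suggestion, not the final version):

def grad2(self, x):
    """
    Second derivative of the logistic sigmoid:

        f''(x) = f'(x) * (1 - 2 * f(x))

    where f(x) is the sigmoid and f'(x) = f(x) * (1 - f(x)).
    """
    fn_x = self.fn(x)
    grad_x = fn_x * (1 - fn_x)  # f'(x)
    return grad_x * (1 - 2 * fn_x)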

ddbourgin commented 5 years ago

Merged #35. Closing.