XtractOpen / Meganet.jl

A fresh approach to deep learning written in Julia
http://www.xtract.ai/

Change the way activation is handled? #58

Open · lruthotto opened this issue 6 years ago

lruthotto commented 6 years ago

What do you guys think about the changes I made in singleLayer here: https://github.com/XtractOpen/Meganet.jl/commit/4869548dfc61dac4b7f310e63bba391d7be02316

I'm using map! to apply the pointwise nonlinearity in place. The user will have to provide a function for the activation and derivative. We could get rid of tanhActivation.jl, reluActivation.jl, and identityActivation.jl.
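For reference, here is a minimal, hypothetical sketch of the idea (the names are illustrative, not taken from the commit): the layer holds the activation and its derivative as two plain functions and evaluates them in place with map!.

activation  = x -> tanh(x)         # user-supplied pointwise nonlinearity
dActivation = x -> 1 - tanh(x)^2   # user-supplied pointwise derivative

Y  = randn(Float64, 32, 16)        # pre-activation features of a layer (made-up size)
dA = similar(Y)
map!(dActivation, dA, Y)           # derivative, as used in the Jmv's
map!(activation, Y, Y)             # activation in place, no extra allocation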

If you think this is a good way to go, I'll make it consistent and update the examples. Maybe someone can help me out and benchmark the change?

eldadHaber commented 6 years ago

I'm not sure what the advantage of getting rid of the activation functions would be?


erantreister commented 6 years ago

Hi Lars, have you seen this code that I posted in another conversation?

function reluActivation!(A::Array{T}, dA::Array{T}, doDerivative::Bool=false) where {T}
    A .= max.(A, zero(T))      # ReLU in place, overwrites A
    if doDerivative
        dA .= sign.(A)         # 1 where A > 0, 0 elsewhere; written into the preallocated dA
    else
        dA = zeros(T, 0)       # return an empty array when no derivative is requested
    end
    return A, dA
end

The .= operator works in place (no allocation). It can replace map!, which is less intuitive (but probably does the same thing). We could change the code to use only activation!() and remove the other functions.
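For what it's worth, a small benchmark sketch comparing the two in-place styles (assuming BenchmarkTools is available; both overwrite A without allocating a new array):

using BenchmarkTools

A = randn(Float64, 1000, 1000)
@btime $A .= max.($A, 0.0)              # broadcast assignment, in place
@btime map!(x -> max(x, 0.0), $A, $A)   # map! into the same array, also in place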

Getting rid of the files is a separate issue from the in-place computation. Currently, the user can plug in their own activation through the driver; the built-in ones just save the need to program it every time (like the regularizers in jInv). I don't see why we need to remove them, but collecting them all in an activations.jl file somewhere could be a step in that direction.

Back to performance: if we compute the derivative every time we compute the activation, then we may save some time by computing both in a single read of the array from memory. Can map!() do that?
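To make the "single read" idea concrete, here is a hedged sketch (not in the package): a hand-written loop fills both the activation and its derivative in one pass over the array, which a single map! call cannot do because it writes into only one destination.

function tanhActivationBoth!(A::Array{T}, dA::Array{T}) where {T}
    @inbounds for i in eachindex(A)
        t     = tanh(A[i])
        A[i]  = t                # activation in place
        dA[i] = one(T) - t^2     # derivative re-uses the computed tanh
    end
    return A, dA
end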

lruthotto commented 6 years ago

OK, let's sort this through:

  1. Note that in the updated code for the single layer and the double layer, we compute either the activation (in apply) or the derivative (in the Jmv's). We typically don't compute both at the same time.

  2. @erantreister: Your new code should be equivalent to the map! option when we call it for the activation only. When we call it for both, your code will be a bit faster because the derivative re-uses the result of the activation.

  3. Right now, the user would have to provide both the activation and its derivative. I agree that this is inconvenient. It would be nice to bundle them up somehow, but I haven't had a better idea for that than introducing a type for the activation (sketched below).
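For illustration, a hypothetical sketch of "introducing a type for the activation" (none of these names exist in the package): a small struct bundles the pointwise function and its derivative, so the user passes a single object to the layer.

struct Activation{F,G}
    sigma::F    # pointwise nonlinearity
    dsigma::G   # its pointwise derivative
end

tanhAct = Activation(x -> tanh(x), x -> 1 - tanh(x)^2)
reluAct = Activation(x -> max(x, zero(x)), x -> x > zero(x) ? one(x) : zero(x))

# in apply:     map!(act.sigma,  Y,  Y)
# in the Jmv's: map!(act.dsigma, dA, Y)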

erantreister commented 6 years ago

@lruthotto OK. I don't really get your bottom line, but if you prefer it one way or the other, I'm fine with it. This is quite minor anyway. Just make sure that map! is not slow for some reason.

lruthotto commented 6 years ago

OK, let's leave everything as is then for now... it's not too important anyway