aesara-devs / aesara

Aesara is a Python library for defining, optimizing, and efficiently evaluating mathematical expressions involving multi-dimensional arrays.
https://aesara.readthedocs.io

Improved `UltraFastScalarSigmoid` implementation #553

Open brandonwillard opened 3 years ago

brandonwillard commented 3 years ago

> By the way, I just came across this: https://github.com/NTimmons/FastActivations.jl
>
> They claim to have faster and better approximations than this one...
>
> Also discussed here on the old Theano repo: https://github.com/Theano/Theano/issues/6731

Originally posted by @ricardoV94 in https://github.com/aesara-devs/aesara/issues/550#issuecomment-894616724

zoj613 commented 3 years ago

I can have a go at this. Which version do you prefer? I'm thinking this would be good to add: https://github.com/NTimmons/FastActivations.jl/blob/aaa61f84dfcd75825b20676011355189cb497d8d/src/SigmoidFittedApproximations.jl#L407-L465
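For illustration, a generic softsign-based fast sigmoid (my own sketch of the general idea, not the fitted variants in that file) looks something like this:

```python
import numpy as np

def fast_sigmoid(x):
    # Approximate the logistic sigmoid 1 / (1 + exp(-x)) with the softsign
    # x / (1 + |x|), rescaled to (0, 1); cheap because it avoids exp entirely.
    x = np.asarray(x)
    return 0.5 * x / (1.0 + np.abs(x)) + 0.5
```

The linked Julia file presumably refines this kind of idea with fitted, interval-specific coefficients.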

Is there a benchmark suite to verify said claims, or do we rely on micro-benchmarks like `%time <some func>`?

One issue I see is that there is no license in that repo, so I'm not sure the implementations can be copied. Also, the owner hasn't been active since December 2020, so opening an issue requesting a license might not get a reply in a reasonable time frame.

brandonwillard commented 3 years ago

> I can have a go at this. Which version do you prefer? I'm thinking this would be good to add: https://github.com/NTimmons/FastActivations.jl/blob/aaa61f84dfcd75825b20676011355189cb497d8d/src/SigmoidFittedApproximations.jl#L407-L465

That seems fine.

> Is there a benchmark suite to verify said claims, or do we rely on micro-benchmarks like `%time <some func>`?

You can set `profile=True` when calling `aesara.function`; that will add profile statistics to the function when it's run.
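For example, a minimal sketch (assuming `at.sigmoid` as the op under test; the summary contents depend on the graph rewrites applied):

```python
import numpy as np

import aesara
import aesara.tensor as at

x = at.vector("x")
y = at.sigmoid(x)

# profile=True attaches a ProfileStats object that accumulates per-Op
# timings each time the compiled function is called.
fn = aesara.function([x], y, profile=True)

for _ in range(100):
    fn(np.linspace(-6.0, 6.0, 10_000).astype(aesara.config.floatX))

# Print the accumulated per-Op statistics.
fn.profile.summary()
```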

> One issue I see is that there is no license in that repo, so I'm not sure the implementations can be copied. Also, the owner hasn't been active since December 2020, so opening an issue requesting a license might not get a reply in a reasonable time frame.

These approximations are simple enough to implement from scratch using only the referenced paper, so we can always do that.

zoj613 commented 3 years ago

Could you please link to the paper? I don't see it mentioned there.

brandonwillard commented 3 years ago

> Could you please link to the paper? I don't see it mentioned there.

In the original Theano issue: https://github.com/Theano/Theano/issues/6731

ricardoV94 commented 3 years ago

Looking at their paper more closely, they are not claiming that their implementation is "numerically faster", but that fitting neural networks with their implementation yields faster training, because of a better trade-off between approximation accuracy and computational complexity...

brandonwillard commented 3 years ago

By the way, I recall at least a few other fast approximations for the types of functions involved in a sigmoid calculation, so, if you want to implement or try any others you come across, feel free. In other words, I have no reason to believe that this is the best approach.
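For instance, one classic trick for the exp part (a sketch from memory, not something already in Aesara) is Schraudolph's bit-level approximation:

```python
import numpy as np

def schraudolph_exp(x):
    # Schraudolph (1999): write x / ln(2) (plus a bias) straight into the
    # exponent/high-mantissa bits of an IEEE-754 double.  Roughly 2-4%
    # relative error for moderate |x|; garbage once the exponent overflows
    # (|x| greater than ~700).
    x = np.asarray(x, dtype=np.float64)
    i = (1512775.0 * x + 1072632447.0).astype(np.int64) << 32
    return i.view(np.float64)
```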

brandonwillard commented 3 years ago

> Looking at their paper more closely, they are not claiming that their implementation is "numerically faster", but that fitting neural networks with their implementation yields faster training, because of a better trade-off between approximation accuracy and computational complexity...

Yes, the premise is that these are approximations; that should be clear, along with everything that entails (e.g. the requisite trade-offs).
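To make the trade-off concrete, here's a quick accuracy check using the generic softsign-based approximation sketched earlier (a stand-in, not the paper's fitted variants):

```python
import numpy as np

x = np.linspace(-8.0, 8.0, 100_001)
exact = 1.0 / (1.0 + np.exp(-x))
approx = 0.5 * x / (1.0 + np.abs(x)) + 0.5

# The speedup from skipping exp is paid for in absolute error.
print(f"max abs error on [-8, 8]: {np.max(np.abs(exact - approx)):.4f}")
```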

ricardoV94 commented 3 years ago

Sure, I just wanted to make sure I was not misleading anyone with the suggestion.