brandonwillard opened this issue 3 years ago
I can have a go at this. Which version do you prefer? I'm thinking this would be good to add: https://github.com/NTimmons/FastActivations.jl/blob/aaa61f84dfcd75825b20676011355189cb497d8d/src/SigmoidFittedApproximations.jl#L407-L465
Is there a benchmark suite to verify said claims, or do we rely on micro-benchmarks like `%time <some func>`?
One issue I see is that there is no license in that repo, so I'm not sure the implementations can be copied. Plus, the owner hasn't been active since December 2020, so opening an issue requesting one might not get a reply in a reasonable time frame.
> I can have a go at this. Which version do you prefer? I'm thinking this would be good to add: https://github.com/NTimmons/FastActivations.jl/blob/aaa61f84dfcd75825b20676011355189cb497d8d/src/SigmoidFittedApproximations.jl#L407-L465
That seems fine.
> Is there a benchmark suite to verify said claims, or do we rely on micro-benchmarks like `%time <some func>`?
You can set `profile=True` when calling `aesara.function`; that will add profile statistics to the function when it's run.
> One issue I see is that there is no license in that repo, so I'm not sure the implementations can be copied. Plus, the owner hasn't been active since December 2020, so opening an issue requesting one might not get a reply in a reasonable time frame.
These approximations are simple enough to implement from scratch using only the referenced paper, so we can always do that.
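As an illustration of how simple such a from-scratch implementation can be, here is the well-known piecewise-linear "hard sigmoid" (the slope 0.2 and offset 0.5 are the standard choice that Theano also used; this is a familiar stand-in, not necessarily the approximation from the referenced paper):

```python
import math

def exact_sigmoid(x):
    # reference implementation: 1 / (1 + e^-x)
    return 1.0 / (1.0 + math.exp(-x))

def hard_sigmoid(x, slope=0.2, offset=0.5):
    # piecewise-linear approximation: clip(slope * x + offset, 0, 1)
    return max(0.0, min(1.0, slope * x + offset))

# quick accuracy check on a small grid
for x in (-6.0, -1.0, 0.0, 1.0, 6.0):
    print(f"x={x:+.1f}  exact={exact_sigmoid(x):.4f}  hard={hard_sigmoid(x):.4f}")
```

The approximation is exact at `x = 0` and saturates to 0/1 outside `[-2.5, 2.5]`, trading accuracy in the tails for a branch-and-multiply instead of an `exp`.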
Could you please link to the paper? I don't see it mentioned there.
> Could you please link to the paper? I don't see it mentioned there.
In the original Theano issue here.
Looking at their paper more closely, they are not claiming their implementation is "numerically faster", but that when they fit neural networks with their implementation they get faster training, because of a better trade-off between approximation quality and computational complexity...
By the way, I recall at least a few other fast approximations for the types of functions involved in a sigmoid calculation, so, if you want to implement/try any other ones you come across, feel free. In other words, I have no reason to believe that this is the best approach.
> Looking at their paper more closely, they are not claiming their implementation is "numerically faster", but that when they fit neural networks with their implementation they get faster training, because of a better trade-off between approximation quality and computational complexity...
Yes, the premise is that these are approximations; that should be clear, along with everything that entails (e.g. the requisite trade-offs).
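One way to make that trade-off concrete is to measure both the maximum absolute error and the runtime of an approximation against the exact sigmoid. The softsign-based approximation below is just a familiar example chosen for this sketch, not the paper's method:

```python
import math
import timeit

def exact_sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softsign_sigmoid(x):
    # cheap approximation built from the softsign x / (1 + |x|)
    return 0.5 * (x / (1.0 + abs(x))) + 0.5

# accuracy: maximum absolute error over a grid of inputs
grid = [i / 100.0 for i in range(-800, 801)]
max_err = max(abs(exact_sigmoid(x) - softsign_sigmoid(x)) for x in grid)

# speed: micro-benchmark both over the same grid
t_exact = timeit.timeit(lambda: [exact_sigmoid(x) for x in grid], number=100)
t_approx = timeit.timeit(lambda: [softsign_sigmoid(x) for x in grid], number=100)

print(f"max abs error: {max_err:.4f}")
print(f"exact: {t_exact:.3f}s  approx: {t_approx:.3f}s")
```

Whether the accuracy loss is acceptable depends on the training dynamics, which is exactly the trade-off the paper is arguing about.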
Sure, I just wanted to make sure I was not misleading anyone with the suggestion.
By the way I just came across this: https://github.com/NTimmons/FastActivations.jl
They claim they have faster and better approximations than this one...
Also discussed here on the old Theano repo: https://github.com/Theano/Theano/issues/6731
Originally posted by @ricardoV94 in https://github.com/aesara-devs/aesara/issues/550#issuecomment-894616724