This simple change would partially address issue #31. It generalises the training procedure from the sigmoid to other invertible activation functions.
It doesn't work for most non-invertible activation functions.[a] Those could be addressed by a different training procedure, which would require the weighted-sum input to a node instead of just the resulting value, and would use the derivative proper instead of a "differential expression" in terms of the function's output value. I might implement this at some point, but it would be more of a hassle.
[a] One exception is ReLU, which is non-invertible because it has the same value for all negative inputs; it just so happens that the derivative also has the same value for all of those inputs.
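To illustrate what a "differential expression" means here (a hypothetical sketch, not the PR's actual code; the names and pairing convention below are my own): for these activations the derivative can be computed purely from the output y = f(x), so backpropagation never needs the pre-activation sum.

```python
import math

# Each activation is paired with a "differential expression": a function
# that computes the derivative f'(x) given only the output y = f(x).

# sigmoid: f(x) = 1/(1 + e^-x), and f'(x) = f(x) * (1 - f(x))
sigmoid = (lambda x: 1.0 / (1.0 + math.exp(-x)),  # activation
           lambda y: y * (1.0 - y))               # derivative from output

# tanh: f'(x) = 1 - f(x)^2
tanh = (math.tanh,
        lambda y: 1.0 - y * y)

# ReLU: non-invertible (all negative inputs map to 0), yet its derivative
# is still well-defined from the output alone: slope 1 where y > 0, and
# slope 0 for every input that produced y == 0 -- the exception noted in [a].
relu = (lambda x: max(0.0, x),
        lambda y: 1.0 if y > 0.0 else 0.0)
```

For a genuinely non-invertible activation without this property (e.g. sin), no such expression exists, which is why the alternative procedure in the paragraph above would need the weighted-sum input itself.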
I am unhappy with the naming: "differential expression" was the best I could come up with; hopefully someone knows a better term for this.
If there are any adjustments I need to make, please let me know.