This relates to some of my early thoughts on the topic, specifically around gradient descent (if a derivative is going to evaluate to zero, skip the update), but both feedforward and learning transactions could potentially benefit from selectively skipping operations. According to Brandon Reagen and the Minerva work at ISCA 2016, power could be reduced by at most 2x, though that figure covers both static pruning and run-time pruning, and their respective contributions are unclear. This would be an interesting avenue and is generally low cost to implement.
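The gradient-side idea above can be sketched as a plain SGD step that skips any weight whose derivative is (near) zero, since the update would be a no-op anyway. This is a minimal illustration, not DANA's actual learning pipeline; the `eps` cutoff and function names are assumptions.

```python
# Hypothetical sketch: skip weight updates whose gradient is ~zero.
# `eps` is an assumed cutoff, not a value from DANA or Minerva.
EPS = 1e-8

def sgd_step(weights, grads, lr=0.1, eps=EPS):
    """Apply one SGD update, skipping weights whose gradient
    magnitude is below eps (those updates would do nothing)."""
    updated = []
    skipped = 0
    for w, g in zip(weights, grads):
        if abs(g) < eps:
            skipped += 1
            updated.append(w)  # leave the weight unchanged
            continue
        updated.append(w - lr * g)
    return updated, skipped
```

In hardware terms, each skipped update is a multiply and a write that never happens, which is where the run-time power savings would come from.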
In broad strokes, without dramatic modifications to DANA:
Skip any input value whose magnitude is significantly below the currently accumulated neuron value
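The skip rule above can be sketched as a multiply-accumulate loop that drops inputs far below the running sum. This is a software illustration of the heuristic only; the `threshold_ratio` and the exact comparison rule are assumptions, not part of DANA.

```python
# Hypothetical sketch: run-time operand skipping in a neuron's
# multiply-accumulate loop. The threshold ratio is an assumption.
THRESHOLD_RATIO = 0.01

def accumulate(inputs, weights, threshold_ratio=THRESHOLD_RATIO):
    """Accumulate weighted inputs, skipping any input whose
    magnitude is insignificant relative to the running sum."""
    acc = 0.0
    skipped = 0
    for x, w in zip(inputs, weights):
        # Skip the multiply if this input is far below the
        # magnitude already accumulated for this neuron.
        if abs(acc) > 0 and abs(x) < threshold_ratio * abs(acc):
            skipped += 1
            continue
        acc += x * w
    return acc, skipped
```

Each skipped input avoids one multiply-accumulate, so the saving scales with how many operands fall below the threshold at run time.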