Isolate the variance in layernorm into a separate RIB dir
Description
Instead of treating the LN variance like any other function in the RIB build, we now special-case it so that a single RIB direction represents the LN variance.
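For reviewers who want the intuition for why one direction is enough: for each token, LayerNorm's variance is a single scalar, and the normalized output depends only on the centered activations and that scalar. The snippet below is a rough standalone sketch of that decomposition, not the code in this PR. The function names (`split_layernorm`, `apply_norm`, `layernorm_reference`) are hypothetical, and it ignores the learned gain and bias.

```python
import math


def split_layernorm(x):
    """Split one token's activations into centered values and a scalar variance."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    # The variance is one number per token, so it can live in one direction.
    var = sum(c * c for c in centered) / len(x)
    return centered, var


def apply_norm(centered, var, eps=1e-5):
    """Rescale the centered activations using only the isolated variance."""
    scale = 1.0 / math.sqrt(var + eps)
    return [c * scale for c in centered]


def layernorm_reference(x, eps=1e-5):
    """Plain LayerNorm without gain or bias, for comparison."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


x = [1.0, 2.0, 3.0, 4.0]
centered, var = split_layernorm(x)
out = apply_norm(centered, var)
ref = layernorm_reference(x)
```

Here `out` and `ref` agree, which is the point: once the variance is carried as its own scalar, everything else in the layer is a simple rescaling.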
Motivation and Context
We had some really messy RIB graphs. LNs shouldn't be that complicated. Now they in fact look clean!
How Has This Been Tested?
I've added a single test that checks a new invariant of the RIB graph.
Does this PR introduce a breaking change?
The RIB graphs produced with this PR will be different. This isn't currently configurable either -- it just always happens in LayerNorm.
The interface is the same.