ApolloResearch / rib

Library for methods related to the Local Interaction Basis (LIB)
MIT License

Isolate the variance in layernorm into a separate RIB dir #345

Closed nix-apollo closed 4 months ago

nix-apollo commented 4 months ago

Description

Instead of treating the LN variance like any other function in the RIB build, we now special-case it so that a single RIB direction represents the LN variance.
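For readers unfamiliar with why the variance can be singled out, here is a minimal sketch in plain Python. It is not code from this repository. It only illustrates the standard LayerNorm computation (without learned scale and bias), where the variance is one scalar per input vector and so is a natural candidate for its own direction. The function names `layernorm_split` and `layernorm` are hypothetical.

```python
import math


def layernorm_split(x):
    """Return the centered vector and the (biased) variance of `x`.

    The variance is a single scalar, separate from the centered activations.
    """
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    var = sum(c * c for c in centered) / n
    return centered, var


def layernorm(x, eps=1e-5):
    """Standard LayerNorm without affine parameters, built from the split above."""
    centered, var = layernorm_split(x)
    scale = 1.0 / math.sqrt(var + eps)  # the only place the variance enters
    return [c * scale for c in centered]


x = [1.0, 2.0, 3.0, 4.0]
centered, var = layernorm_split(x)
out = layernorm(x)
```

The nonlinearity of LayerNorm enters only through the scalar `var`; everything else is a linear map of the input.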

Motivation and Context

We had some really messy RIB graphs. LNs shouldn't be that complicated. Now they in fact look clean!

How Has This Been Tested?

I've added a single test that checks a new invariant of the RIB graph.

Does this PR introduce a breaking change?

The RIB graphs produced with this PR will be different. This isn't currently configurable either -- it always happens in LayerNorm.

The interface is the same.

nix-apollo commented 4 months ago

Old graph: image

New graph: image

In fact, we get only one non-constant RIB direction that reads in the variance direction: image
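The observation above could be checked mechanically along the following lines. This is a hypothetical sketch, not the test added in this PR: the edge-matrix layout (rows as output directions, columns as input directions), the helper `readers_of_input`, the tolerance, and the toy numbers are all assumptions made for illustration.

```python
def readers_of_input(edges, in_idx, skip_out=(), tol=1e-6):
    """Return indices of output directions with a non-negligible edge from input `in_idx`.

    `edges[out][in]` is assumed to hold the edge weight from an input direction
    to an output direction. Outputs listed in `skip_out` are ignored.
    """
    return [
        out_idx
        for out_idx, row in enumerate(edges)
        if out_idx not in skip_out and abs(row[in_idx]) > tol
    ]


# Toy data (made up): column 0 plays the role of the variance direction,
# and row 0 plays the role of a constant output direction.
edges = [
    [0.3, 0.0, 0.0],
    [0.9, 0.0, 0.0],
    [0.0, 0.5, 0.1],
    [0.0, 0.2, 0.7],
]

# Exactly one non-constant output direction should read from the variance direction.
readers = readers_of_input(edges, in_idx=0, skip_out={0})
assert len(readers) == 1
```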