ApolloResearch / rib

Library for methods related to the Local Interaction Basis (LIB)
MIT License

Naive implementation of gradient flow (in refactored code) #333

Closed stefan-apollo closed 4 months ago

stefan-apollo commented 5 months ago

Description: Naive implementation of the gradient flow method, based on calling rib_build in a loop. This slows the code down by a factor of approximately num_node_layers/2.
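A rough sketch of where a factor of num_node_layers/2 could come from. It assumes (this is not stated in the PR) that the cost of one build scales with the number of model sections between its first and last node layer. The function name `relative_cost` and the cost model are hypothetical and not part of rib.

```python
def relative_cost(num_node_layers: int) -> float:
    """Cost of naive gradient flow relative to one nearest-neighbour pass.

    Assumption: a build spanning node layers i..j costs (j - i) units,
    i.e. one unit per model section it has to traverse.
    """
    if num_node_layers < 2:
        raise ValueError("need at least two node layers")
    last = num_node_layers - 1
    # Nearest neighbour: a single pass over consecutive layers, `last` sections in total.
    nearest_neighbour = last
    # Naive gradient flow: one build per starting layer i, each spanning i..last.
    gradient_flow = sum(last - i for i in range(last))
    return gradient_flow / nearest_neighbour


for n in (2, 4, 8):
    print(n, relative_cost(n))
```

Under this cost model the ratio works out to exactly num_node_layers/2; real runtimes would also include per-call overhead that the sketch ignores.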

Tested: Comparison plots are below. They look much closer to NNIB than expected (from the #141 plots), but we can't see anything broken. Also implemented tests.

No breaking changes. Doesn't work on MLP because our MLP implementation requires the node_layers to be a strict sub-sequence of model layers, which cannot be done in naive gradient flow. Added validation for this.


Mod add example:

With naive gradient flow: image

With nearest neighbour: image

stefan-apollo commented 5 months ago

Graphs look very similar (bug?) and ablation curves look sus compared to old ones image

stefan-apollo commented 5 months ago

The code is definitely running in gradient-flow fashion, always computing [current_node_layer, final_node_layer]. Thus there cannot be any nearest neighbour computation going on. I'd rule out simple bugs. image
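To make the distinction concrete, here is a minimal sketch of which node-layer pairs each scheme would visit. The helper names and the layer names are hypothetical, chosen for illustration only; this is not the rib implementation.

```python
def nearest_neighbour_pairs(node_layers: list[str]) -> list[tuple[str, str]]:
    """Pair each node layer with the one directly after it."""
    return list(zip(node_layers[:-1], node_layers[1:]))


def gradient_flow_pairs(node_layers: list[str]) -> list[tuple[str, str]]:
    """Pair each node layer with the final node layer."""
    final = node_layers[-1]
    return [(layer, final) for layer in node_layers[:-1]]


# Hypothetical layer names, not taken from any rib config.
layers = ["layer_0", "layer_1", "layer_2", "output"]

print(nearest_neighbour_pairs(layers))
print(gradient_flow_pairs(layers))
```

The two lists agree only on the last pair, so any run that really uses the second list is not doing a nearest-neighbour computation for the earlier layers.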

stefan-apollo commented 5 months ago

Hmm ablation curves look sus as well, and unlike those in #141 image

stefan-apollo commented 5 months ago

Still happens even if I use the old (1-alpha)^2 basis image

stefan-apollo commented 5 months ago

Lambdas do seem to shift differently between normal and NGF though image

stefan-apollo commented 5 months ago

Cosine similarity of bases: image
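For reference, a minimal sketch of how a pairwise cosine similarity between two bases could be computed. It assumes each basis is given as a list of direction vectors of equal length with no zero vectors; the function name is hypothetical, and the plot above may have been produced differently (for example with torch tensors).

```python
import math


def cosine_similarity_matrix(
    basis_a: list[list[float]], basis_b: list[list[float]]
) -> list[list[float]]:
    """Entry [i][j] is the cosine similarity between direction i of
    basis_a and direction j of basis_b."""

    def norm(v: list[float]) -> float:
        return math.sqrt(sum(x * x for x in v))

    return [
        [sum(x * y for x, y in zip(u, v)) / (norm(u) * norm(v)) for v in basis_b]
        for u in basis_a
    ]


# Toy example: a basis compared with itself gives ones on the diagonal.
identity = [[1.0, 0.0], [0.0, 1.0]]
print(cosine_similarity_matrix(identity, identity))
```

If two methods produced the same basis up to ordering and sign, each row of this matrix would contain one entry with absolute value close to 1.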

stefan-apollo commented 5 months ago

Run with lots of neighbouring node layers: image

stefan-apollo commented 5 months ago

I've cleaned up the code a bit and removed all the lies. Changes since review: