ApolloResearch / rib

Library for methods related to the Local Interaction Basis (LIB)
MIT License

Ablations with reference to baseline #347

Closed stefan-apollo closed 7 months ago

stefan-apollo commented 7 months ago

Description

  1. Change the BisectSchedule to take a desired difference from the baseline loss, rather than the desired loss itself
  2. Change the plotting defaults (in the case of ce_loss) to set the y-axis limits relative to the baseline loss
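
As a rough illustration of item 1, the sketch below bisects for the smallest number of ablated directions at which the loss rises by a given amount above the baseline. It is not the code from this PR: the function name, `loss_fn`, `n_max`, and the parameter name `score_target_difference` are all hypothetical, and it assumes the loss does not decrease as more directions are ablated.

```python
from typing import Callable


def bisect_ablation_threshold(
    loss_fn: Callable[[int], float],
    n_max: int,
    baseline_loss: float,
    score_target_difference: float,
) -> int:
    """Smallest n in [0, n_max] with loss_fn(n) >= baseline_loss + difference.

    Hypothetical helper, not the library's API. Assumes loss_fn is
    non-decreasing in n. If the target is never reached below n_max,
    n_max is returned without being evaluated.
    """
    # The target is expressed relative to the baseline, so it does not need
    # to be re-tuned when the dataset (and hence the baseline loss) changes.
    target = baseline_loss + score_target_difference
    lo, hi = 0, n_max
    while lo < hi:
        mid = (lo + hi) // 2
        if loss_fn(mid) >= target:
            hi = mid  # target reached: look for a smaller n
        else:
            lo = mid + 1  # target not reached: ablate more
    return lo


# Toy loss curve: baseline of 2.0, rising by 0.1 per ablated direction.
def toy_loss(n: int) -> float:
    return 2.0 + 0.1 * n


print(bisect_ablation_threshold(toy_loss, 10, 2.0, 0.35))  # prints 4
```

Under the same assumptions, an old absolute threshold would correspond to a difference of `score_target - baseline_loss`.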

Motivation and Context

  1. A target difference from the baseline loss is easier to choose than an absolute loss threshold, which had to be guessed
  2. The previous plotting defaults were always wrong

How Has This Been Tested?

Made some plots

Does this PR introduce a breaking change?

Yes. Existing BisectSchedule configs will break. I updated the tests; apart from those, we had no such configs in main.

I think backwards compatibility is not worth it, because score_target was a terrible argument that had to be adjusted every time we changed the dataset.

stefan-apollo commented 7 months ago

Example plot (the grey line indicates the baseline): [image]

stefan-apollo commented 7 months ago

Another example: [image]