ApolloResearch / rib

Library for methods related to the Local Interaction Basis (LIB)
MIT License

Tiny stories support #232

Closed: nix-apollo closed this 9 months ago

nix-apollo commented 9 months ago

tiny stories

Description

Related Issue

Motivation and Context

It's nice to be able to test our methods on a range of models. Tiny-stories is probably a better choice than pythia-14M for many experiments, as:

How Has This Been Tested?

Added tinystories to various tests, often piggybacking on gpt2. These check that the sequential transformer is loaded properly and that its output is consistent with the TransformerLens (TL) version.
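
For reference, the equivalence check is roughly of this shape (a minimal sketch, not the actual test code; the rib-side loader is a hypothetical stand-in, and the exact TransformerLens model name may differ):

```python
import torch
from transformer_lens import HookedTransformer

# Reference TransformerLens model (name assumed; TL also accepts aliases such
# as "roneneldan/TinyStories-1M").
tl_model = HookedTransformer.from_pretrained("tiny-stories-1M")

# Hypothetical stand-in for rib's sequential transformer loader.
# seq_model = load_sequential_transformer("tiny-stories-1M", ...)

prompt = "Once upon a time there was a little girl named Lily."
tokens = tl_model.to_tokens(prompt)

tl_logits = tl_model(tokens)          # [batch, seq, d_vocab]
# seq_logits = seq_model(tokens)      # shape depends on rib's forward signature

# The tests assert the two implementations agree up to floating-point noise.
# torch.testing.assert_close(seq_logits, tl_logits, atol=1e-4, rtol=1e-4)
```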

I've also manually checked that the loss is comparable with the published loss in the paper. We get a loss of 2.40 vs 2.38 in the paper, so it's not identical. I'd guess this comes from different tokenisation; it's not clear that they used packed sequences. Still, the performance is good enough for me to think the model is basically doing its job.
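
A rough sketch of that manual loss check (assuming the HuggingFace `roneneldan/TinyStories` dataset and TransformerLens' built-in loss; since the published number likely used different tokenisation/packing, small discrepancies are expected):

```python
import torch
from datasets import load_dataset
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("tiny-stories-1M")  # model name assumed
data = load_dataset("roneneldan/TinyStories", split="validation")

losses = []
with torch.no_grad():
    for story in data["text"][:200]:  # a small sample is enough for a sanity check
        tokens = model.to_tokens(story)
        if tokens.shape[1] < 2:
            continue
        # return_type="loss" gives mean next-token cross-entropy for the sequence.
        losses.append(model(tokens, return_type="loss").item())

# Compare against the ~2.4 figure above; packing/tokenisation details may shift this slightly.
print(f"mean loss: {sum(losses) / len(losses):.2f}")
```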

Does this PR introduce a breaking change?

No.