EleutherAI / pythia

The hub for EleutherAI's work on interpretability and learning dynamics
Apache License 2.0

Details about "EleutherAI/pythia-160m-seed*" models #142

Closed: IanMagnusson closed this issue 7 months ago

IanMagnusson commented 7 months ago

Hello! Thank you for making this fantastic suite of models; I think this is one of the most important contributions to the research community in recent memory.

I have a question about the training details of the EleutherAI/pythia-160m-seed* models hosted on the HF hub, and hopefully this is a good place to ask. Specifically, I'm curious what the seeds that differ between these models (and presumably also from the EleutherAI/pythia-160m model) actually control. Do they control both the weight initialization and the training data shuffle order, or only one of the two? These models seem to have been released after the paper, since the paper says there are no experiments over different seeds.
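
For reference, here is a minimal sketch of how the models I mean could be listed; it assumes the seed variants are discoverable by searching the Hub for the "pythia-160m-seed" prefix under the EleutherAI organization:

```python
# Hypothetical sketch: enumerate the pythia-160m-seed* repos on the Hugging Face Hub.
# Exact repo names are assumed from the "seed*" pattern in the issue title.
from huggingface_hub import list_models

for model in list_models(author="EleutherAI", search="pythia-160m-seed"):
    print(model.id)
```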

Thank you so much for any clarification you can offer!

haileyschoelkopf commented 7 months ago

Hi! These vary in both the training data shuffle and the weight initialization. We did indeed train them after the paper: a couple of 160m models a while ago, and quite a few new seeds more recently for some work in progress.

(Maybe @oskarvanderwal can confirm re: the recent ones!)
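
For anyone who wants to check the initialization difference empirically, a minimal sketch follows; it assumes a pythia-160m-seed1 repo exists (names follow the seed* pattern above) and that the seed variants expose a "step0" revision like the main Pythia checkpoints do:

```python
# Minimal sketch (not from the thread): compare pre-training (step0) weights
# of the base model and one seed variant. The repo name "pythia-160m-seed1"
# and the "step0" revision for the seed variants are assumptions.
import torch
from transformers import GPTNeoXForCausalLM

base = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-160m", revision="step0")
seed = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-160m-seed1", revision="step0")

# If the seed also controls weight initialization, the embedding tables
# should already differ before any training step.
same_init = torch.equal(base.gpt_neox.embed_in.weight, seed.gpt_neox.embed_in.weight)
print("identical initialization:", same_init)  # expected: False
```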

IanMagnusson commented 7 months ago

Fantastic! Thank you so much for the quick clarification.

oskarvanderwal commented 6 months ago

We are planning to release more Pythia models trained with different seeds for the smaller model sizes. As @haileyschoelkopf mentioned, the seed is used for both the data order and the weight initialization in the new ones as well. Once we've trained all the models, I'll make sure to add more information to the README!