EleutherAI / pythia

The hub for EleutherAI's work on interpretability and learning dynamics
Apache License 2.0

"gas" configuration doesn't do anything #149

Open segyges opened 5 months ago

segyges commented 5 months ago

Per this, my understanding is that the gas config option in neox doesn't do anything. It shouldn't be used and should be removed; we should be using gradient_accumulation_steps instead.

It appears that all existing pythia configs set gas to 1, which is the default for gradient_accumulation_steps anyway, so for those configs this does not matter. Per that same search, some of the old eval results specifically show gas at 2. That would be a bad error: if the expectation was that gas did something, the effective batch size would be half of what was intended.
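To make the halving concrete, here is a minimal sketch of the arithmetic. It assumes the effective batch size is the per-GPU micro batch size times the accumulation steps times the number of GPUs, and that gas is ignored as described above. The function names, the num_gpus argument, and the example values are hypothetical, chosen only for illustration.

```python
def intended_batch_size(config, num_gpus):
    """Batch size the config author expected, if `gas` had been honored."""
    micro = config["train_micro_batch_size_per_gpu"]
    return micro * config.get("gas", 1) * num_gpus


def actual_batch_size(config, num_gpus):
    """Batch size under the assumption that only
    `gradient_accumulation_steps` is read (default 1) and `gas` is ignored."""
    micro = config["train_micro_batch_size_per_gpu"]
    return micro * config.get("gradient_accumulation_steps", 1) * num_gpus


# Hypothetical config: values are illustrative, not taken from a real run.
example = {"train_micro_batch_size_per_gpu": 4, "gas": 2}

print(intended_batch_size(example, num_gpus=8))  # 4 * 2 * 8 = 64
print(actual_batch_size(example, num_gpus=8))    # 4 * 1 * 8 = 32
```

With gas at 1 the two numbers agree, which is why the existing pythia configs would be unaffected.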

I am not putting in a PR to replace gas with gradient_accumulation_steps, because these configs are references for the settings of existing artifacts. It's not clear to me that they should be fixed to be "correct". If they are fixed going forward, it's also not clear what the correct steps would be to make sure the original configs are preserved as references for those artifacts.
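One option that leaves the reference configs untouched is to only report mismatches. The following is a rough sketch, assuming a config has already been loaded into a dict and that gas is ignored in favor of gradient_accumulation_steps; audit_gas is a hypothetical helper, not part of neox or pythia.

```python
def audit_gas(config):
    """Return a warning string if `gas` disagrees with the value that
    would actually be used, else None. Does not modify the config."""
    if "gas" not in config:
        return None
    gas = config["gas"]
    # Assumption: gradient_accumulation_steps defaults to 1 when unset.
    accum = config.get("gradient_accumulation_steps", 1)
    if gas != accum:
        return (
            f"gas={gas} is set, but gradient_accumulation_steps={accum} "
            "would be used instead"
        )
    return None


print(audit_gas({"gas": 1}))  # no mismatch, so this is None
print(audit_gas({"gas": 2}))  # reports the mismatch
```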