EleutherAI / project-menu

See the issue board for the current status of active and prospective projects!

[RFP] Low and high order coherence as model loss improves #20

Closed · leogao2 closed this issue 1 year ago

leogao2 commented 3 years ago

Background

We predict that LMs will learn low-order coherence (i.e., grammar) first, and only later learn higher-order coherence (i.e., logical soundness), where by "later" I mean at lower loss. The question is: even if this seems intuitively correct, can we actually observe it in real models? If we can, it would provide important insight into learning dynamics and let us infer whether future improvements in capabilities will be more discontinuous (or more continuous, who knows).

What to plot?

Take either one model at a ton of different checkpoints, or a series of comparable models each trained on x tokens. Figure out some way to measure the grammaticality, logical coherence, etc. of outputs (you probably want prompts from a genre where a lack of logical coherence is obvious), using human feedback, automatic metrics, or a mix. Then plot each measure against loss/compute/params/etc. and see whether the higher-order measures only start improving after the lower-order ones saturate. A rough sketch of the pipeline is below.
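As a concrete starting point, here is a minimal sketch of the checkpoint-sweep half of the experiment. Everything specific in it is an assumption on my part: the Pythia model name and `stepN` checkpoint revisions, the prompt, and the use of `language_tool_python` as a cheap automatic proxy for grammaticality. The higher-order coherence measure would still need human raters or a separate metric.

```python
# Minimal sketch: sample from a series of training checkpoints and score the
# samples for low-order coherence. Model name, checkpoint revisions, prompt,
# and the grammar-checker proxy are all illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import language_tool_python  # off-the-shelf grammar checker

CHECKPOINTS = ["step1000", "step10000", "step100000"]  # assumed revision names
PROMPT = "The detective explained exactly how the crime had been committed:"

tool = language_tool_python.LanguageTool("en-US")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")

def grammar_errors_per_token(text: str) -> float:
    """Low-order coherence proxy: grammar-checker flags per generated token."""
    n_tokens = max(len(tokenizer.encode(text)), 1)
    return len(tool.check(text)) / n_tokens

for rev in CHECKPOINTS:
    model = AutoModelForCausalLM.from_pretrained(
        "EleutherAI/pythia-160m", revision=rev
    )
    inputs = tokenizer(PROMPT, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(
            **inputs, max_new_tokens=100, do_sample=True, top_p=0.9
        )
    completion = tokenizer.decode(
        out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
    )
    print(rev, grammar_errors_per_token(completion))

# Higher-order coherence (logical soundness) still needs human raters or an
# automatic consistency metric; plot both curves against the training loss.
```

In practice you would average over many prompts and samples per checkpoint; the point is just that each checkpoint yields one point per metric, which then gets plotted against that checkpoint's loss.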

Related Papers/Frameworks

See https://www.gwern.net/Scaling-hypothesis#why-does-pretraining-work and https://www.alignmentforum.org/posts/EmxfgPGvaKqhttPM8/thoughts-on-the-alignment-implications-of-scaling-language for some relevant thoughts.

Jeevesh8 commented 2 years ago

A primitive approach would be to just run SentEval at various stages of pre-training, to see the level of coherence captured by the model at each point during pre-training.
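For reference, here is a hedged sketch of what that could look like, using SentEval's probing tasks (which range from surface properties up to syntax and semantics, loosely mirroring the low-/high-order coherence distinction). The mean-pooling batcher, model/checkpoint names, data path, and task selection are my own illustrative choices, not something specified above.

```python
# Hedged sketch of the SentEval idea: probe sentence embeddings from one
# pre-training checkpoint; repeat over checkpoints to get the trajectory.
# The mean-pooling batcher, model/revision, and task list are assumptions.
import torch
import senteval  # github.com/facebookresearch/SentEval
from transformers import AutoModel, AutoTokenizer

MODEL = "EleutherAI/pythia-160m"  # assumed model with step-indexed revisions
tokenizer = AutoTokenizer.from_pretrained(MODEL)
tokenizer.pad_token = tokenizer.eos_token  # GPT-style tokenizers lack a pad token
model = AutoModel.from_pretrained(MODEL, revision="step10000")

def prepare(params, samples):
    pass  # no task-specific setup needed for this simple batcher

def batcher(params, batch):
    # SentEval hands over tokenized sentences; mean-pool the hidden states
    # into one fixed-size vector per sentence.
    sents = [" ".join(tokens) if tokens else "." for tokens in batch]
    enc = tokenizer(sents, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state
    mask = enc["attention_mask"].unsqueeze(-1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

params = {"task_path": "SentEval/data", "usepytorch": True, "kfold": 5}
se = senteval.engine.SE(params, batcher, prepare)
# Probing tasks span surface properties (Length) through syntax (Depth,
# BigramShift) to semantics (CoordinationInversion), loosely matching the
# low-/high-order coherence axis.
results = se.eval(["Length", "Depth", "BigramShift", "CoordinationInversion"])
print(results)
```

Running this at each checkpoint and plotting per-task accuracy against loss would show whether the syntax-level probes saturate before the more semantic ones start moving.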