Hasn't this already been investigated via the chess language models that just output move sequences? IIRC they worsened a lot after some number of moves, possibly because of this issue.
You mean the ones that play chess? I think there are a lot of possible reasons why an LM would be bad at chess after the first few moves. In any event, I don't really care about how good the model is at playing chess or solving the cube; I care way more about whether it can learn the state specifically.
Hello, I'm interested in this task and would like some guidance on it. Thanks!
Background
Language models need to be able to learn things about the world while only partially observing the underlying internal state. For example, LMs will never be able to observe what goes on inside people's heads, but they can observe part of that state through what humans say and do. Can LMs learn this kind of internal state, at least in theory?
What to plot?
To test this, we can train models on synthetic data: take a system, in this case a Rubik's cube, give the model a sequence of cube moves, and ask it to output the final cube state (I already have the data generation working, PM me for details). This is obviously a really hard task. Some settings I'd want to try include the following:

- Hidden state: the model only sees the move sequence and has to output the final cube state.
- Fully observable state: the model is also given the intermediate cube state after each move.
- Partially observable state: the model only gets to look at 1 or 2 (etc.) faces of the cube.
Plots would show accuracy or perplexity (or a similar metric) per setting.
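To make the three settings concrete, here is a minimal sketch of what the synthetic data generation could look like. This is purely illustrative and makes its own assumptions: the sticker representation, the helper names (`solved_cube`, `apply_move`, `make_example`), and the bracketed string format are hypothetical choices for the sketch, not the actual generator mentioned above (PM for that).

```python
import random

# Face -> (axis, sign) of its outward normal, and face -> sticker colour letter.
# These conventions are arbitrary choices for this sketch, not the actual data format.
FACES = {"R": (0, +1), "L": (0, -1), "U": (1, +1), "D": (1, -1), "F": (2, +1), "B": (2, -1)}
COLORS = {"R": "r", "L": "o", "U": "w", "D": "y", "F": "g", "B": "b"}
# The remaining two axes for each turn axis, in right-handed cyclic order.
CYCLIC = {0: (1, 2), 1: (2, 0), 2: (0, 1)}


def solved_cube():
    """Solved cube as a dict {(cubie_position, sticker_normal): colour} over 54 stickers."""
    cube = {}
    for face, (axis, sign) in FACES.items():
        b, c = CYCLIC[axis]
        for u in (-1, 0, 1):
            for v in (-1, 0, 1):
                pos, normal = [0, 0, 0], [0, 0, 0]
                pos[axis] = normal[axis] = sign
                pos[b], pos[c] = u, v
                cube[(tuple(pos), tuple(normal))] = COLORS[face]
    return cube


def quarter_turn(vec, axis, sign):
    """Rotate a 3-vector by 90 degrees, clockwise as seen from outside face (axis, sign)."""
    b, c = CYCLIC[axis]
    out = list(vec)
    if sign > 0:
        out[b], out[c] = vec[c], -vec[b]
    else:
        out[b], out[c] = -vec[c], vec[b]
    return tuple(out)


def apply_move(cube, move):
    """Apply one face turn written in standard notation, e.g. 'R', "R'", 'R2'."""
    axis, sign = FACES[move[0]]
    turns = {"": 1, "2": 2, "'": 3}[move[1:]]
    for _ in range(turns):
        new = {}
        for (pos, normal), colour in cube.items():
            if pos[axis] == sign:  # sticker sits on the layer being turned
                pos, normal = quarter_turn(pos, axis, sign), quarter_turn(normal, axis, sign)
            new[(pos, normal)] = colour
        cube = new
    return cube


def face_string(cube, face):
    """Read off the 9 sticker colours on one face in a fixed (arbitrary) order."""
    axis, sign = FACES[face]
    normal = [0, 0, 0]
    normal[axis] = sign
    stickers = sorted((pos, col) for (pos, n), col in cube.items() if n == tuple(normal))
    return "".join(col for _, col in stickers)


def make_example(num_moves, visible_faces=("U", "F", "R", "D", "L", "B"), show_intermediate=False):
    """One synthetic training string: random move sequence plus whatever state is observable."""
    cube = solved_cube()
    tokens = []
    for _ in range(num_moves):
        move = random.choice(list(FACES)) + random.choice(["", "'", "2"])
        cube = apply_move(cube, move)
        tokens.append(move)
        if show_intermediate:  # fully / partially observable settings
            tokens.append("[" + " ".join(face_string(cube, f) for f in visible_faces) + "]")
    if not show_intermediate:  # hidden-state setting: only the final state is revealed
        tokens.append("-> " + " ".join(face_string(cube, f) for f in visible_faces))
    return " ".join(tokens)


if __name__ == "__main__":
    print(make_example(5))                                                    # hidden state
    print(make_example(5, show_intermediate=True))                            # fully observable
    print(make_example(5, visible_faces=("U", "F"), show_intermediate=True))  # partial: two faces
```

Varying `visible_faces` and `show_intermediate` gives the hidden, partially observable, and fully observable variants, and the per-setting plots would then compare accuracy or perplexity on held-out strings generated under each configuration.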
Related Papers/Frameworks
https://www.alignmentforum.org/posts/EmxfgPGvaKqhttPM8/thoughts-on-the-alignment-implications-of-scaling-language