Currently we are looking at language models (Pythia, TinyStories) with a 2048-token context window. It should be possible to elicit interesting behavior with much shorter context windows.
Shorter contexts should improve performance, especially with the new attribution method (depending on what clever things are figured out), though it's not clear to me how large the improvement actually is.
The context length should probably just be part of the dataset config.
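One way this could look, as a minimal sketch: add a `context_length` field to the dataset config and truncate tokenized sequences against it. All names here (`DatasetConfig`, `context_length`, `truncate_to_context`) are illustrative assumptions, not the project's actual schema.

```python
from dataclasses import dataclass

@dataclass
class DatasetConfig:
    # Hypothetical config; field names are illustrative only.
    dataset_name: str = "tiny-stories"
    context_length: int = 256  # much shorter than the models' 2048-token window

def truncate_to_context(token_ids: list[int], cfg: DatasetConfig) -> list[int]:
    # Keep only the first `context_length` tokens of each sequence.
    return token_ids[: cfg.context_length]

cfg = DatasetConfig(context_length=8)
short = truncate_to_context(list(range(20)), cfg)  # first 8 token ids
```

Putting the length in the dataset config (rather than hard-coding it per model) would let the same model run against datasets prepared at different context lengths.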