special case converting tlens weights (with n_ctx = 2048) to our transformer weights (n_ctx = 512)
add a test that runs load_sequential_transformer for better coverage. Alternatively, we could adapt test_tlens_conversion to use this function (currently it operates with lower-level functions)
I've submitted issues to the two repositories upstream so that this gets fixed for others. Eventually we can remove the special case handling:
Tiny stories works again
Description
special case converting tlens weights (with n_ctx = 2048) to our transformer weights (n_ctx = 512)
add a test that runs load_sequential_transformer for better coverage. Alternatively, we could adapt test_tlens_conversion to use this function (currently it operates with lower-level functions)
I've submitted issues to the two repositories upstream so that this gets fixed for others. Eventually we can remove the special case handling:
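For reviewers unfamiliar with the mismatch, here is a rough sketch of what a special case of this kind can look like. It is not the code in this PR. It assumes the n_ctx difference only affects position-dependent tensors: a positional embedding of shape [n_ctx, d_model] and a causal mask of shape [n_ctx, n_ctx]. The function name truncate_n_ctx and the key names are hypothetical, and plain nested lists stand in for tensors so the snippet runs without dependencies.

```python
from typing import Dict, Iterable, List

Matrix = List[List[float]]


def truncate_n_ctx(
    state: Dict[str, Matrix],
    n_ctx_src: int,
    n_ctx_dst: int,
    pos_keys: Iterable[str],
    mask_keys: Iterable[str],
) -> Dict[str, Matrix]:
    """Return a copy of `state` with position-dependent entries cut to n_ctx_dst.

    Hypothetical helper: `pos_keys` name [n_ctx, d_model] matrices and
    `mask_keys` name [n_ctx, n_ctx] matrices. Everything else passes through.
    """
    if n_ctx_dst > n_ctx_src:
        raise ValueError("can only shrink the context window, not grow it")
    pos_keys, mask_keys = set(pos_keys), set(mask_keys)

    out: Dict[str, Matrix] = {}
    for name, value in state.items():
        if name in pos_keys or name in mask_keys:
            if len(value) != n_ctx_src:
                raise ValueError(f"{name}: expected {n_ctx_src} rows, got {len(value)}")
        if name in pos_keys:
            # Keep only the first n_ctx_dst position rows.
            out[name] = [row[:] for row in value[:n_ctx_dst]]
        elif name in mask_keys:
            # Keep the top-left n_ctx_dst x n_ctx_dst block of the mask.
            out[name] = [row[:n_ctx_dst] for row in value[:n_ctx_dst]]
        else:
            out[name] = value
    return out
```

With real weights the same idea would be a slice such as W_pos[:512] on the tensor. Whether any tensors other than these two depend on n_ctx is something to confirm against the actual conversion code.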
Does this PR introduce a breaking change?
No.