Open · dashstander opened 4 weeks ago
Thanks for finding this, and fixing it for everyone. There is just one type error that needs to be resolved when creating your tensor for `mlp.b_in` in your `else` block. The variable it is complaining about, `d_mlp`, could be `None` here, so either an error needs to be thrown or a default needs to be set if it is `None` at that point. I don't mind which it is. In all likelihood it will be set by then, but we need to account for the possibility that it is not.
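A minimal sketch of the guard being requested, assuming the value comes from a config field typed `Optional[int]`; the helper name and error message are hypothetical, not the PR's actual code:

```python
from typing import Optional

import torch

def mlp_b_in(d_mlp: Optional[int]) -> torch.Tensor:
    # d_mlp comes from the model config and is typed Optional, so it may
    # still be None here; raise instead of passing None to torch.zeros.
    if d_mlp is None:
        raise ValueError("d_mlp must be set to create the mlp.b_in tensor")
    return torch.zeros(d_mlp)
```

Raising early like this gives a clearer failure than letting `torch.zeros` choke on a `None` size later, and it also satisfies the type checker.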
Description
`convert_nanogpt_weights` had two issues:

1. It did not include the `IGNORE` masking tensor.
2. It did not correctly handle the case where the nanogpt model was configured not to have biases in the linear layers.

When trying to use the function, loading the converted weights into a `HookedTransformer` would fail for lack of the proper tensors. If we're not supposed to checkpoint the masking tensor, then there is a separate issue in which `HookedTransformer` won't load a checkpoint without it there.

I have not added any tests or re-written documentation. There are no existing tests, and the only documentation I can find pertaining to this issue is a comment saying that the code worked both with and without biases.
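For concreteness, here is a minimal sketch of the shape of the bias fix, assuming the converter builds a `HookedTransformer`-style state dict; the helper name, the exact keys, and the config parameters are illustrative guesses based on the description above, not the PR's actual code:

```python
import torch

def add_zero_biases(state_dict: dict, n_layers: int, d_model: int, d_mlp: int) -> dict:
    # A nanoGPT model trained with bias=False has no bias parameters to
    # convert, but HookedTransformer still expects these keys, so fill
    # them with zero tensors of the right shapes.
    for layer in range(n_layers):
        state_dict[f"blocks.{layer}.attn.b_O"] = torch.zeros(d_model)
        state_dict[f"blocks.{layer}.mlp.b_in"] = torch.zeros(d_mlp)
        state_dict[f"blocks.{layer}.mlp.b_out"] = torch.zeros(d_model)
    return state_dict
```

Since adding a zero bias is a no-op numerically, this keeps `load_state_dict` happy without changing the converted model's behaviour.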