🐛 Bug

When the OPT-350M variant is fine-tuned via Hugging Face, loading the resulting model fails with the following error:

```
RuntimeError: Error(s) in loading state_dict for OPTForCausalLM:
	size mismatch for lm_head.weight: copying a param with shape torch.Size([50272, 512]) from checkpoint, the shape in current model is torch.Size([50272, 1024]).
```

For context, I have run the same code on the 125M variant: that model did not perform well, but it did not crash. I believe the poor quality is a parameter issue (?), since I compared both base (non-fine-tuned) models and the 350M was capable of generating coherent output.
Code to load model
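The loading code itself is not reproduced here, so the following is a minimal sketch of the failure mode using plain PyTorch with shrunken stand-in dimensions (the real OPT-350M has vocab 50272, `word_embed_proj_dim=512`, and `hidden_size=1024`, which is why the checkpoint's `lm_head.weight` is `[50272, 512]` while the freshly built model expects `[50272, 1024]`):

```python
import torch.nn as nn

# Hypothetical, shrunken stand-in dimensions for illustration
# (real OPT-350M: vocab=50272, word_embed_proj_dim=512, hidden_size=1024).
VOCAB, PROJ_DIM, HIDDEN = 8, 4, 6

# The checkpoint saved lm_head as a projection from word_embed_proj_dim -> vocab ...
saved_head = nn.Linear(PROJ_DIM, VOCAB, bias=False)
# ... but the freshly constructed model builds it from hidden_size -> vocab.
fresh_head = nn.Linear(HIDDEN, VOCAB, bias=False)

try:
    fresh_head.load_state_dict(saved_head.state_dict())
except RuntimeError as e:
    # Same class of error as in the trace above.
    print("size mismatch" in str(e))
```

This is only a reproduction of the shape conflict, not the actual loading code from the report.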
Training Code
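The training code is likewise not shown; as a hedged sketch of the kind of fine-tuning loop involved (a stand-in linear model and random tensors replace OPT and the real dataset, both of which are assumptions, not the reporter's actual code):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in model and random data; the real report fine-tunes OPTForCausalLM
# on a text dataset, neither of which is reproduced here.
model = nn.Linear(16, 4)
data = TensorDataset(torch.randn(32, 16), torch.randint(0, 4, (32,)))
loader = DataLoader(data, batch_size=8, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(2):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()

# After training, a checkpoint is typically written with
# torch.save(model.state_dict(), path), or model.save_pretrained(path)
# for transformers models — which is where the mismatched lm_head gets saved.
```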
Dataset module
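The dataset module is also omitted from the report; a minimal sketch of a causal-LM-style `Dataset` (class name, field names, and token IDs here are hypothetical, though `input_ids`/`labels` mirror what transformers' `Trainer` expects):

```python
import torch
from torch.utils.data import Dataset

class TextDataset(Dataset):
    """Minimal causal-LM dataset: each item is a dict of input_ids and labels.
    Token IDs are made up for illustration; a real module would tokenize text."""

    def __init__(self, sequences, max_len=8, pad_id=0):
        self.sequences = sequences
        self.max_len = max_len
        self.pad_id = pad_id

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        ids = self.sequences[idx][: self.max_len]
        ids = ids + [self.pad_id] * (self.max_len - len(ids))
        input_ids = torch.tensor(ids, dtype=torch.long)
        # For causal LM fine-tuning, labels are typically the input ids themselves.
        return {"input_ids": input_ids, "labels": input_ids.clone()}

ds = TextDataset([[5, 6, 7], [1, 2, 3, 4, 5]])
print(len(ds))                      # 2
print(ds[0]["input_ids"].tolist())  # [5, 6, 7, 0, 0, 0, 0, 0]
```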