AI4Bharat / IndicTrans2

Translation models for 22 scheduled languages of India
https://ai4bharat.iitm.ac.in/indic-trans2
MIT License

Saving Distillation model #77

Closed (harshyadav17 closed this 3 months ago)

harshyadav17 commented 3 months ago

Hey @PranjalChitale,

I am trying to save the distilled model with the provided script, convert_indictrans_checkpoint_to_pytorch.py.

Because lm_head.weight and model.decoder.embed_tokens.weight share the same tensor, save_pretrained fails with the following error:

File "/workspace/research/IndicTrans2/huggingface_interface/convert_indictrans_checkpoint_to_pytorch.py", line 107, in <module>
    model.save_pretrained(args.pytorch_dump_folder_path)
  File "/opt/conda/envs/itv2/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2546, in save_pretrained
    raise RuntimeError(
RuntimeError: The weights trying to be saved contained shared tensors [{'lm_head.weight', 'model.decoder.embed_tokens.weight'}] that are mismatching the transformers base configuration. Try saving using `safe_serialization=False` or remove this tensor sharing.

For the distilled models, are you saving with safe_serialization=False, or handling the shared weights some other way? Thanks!

VarunGumma commented 3 months ago

Yes, please use safe_serialization=False.
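
For anyone landing here with the same error, a minimal sketch of what that looks like. The helper name save_with_shared_weights and the output path are hypothetical, not part of the repository's script; the only real API used is save_pretrained with its safe_serialization argument. With safe_serialization=False the weights are written with torch.save (a pytorch_model.bin file) instead of safetensors, so the tied-tensor check from the traceback above is not reached.

```python
from pathlib import Path


def save_with_shared_weights(model, dump_dir):
    """Save a model whose lm_head and decoder embeddings share one tensor.

    `model` is assumed to be a transformers PreTrainedModel (or anything
    exposing the same save_pretrained method). This helper is illustrative
    and does not exist in the IndicTrans2 repository.
    """
    out = Path(dump_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Disable safetensors serialization, as the error message suggests.
    model.save_pretrained(str(out), safe_serialization=False)
    return out


# Hypothetical usage inside the conversion script, in place of the
# failing call on line 107:
#   save_with_shared_weights(model, args.pytorch_dump_folder_path)
```

Calling model.save_pretrained(args.pytorch_dump_folder_path, safe_serialization=False) directly in the script is equivalent; the wrapper only adds directory creation.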

Also, we sincerely request you not to open new issues for every single error encountered, before exhausting all options and efforts from your end. Thank you!