bigscience-workshop / xmtf

Crosslingual Generalization through Multitask Finetuning
https://arxiv.org/abs/2211.01786
Apache License 2.0

Export mt0-xxl-mt to ONNX fails #21

Closed: sh0tcall3r closed this issue 1 year ago

sh0tcall3r commented 1 year ago

Hello, guys! As the title says, I'm trying to export mt0-xxl-mt (with some adjustments, which I describe below) to ONNX, but the export keeps failing. Regarding the adjustments: I loaded the model from Hugging Face in 8-bit precision, fine-tuned it on my downstream task with LoRA/PEFT, and then tried to export it to ONNX. I noticed that in the state_dict of both the base model from Hugging Face and the model after LoRA/PEFT finetuning there is a curious entry named 'weight_format' whose value is the string 'row' rather than a weight tensor. The ONNX export fails because the export function tries to call detach() on that value, which obviously raises an error. So my questions are (a rough sketch of my setup follows the questions):

  1. What is the 'weight_format' layer, and what does it stand for?
  2. If I just remove this entry from the state_dict and the model architecture, will it cause further errors or make the model unstable?
  3. Is there a "good" way to export this model to ONNX without adjusting the state_dict or the model architecture?
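For reference, here is roughly what my setup looks like (a minimal sketch: my real LoraConfig values and training loop are omitted, and the export call is simplified to the step that fails):

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Base model loaded in 8-bit via bitsandbytes
model = AutoModelForSeq2SeqLM.from_pretrained(
    "bigscience/mt0-xxl-mt", load_in_8bit=True, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("bigscience/mt0-xxl-mt")

# LoRA adapters on top (my actual LoRA hyperparameters omitted)
model = get_peft_model(model, LoraConfig(task_type="SEQ_2_SEQ_LM"))

# ... finetuning on the downstream task happens here ...

# The failing step: the exporter hits the 'weight_format' entry
# (value 'row') and calling detach() on a string raises an error.
enc = tokenizer("test input", return_tensors="pt")
torch.onnx.export(
    model,
    # input_ids reused as dummy decoder_input_ids for the sketch
    (enc["input_ids"], enc["attention_mask"], enc["input_ids"]),
    "mt0-xxl-mt.onnx",
)
```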
sh0tcall3r commented 1 year ago

@thomasw21 I would really appreciate your help here. I'm completely stuck.

thomasw21 commented 1 year ago

Hi @sh0tcall3r! First off, sorry for not responding earlier; I was busy with other projects. It seems you have quite a setup.

What is the 'weight_format' layer, and what does it stand for?

It's my understanding that weight_format is specific to https://github.com/TimDettmers/bitsandbytes . I would suggest asking your question there.
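As a quick sanity check, you could list every non-tensor entry bitsandbytes leaves in the state dict (untested sketch):

```python
import torch

# Print every state_dict entry that is not a tensor. These are the
# bitsandbytes bookkeeping values (e.g. '...weight_format' -> 'row')
# that the ONNX exporter trips over when it calls detach() on them.
for name, value in model.state_dict().items():
    if not torch.is_tensor(value):
        print(name, "->", repr(value))
```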

If I just remove this entry from the state_dict and the model architecture, will it cause further errors or make the model unstable?

That's also linked to bitsandbytes, I would guess.
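If you want to experiment, a naive (and untested) workaround would be to filter the non-tensor entries out, though whether the quantized weights remain usable without them is exactly the kind of question for the bitsandbytes maintainers:

```python
import torch

# Untested sketch: keep only the tensor entries of the state_dict.
# This silences the detach() error, but I can't vouch for the
# correctness of the resulting model.
state_dict = {
    k: v for k, v in model.state_dict().items() if torch.is_tensor(v)
}
```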

Is there a "good" way to export this model to ONNX, without adjusting the state_dict and the model architecture?

The model itself is a fairly standard encoder-decoder model, and those are usually well maintained in the HF ecosystem. If you have issues with your state dict, I would suggest reaching out to the transformers team. Note that there was a bitsandbytes-related release not too long ago, so your issue might already be fixed: https://github.com/huggingface/transformers/pull/24416
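One route that might sidestep the quantized state dict entirely: merge your LoRA weights back into a full-precision copy of the base model and export that with optimum instead. A rough, untested sketch (the adapter path is a placeholder):

```python
import torch
from transformers import AutoModelForSeq2SeqLM
from peft import PeftModel
from optimum.onnxruntime import ORTModelForSeq2SeqLM

# Reload the base model without bitsandbytes quantization, then fold
# the LoRA adapters back into the base weights.
base = AutoModelForSeq2SeqLM.from_pretrained(
    "bigscience/mt0-xxl-mt", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "path/to/your/lora-adapter")
model = model.merge_and_unload()
model.save_pretrained("mt0-xxl-mt-merged")

# Let optimum handle the encoder-decoder ONNX export.
onnx_model = ORTModelForSeq2SeqLM.from_pretrained(
    "mt0-xxl-mt-merged", export=True
)
onnx_model.save_pretrained("mt0-xxl-mt-onnx")
```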

I'll close the issue since your questions seem to mostly involve third-party libraries. Please feel free to re-open if you think I'm mistaken.