Closed sh0tcall3r closed 1 year ago
@thomasw21 I'd really appreciate your help here. I'm really stuck in one place.
Hi @sh0tcall3r! First off, sorry for not responding earlier; I was busy with other projects. It seems you have quite a setup.
> What is the `'weight_format'` layer and what does it stand for?

It's my understanding that `weight_format` is something specific to https://github.com/TimDettmers/bitsandbytes. I would suggest asking your question there.
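As a minimal sketch of what such metadata looks like, the helper below scans a `state_dict` for entries whose values are not tensors. The keys and the simulated dict here are illustrative, not taken from a real `mt0-xxl-mt` checkpoint:

```python
import torch

# Hypothetical sketch: scan a state_dict for entries that are not tensors.
# 8-bit checkpoints can carry metadata entries such as 'weight_format'
# (holding e.g. the string 'row') alongside the actual weight tensors.
def find_non_tensor_entries(state_dict):
    """Return the keys whose values are not torch.Tensor objects."""
    return [k for k, v in state_dict.items() if not isinstance(v, torch.Tensor)]

# Simulated state_dict mimicking what such a checkpoint might contain;
# the 'weight_format' entry is illustrative, not loaded from a real model.
sd = {
    "decoder.block.0.SelfAttention.q.weight": torch.zeros(4, 4),
    "decoder.block.0.SelfAttention.q.weight_format": "row",
}
print(find_non_tensor_entries(sd))
# → ['decoder.block.0.SelfAttention.q.weight_format']
```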
> If I just clear this layer out of the `state_dict` and the model architecture, will it cause a further error or model instability?

That is also linked to bitsandbytes, I would guess.
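If you do go down that road, the mechanical part is straightforward; this hedged sketch only shows how to drop non-tensor entries from a `state_dict` before passing it to code that calls `.detach()` on every value. Whether the stripped model still behaves correctly is a bitsandbytes question, not answered here:

```python
import torch

# Hedged sketch: strip non-tensor metadata entries (like 'weight_format')
# from a state_dict. This does NOT guarantee the resulting model is usable;
# it only removes the values that break tensor-only code paths.
def strip_metadata(state_dict):
    return {k: v for k, v in state_dict.items() if isinstance(v, torch.Tensor)}

sd = {
    "linear.weight": torch.ones(2, 2),
    "linear.weight_format": "row",  # illustrative metadata entry
}
clean = strip_metadata(sd)
print(sorted(clean))  # → ['linear.weight']
```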
Is there a "good" way to export this model to ONNX, without adjusting the state_dict and the model architecture?
Normally the model is fairly standard encoder-decoder style model. They are usually well maintained in the HF ecosystem. If you have issues with your state dict, I would suggest trying to reach out to the transformers
team. Note there was a release linked to bitsandbytes not too long ago, so your issue might be fixed? https://github.com/huggingface/transformers/pull/24416
I'll close the issue since your questions seem to mostly involve third party libraries. Please feel free to re-open if you think I'm mistaken.
Hello, guys! As the title says, I'm trying to export `mt0-xxl-mt` (with some adjustments, which I specify later) to ONNX, but the export fails every time. Regarding the model adjustments: I loaded the model from Hugging Face in 8-bit precision mode, then fine-tuned it on my downstream task with LoRA/PEFT, and after that I'm trying to export it to ONNX. I've just realized that in the `state_dict` of both the base model from Hugging Face and the LoRA/PEFT fine-tuned model, there is a curious layer named `'weight_format'` with the value `'row'` instead of a weight tensor. The export to ONNX fails because the export function tries to apply the `detach()` method to that value, which obviously raises an error. So my questions are:

1. What is the `'weight_format'` layer and what does it stand for?
2. If I just clear this layer out of the `state_dict` and the model architecture, will it cause a further error or model instability?
3. Is there a "good" way to export this model to ONNX without adjusting the `state_dict` and the model architecture?
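The failure mode described above can be reproduced in miniature without any model at all: export code assumes every `state_dict` value is a tensor and calls `.detach()`, but a `'weight_format'` entry holds a plain string.

```python
# Minimal reproduction: calling .detach() on the string value that a
# 'weight_format' entry contains, instead of on a tensor.
value = "row"
try:
    value.detach()
except AttributeError as e:
    print(e)  # → 'str' object has no attribute 'detach'
```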