Arnav0400 / ViT-Slim

Official code for our CVPR'22 paper “Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space”
MIT License

The size of tensor a (8388608) must match the size of tensor b (4096) at non-singleton dimension 0 #17

Open Abdullah-kwl opened 3 months ago

Abdullah-kwl commented 3 months ago

While training with GLoRA I am facing the error "The size of tensor a (8388608) must match the size of tensor b (4096) at non-singleton dimension 0".

I am using the GLoRA implementation from this PR branch: https://github.com/Arnav0400/peft/blob/main/src/peft/tuners/glora.py

I am using it the same way I use QLoRA: first I quantize the model to 4-bit with bitsandbytes, then I create the PEFT model with a GLoRA config. When I then train with DPO, or even plain SFT, it shows the same error.

```python
peft_config = GLoraConfig(
    peft_type="GLORA",
    task_type="CAUSAL_LM",
    r=8,
    target_modules=["q_proj", "v_proj"],
)

peft_model = get_peft_model(model, peft_config)
```
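For reference, here is a minimal sketch of the quantization step described above (the model ID and quantization options are illustrative placeholders, not necessarily the exact ones used):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative 4-bit bitsandbytes quantization, as in a typical QLoRA setup.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder model ID
    quantization_config=bnb_config,
    device_map="auto",
)
# ...followed by the GLoraConfig / get_peft_model calls above.
```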

[Screenshot of the error traceback]

Abdullah-kwl commented 3 months ago

@Arnav0400 please help me to solve this issue

viliamvolosv commented 3 months ago

Same problem

  File "/home/guest/peft/src/peft/tuners/glora/layer.py", line 102, in forward
    result = F.linear(x, self.weight + self.weight*A + B, bias=E+torch.matmul(self.weight, C).squeeze())
RuntimeError: The size of tensor a (4194304) must match the size of tensor b (4096) at non-singleton dimension 0
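A plausible reading of these numbers, assuming the packed 4-bit storage used by bitsandbytes (two 4-bit values per uint8, flattened to 1-D), is that `self.weight` here is still the raw packed buffer rather than a 2-D weight matrix:

```python
# Packed 4-bit storage: two weight values per uint8 byte, flattened to 1-D.
print(4096 * 4096 // 2)  # 8388608 -> "tensor a" in the original report
print(2048 * 4096 // 2)  # 4194304 -> "tensor a" in the traceback above
# "tensor b" (4096) is the GLoRA term shaped against the layer's features,
# which cannot broadcast against the packed 1-D buffer at dimension 0.
```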

@Arnav0400 can you help us?

viliamvolosv commented 3 months ago

Also, I created a PR in the main PEFT repo with GLoRA: https://github.com/huggingface/peft/pull/1835

BenjaminBossan commented 2 months ago

For this to work with bitsandbytes, you need to implement a separate layer class specific to quantized weights; see, for instance, how PEFT does it for LoRA: https://github.com/huggingface/peft/blob/main/src/peft/tuners/lora/bnb.py

However, be aware that the GLoRA fork you are using is very far behind current PEFT, so porting this functionality will require quite a bit of extra work.
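To make that concrete, here is a minimal sketch (not a drop-in patch) of the idea from `lora/bnb.py` applied to the GLoRA forward shown in the traceback above: dequantize the packed 4-bit weight back to its `(out_features, in_features)` shape before adding the GLoRA terms. `dequantize_bnb_weight` is a helper in recent PEFT releases; the `glora_*` attribute names are hypothetical placeholders for the fork's support tensors.

```python
import torch
import torch.nn.functional as F
from peft.utils.integrations import dequantize_bnb_weight

def glora_forward_4bit(self, x):
    # With bitsandbytes 4-bit, self.weight is a packed Params4bit buffer,
    # not an (out_features, in_features) matrix, so it must be dequantized
    # before the elementwise GLoRA terms can be applied.
    weight = dequantize_bnb_weight(
        self.weight, state=getattr(self.weight, "quant_state", None)
    )
    # A, B, C, E are the GLoRA support tensors; these attribute names are
    # hypothetical placeholders for however the fork computes them.
    A, B, C, E = self.glora_A, self.glora_B, self.glora_C, self.glora_E
    return F.linear(x, weight + weight * A + B,
                    bias=E + torch.matmul(weight, C).squeeze())
```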