Closed laoda513 closed 1 year ago
BTW, is there any plan to support GPTQ?
@laoda513 Hi! I'm not familiar with GPTQ, so I'm not sure about adding support for it. About bitsandbytes 4bit: is it even released yet? I thought it was still in closed beta.
Yes, it's not released yet... Maybe I am too impatient 😂😂😂
Oh, my apologies... After upgrading peft to the main branch, it works~
@laoda513 To fix `RuntimeError: All tensors must be on devices[0]: 0`, simply put your input tensors on `cuda:0`. I don't know why it's suddenly necessary: any CUDA device worked in the past. I'll look into it.
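A minimal sketch of that fix, assuming PyTorch (`place_inputs` is a hypothetical helper, and a CPU `Linear` stands in here for a sharded model whose first device is `cuda:0`):

```python
import torch

def place_inputs(inputs: torch.Tensor, model: torch.nn.Module) -> torch.Tensor:
    # Hypothetical helper: move inputs to the device holding the model's
    # first parameter (i.e. devices[0] of a sharded model, e.g. cuda:0).
    first_device = next(model.parameters()).device
    return inputs.to(first_device)

# CPU stand-in for a multi-GPU model; with real GPU shards this would
# return the inputs on cuda:0.
model = torch.nn.Linear(4, 4)
inputs = torch.randn(1, 4)
print(place_inputs(inputs, model).device)
```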
I'll close this issue since #80 is very similar to it. Please continue the discussion there.
With the demo in the readme, I switched to using 4bit:
```python
with accelerate.init_empty_weights():
    model = transformers.AutoModelForCausalLM.from_config(
        transformers.AutoConfig.from_pretrained(".../hf-LLaMA/13B")
    ).half()
```
For inference, it does not work without `.cuda()`:

```
RuntimeError: All tensors must be on devices[0]: 0
```

but it works with:

```python
inputs = tokenizer("cat:", return_tensors="pt")["input_ids"].to("cuda:0")
```
For training with peft LoRA, it does not work with `.to("cuda:0")`:

```
File "/home/user/miniconda3/lib/python3.10/site-packages/peft/tuners/lora.py", line 565, in forward
    result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (10010x5120 and 1x4587520)
```

It also does not work with `.cuda()`:

```
RuntimeError: All tensors must be on devices[0]: 0
```
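The `1x4587520` operand in that traceback looks like a flattened weight buffer (possibly 4-bit quantized storage) rather than an `(out_features, in_features)` matrix, which is enough to make `F.linear` fail. A minimal sketch reproducing that error class, with the feature dimensions borrowed from the traceback:

```python
import torch
import torch.nn.functional as F

x = torch.randn(10, 5120)      # activations: (tokens, in_features)
w = torch.randn(1, 4587520)    # flattened storage, not an (out, in) weight matrix
try:
    F.linear(x, w)             # computes x @ w.T: 5120 != 4587520, so this raises
except RuntimeError as e:
    print(e)                   # "mat1 and mat2 shapes cannot be multiplied ..."
```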