AutoGPTQ / AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
MIT License

I can't load the WizardLM 13B model anymore since v0.3.0 #195

Open ParisNeo opened 11 months ago

ParisNeo commented 11 months ago

When I run my code that uses GPTQ, I get this warning, and it simply crashes when I load the WizardLM-13B model!

WARNING:accelerate.utils.modeling:The safetensors archive passed at C:\Users\aloui\Documents\lollms\models\gptq\WizardLM-13B-V1.1-GPTQ\wizardlm-13b-v1.1-GPTQ-4bit-128g.no-act.order.safetensors does not contain metadata. Make sure to save your model with the save_pretrained method. Defaulting to 'pt' metadata.

It used to work with the previous version.

ParisNeo commented 11 months ago

It turns out this was a memory problem. But if loading fails, it shouldn't take down the entire program. This blocks my software completely, and the user has to change the configuration manually to get it working again.

Is there a way to solve this? Thanks in advance.

Also, if possible, is there a function or method that lets me free the model from memory when I want to switch to another one? I need this because in Lollms the user can change the model at any time, and right now it simply crashes whenever I try to load another model.
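For reference, a minimal sketch of the standard PyTorch pattern for releasing a model's VRAM before loading a different one (the checkpoint path below is a placeholder; the pattern assumes the application holds the only reference to the model):

```python
import gc
import torch
from auto_gptq import AutoGPTQForCausalLM

MODEL_DIR = "path/to/WizardLM-13B-V1.1-GPTQ"  # placeholder path

model = AutoGPTQForCausalLM.from_quantized(
    MODEL_DIR, device="cuda:0", use_safetensors=True
)

# ... use the model ...

# To switch models: drop every reference first, then free the CUDA cache.
model = None              # release the reference the application holds
gc.collect()              # make sure the weights are actually collected
torch.cuda.empty_cache()  # hand cached VRAM blocks back to the driver

# Now there should be room to load the next model.
model = AutoGPTQForCausalLM.from_quantized(
    MODEL_DIR, device="cuda:0", use_safetensors=True
)
```

Note that `empty_cache()` only helps once no Python reference to the old weights remains, so any other variables pointing at the model must be cleared as well.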

Thanks

PanQiWei commented 11 months ago

I will look into this later this week, but to my knowledge there was no change to auto-gptq's model-loading mechanism in the new version. Is it possible this problem is caused by a dependency's version change? It would be great if you could help check this 🙏

ParisNeo commented 11 months ago

Hi, I managed to make it work again after uninstalling and reinstalling everything. I don't really know why it was crashing. If I find more information, I'll tell you. Thanks a lot for this really cool work.

ParisNeo commented 11 months ago

I think I can reproduce the bug. Write a program that loads a GPTQ model in GPU mode with everything on the GPU (no splitting). Then run it two or three times (depending on how much VRAM you have). The last run will crash and simply exit the application. I can also reproduce this by loading a Stable Diffusion model first and then running an application that uses auto-gptq.
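A minimal repro sketch along these lines, assuming a local GPTQ checkpoint directory (the path is a placeholder) and the `from_quantized` loader:

```python
from auto_gptq import AutoGPTQForCausalLM

MODEL_DIR = "path/to/WizardLM-13B-V1.1-GPTQ"  # placeholder path

# Load the full model onto one GPU several times without freeing the
# previous copy. Once VRAM is exhausted, the process reportedly aborts
# instead of surfacing a catchable error.
models = []
for i in range(3):
    models.append(
        AutoGPTQForCausalLM.from_quantized(
            MODEL_DIR,
            device="cuda:0",
            use_safetensors=True,
        )
    )
    print(f"loaded copy {i + 1}")
```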

Please make it just report an error instead of exiting the app, for example by raising an exception the caller can catch.
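In the meantime, the calling application can try to guard the load itself. A sketch, assuming the failure surfaces as a Python exception rather than a hard abort in native code, and that PyTorch is recent enough to have `torch.cuda.OutOfMemoryError` (older releases raise a plain `RuntimeError` whose message mentions "out of memory"):

```python
import torch
from auto_gptq import AutoGPTQForCausalLM

def try_load(model_dir: str):
    """Attempt a load; report failure to the caller instead of dying."""
    try:
        return AutoGPTQForCausalLM.from_quantized(
            model_dir, device="cuda:0", use_safetensors=True
        )
    except torch.cuda.OutOfMemoryError:
        # Recent PyTorch raises this dedicated RuntimeError subclass on OOM.
        torch.cuda.empty_cache()
        print("Not enough VRAM to load this model; keeping the current one.")
        return None
```

This only works if the OOM is raised as an exception; if the crash described above happens inside native code and kills the process outright, the fix would indeed have to live in the library itself.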