fusesid opened 2 months ago
If you have a normal FP16/BF16 model, this does not happen. As a first step, I would suggest you check whether the model can run inference with the Hugging Face libraries.
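For example, a minimal check like this (the repo id is a placeholder for your merged model):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "your-user/your-merged-model" is a placeholder repo id
model_id = "your-user/your-merged-model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Generate a few tokens to confirm the merged weights load and run
inputs = tokenizer("Hello, world", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```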
@casper-hansen
Yeah, I am able to run inference with the Hugging Face model, as can be seen in the screenshot.
Not sure what the issue is with converting it into the AWQ format, as I want to test AWQ with vLLM.
An important note is that I used unsloth for finetuning with LoRA and saved the model using the merge_and_unload() method of PeftModel.
I have finetuned Llama 3.1 using unsloth. Then I merged and unloaded the LoRA model and pushed it to the Hub.
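For reference, the merge step was roughly this (repo ids are placeholders):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the FP16 base model and attach the trained LoRA adapter
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B", torch_dtype="auto")
model = PeftModel.from_pretrained(base, "your-user/your-lora-adapter")

# Merge the LoRA weights into the base weights and drop the adapter wrappers
merged = model.merge_and_unload()
merged.push_to_hub("your-user/your-merged-model")
```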
Now, when I tried quantizing it using:
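A sketch of what I ran, following the standard AutoAWQ quantization example (paths are placeholders):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "your-user/your-merged-model"  # merged FP16 model on the Hub
quant_path = "llama-3.1-awq"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the merged model and its tokenizer
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run AWQ calibration and quantize the weights to 4-bit
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized model for use with vLLM
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```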
But this throws: `RuntimeError: output with shape [8388608, 1] doesn't match the broadcast shape [8388608, 4096]`
I am confused and not sure what the issue is. Can anyone please guide me?