I want to run this on an R7-6800H laptop using the CPU, but I do not have an NVIDIA card to run the quantization Python code. Can someone provide an int8 or int16 version of the quantized model, or give me some instructions on how to produce one so the CPU can do the work?
P.S. Should the model work in llama.cpp?
Expected Behavior
No response
Steps To Reproduce
Have a laptop without an NVIDIA GPU and run the quantization Python code.
Environment
- OS: Windows 11
- Python: 3.10.9
- Transformers: per requirements.txt
- PyTorch: per requirements.txt
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): False
This might be what you are looking for: https://huggingface.co/THUDM/chatglm-6b-int4
Load it on the CPU:
from transformers import AutoModel
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).float()
The .float() call keeps the weights in float32, since CPU inference does not handle half precision well. About 8 GB of RAM is required.
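The 8 GB figure is plausible from simple arithmetic. A rough sketch, assuming about 6.2 billion parameters (an assumption based on the "6B" model name; the exact count may differ):

```python
def approx_weight_gib(n_params: float, bits: int) -> float:
    # Memory for the weights alone; activations and runtime overhead add more.
    return n_params * bits / 8 / 2**30

# ~6.2e9 parameters is an assumption based on the "6B" in the model name.
for bits in (4, 8, 16):
    print(f"int{bits} weights: {approx_weight_gib(6.2e9, bits):.1f} GiB")
# int4 weights: 2.9 GiB
# int8 weights: 5.8 GiB
# int16 weights: 11.5 GiB
```

So int4 weights (~2.9 GiB) plus runtime overhead fit in 8 GB of RAM, int8 would be tight, and a 16-bit model would not fit at all.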
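On producing an int8 model on the CPU yourself: PyTorch's dynamic quantization API runs entirely on the CPU, no CUDA required. A minimal sketch on a toy model, not the repository's actual quantize script, and whether it applies cleanly to this model is untested:

```python
import torch
import torch.nn as nn

# Toy network standing in for a real model; dynamic quantization converts
# nn.Linear weights to int8 and runs inference on the CPU, no CUDA needed.
model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4))
qmodel = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

out = qmodel(torch.randn(1, 16))
print(tuple(out.shape))  # (1, 4)
```

Dynamic quantization stores int8 weights and quantizes activations on the fly, which is why it needs no GPU and no calibration pass.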