Open YueZhan721 opened 1 month ago

Hello, thank you very much for your work. I am running into the problem shown in the picture. My steps are: 1. fine-tune the base Llama 2 model with Unsloth and save the weights in .safetensors format; 2. use the `convert-hf.py` script to convert them to the .m and .t formats; 3. run the result on a single Raspberry Pi. Hope to hear from you, thanks again.
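A common pitfall in this kind of Unsloth workflow is saving only the LoRA adapter tensors instead of the merged model, which yields a .safetensors checkpoint far smaller than the full 7B weights. Below is a minimal sketch of a merged 16-bit export, assuming Unsloth's documented `save_pretrained_merged` API; the model name, LoRA settings, and output directory are placeholders, not values from this thread:

```python
# Minimal sketch, assuming Unsloth's documented API; the checkpoint name
# and output directory are placeholders, not values from this thread.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/llama-2-7b",   # hypothetical base checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
# ... fine-tune here ...

# save_pretrained() on a PEFT model writes only the small adapter weights;
# save_pretrained_merged() folds the adapters into full 16-bit weights,
# which is the layout an HF-to-.m converter expects as input.
model.save_pretrained_merged(
    "llama2-7b-finetuned",  # output directory (placeholder)
    tokenizer,
    save_method="merged_16bit",
)
```

The merged directory is then the input for step 2.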
Hello @YueZhan721!
Could you run this model on a more powerful computer? How much RAM does your Raspberry Pi have?
Hello @b4rtaz, I can run it on Google Colab. The Raspberry Pi I used is a Pi 5 with 8 GB of RAM, and the .m weight file is about 2.3 GB. I'm wondering if I'm doing something wrong in the quantization or formatting process.
By the way, I found that many Llama 3 models have no tokenizer.model file; how can I convert those from .safetensors to .m? Thank you very much for your reply.
Llama 2 7B should be approximately 3.95 GB after quantization to Q40. If it's only 2.3 GB, something might be wrong.

Could you run any model on your Raspberry Pi (for example `python launch.py tinyllama_1_1b_3t_q40`)?
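The 3.95 GB figure can be roughly sanity-checked from the parameter count. The sketch below assumes Q40 uses the common 4-bit block layout (32 weights stored as 16 packed nibbles plus one fp16 scale, i.e. 18 bytes per 32 weights); both that layout and the ~6.74B parameter count are assumptions, not taken from this thread:

```python
# Sketch: lower-bound file size for a Q40-quantized checkpoint, assuming
# the common 4-bit block format: 32 weights -> 16 packed nibbles + one
# fp16 scale = 18 bytes, i.e. 4.5 bits per weight. File headers and any
# tensors kept in higher precision add to this estimate.
def q40_size_gb(n_params: float) -> float:
    return n_params * (18 / 32) / 1e9

print(f"{q40_size_gb(6.74e9):.2f} GB")  # Llama 2 7B (~6.74B params) -> ~3.79 GB
```

By the same arithmetic, a 2.3 GB file would hold only about 4 billion weights, which is consistent with a partial (for example adapter-only) export.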
> how can I convert those from .safetensors to .m?

Check `convert-tokenizer-hf.py`. It can convert some HF models.
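As a quick check of which tokenizer files a given HF repo actually ships (a sentencepiece tokenizer.model versus only tokenizer.json), the repo contents can be listed; a minimal sketch using `huggingface_hub`, with an example repo id that may require authentication if the repo is gated:

```python
# Sketch: list tokenizer-related files in a Hugging Face repo. The repo
# id is an example; gated repos (like meta-llama) need an auth token.
from huggingface_hub import list_repo_files

files = list_repo_files("meta-llama/Meta-Llama-3-8B")
print([f for f in files if "token" in f.lower()])
```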
> Llama 2 7B should be approximately 3.95 GB after quantization to Q40. If it's only 2.3 GB, something might be wrong.
>
> Could you run any model on your Raspberry Pi (for example `python launch.py tinyllama_1_1b_3t_q40`)?
Yes, I can run the models (.m files) you provided, so something probably goes wrong when I save and convert the weight files. Sincere thanks! Great work, and a big help to me.