SJTU-IPADS / PowerInfer

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

How can I convert Llama-3 8B and 70B to GGUF models? #197

Open · tankvpython opened this issue 5 months ago

tankvpython commented 5 months ago

I tried converting with convert-dense.py, but it throws the following error:

```
Loading model file /root/llama_models/Meta-Llama-3-8B/consolidated.00.pth
params = Params(n_vocab=128256, n_embd=4096, n_layer=32, n_ctx=4096, n_ff=14336, n_head=32, n_head_kv=8, f_norm_eps=1e-05, arch=<MODEL_ARCH.LLAMA: 1>, rope_scaling_type=None, f_rope_freq_base=500000.0, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=None, path_model=PosixPath('/root/llama_models/Meta-Llama-3-8B'))
Loading vocab file '/root/llama_models/Meta-Llama-3-8B/tokenizer.model', type 'spm'
Traceback (most recent call last):
  File "/root/PowerInfer/convert-dense.py", line 1218, in <module>
    main()
  File "/root/PowerInfer/convert-dense.py", line 1198, in main
    vocab = load_vocab(vocab_dir, args.vocabtype)
  File "/root/PowerInfer/convert-dense.py", line 1097, in load_vocab
    return SentencePieceVocab(path, added_tokens_path if added_tokens_path.exists() else None)
  File "/root/PowerInfer/convert-dense.py", line 360, in __init__
    self.sentencepiece_tokenizer = SentencePieceProcessor(str(fname_tokenizer))
  File "/root/PowerInfer/venv/lib/python3.10/site-packages/sentencepiece/__init__.py", line 468, in Init
    self.Load(model_file=model_file, model_proto=model_proto)
  File "/root/PowerInfer/venv/lib/python3.10/site-packages/sentencepiece/__init__.py", line 961, in Load
    return self.LoadFromFile(model_file)
  File "/root/PowerInfer/venv/lib/python3.10/site-packages/sentencepiece/__init__.py", line 316, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: could not parse ModelProto from /root/llama_models/Meta-Llama-3-8B/tokenizer.model
```
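For context, a likely explanation (my assumption, not confirmed in this thread): Meta's Llama-3 download ships a tiktoken-style BPE ranks file named `tokenizer.model`, not a SentencePiece ModelProto, so any loader that assumes the `'spm'` vocab type fails exactly this way. Below is a minimal diagnostic sketch, assuming `sentencepiece` is installed and the checkpoint sits at the path from the traceback:

```python
# Diagnostic sketch (assumption: the Llama-3 checkpoint path below matches
# the traceback). Llama-3's tokenizer.model is a tiktoken-style BPE file,
# not a SentencePiece protobuf, so SentencePieceProcessor cannot parse it.
from pathlib import Path

import sentencepiece as spm

path = Path("/root/llama_models/Meta-Llama-3-8B/tokenizer.model")

# A SentencePiece model is a binary protobuf; the Llama-3 file is plain
# text ("<base64 token> <rank>" per line), which is why parsing fails.
print(path.read_bytes()[:64])  # readable base64 lines => tiktoken BPE file

try:
    spm.SentencePieceProcessor(model_file=str(path))
except RuntimeError as exc:
    print(exc)  # "Internal: could not parse ModelProto from ..."
```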