aleloi opened 2 months ago
I have a similar problem. I merged two Llama 3 8B models with mergekit and now want to convert them to GGUF. This is the output I got:
(.venv) PS C:\Users\gsanr\PycharmProjects\llama.cpp> python convert.py penny-dolphin-einstean-llama3 --outfile penny-dolphin-einstein-llama3.gguf --outtype f16
Loading model file penny-dolphin-einstean-llama3\model-00001-of-00004.safetensors
Loading model file penny-dolphin-einstean-llama3\model-00001-of-00004.safetensors
Loading model file penny-dolphin-einstean-llama3\model-00002-of-00004.safetensors
Loading model file penny-dolphin-einstean-llama3\model-00003-of-00004.safetensors
Loading model file penny-dolphin-einstean-llama3\model-00004-of-00004.safetensors
params = Params(n_vocab=128258, n_embd=4096, n_layer=32, n_ctx=8192, n_ff=14336, n_head=32, n_head_kv=8, n_experts=None, n_experts_used=None, f_norm_eps=1e-05, rope_scaling_type=None, f_rope_freq_base=500000.0, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=<GGMLFileType.MostlyF16: 1>, path_model=WindowsPath('penny-dolphin-einstean-llama3'))
Traceback (most recent call last):
File "C:\Users\gsanr\PycharmProjects\llama.cpp\convert.py", line 1555, in
vocab = cls(self.path)
File "C:\Users\gsanr\PycharmProjects\llama.cpp\convert.py", line 533, in __init__
raise TypeError('Llama 3 must be converted with BpeVocab')
TypeError: Llama 3 must be converted with BpeVocab
Could it be related to this issue? https://github.com/ggerganov/llama.cpp/issues/7289
Have you tried using `convert-hf-to-gguf.py` instead?

`convert-hf-to-gguf.py` expects a `config.json` file in the model folder. The HF version has one that looks like this:
{
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": 128009,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 8192,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.40.0.dev0",
"use_cache": true,
"vocab_size": 128256
}
The Meta version doesn't have one, but has a params.json that looks like this and seems to specify similar params. It doesn't list "architectures" though, which is a required key for the convert-hf script:
{
"dim": 4096,
"n_layers": 32,
"n_heads": 32,
"n_kv_heads": 8,
"vocab_size": 128256,
"multiple_of": 1024,
"ffn_dim_multiplier": 1.3,
"norm_eps": 1e-05,
"rope_theta": 500000.0
}
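The gap between the two formats can be bridged mechanically. As a rough sketch (not part of any official tooling), the HF fields can be derived from Meta's `params.json`; the `intermediate_size` computation below mirrors the FFN sizing in Meta's llama reference code (4×dim, scaled by 2/3 and `ffn_dim_multiplier`, rounded up to a multiple of `multiple_of`):

```python
import json

# Rough sketch, not official tooling: derive an HF-style config.json from
# Meta's params.json. The keys on the left match the HF config shown above.
def params_to_config(p: dict) -> dict:
    # FFN sizing as in Meta's llama reference code.
    hidden = int(2 * (4 * p["dim"]) / 3)
    if p.get("ffn_dim_multiplier"):
        hidden = int(p["ffn_dim_multiplier"] * hidden)
    m = p["multiple_of"]
    hidden = m * ((hidden + m - 1) // m)  # round up to a multiple of m
    return {
        "architectures": ["LlamaForCausalLM"],  # the key convert-hf-to-gguf.py needs
        "model_type": "llama",
        "hidden_size": p["dim"],
        "num_hidden_layers": p["n_layers"],
        "num_attention_heads": p["n_heads"],
        "num_key_value_heads": p["n_kv_heads"],
        "vocab_size": p["vocab_size"],
        "rms_norm_eps": p["norm_eps"],
        "rope_theta": p["rope_theta"],
        "intermediate_size": hidden,
        "max_position_embeddings": 8192,  # assumed; not present in params.json
    }

meta_params = {"dim": 4096, "n_layers": 32, "n_heads": 32, "n_kv_heads": 8,
               "vocab_size": 128256, "multiple_of": 1024,
               "ffn_dim_multiplier": 1.3, "norm_eps": 1e-05,
               "rope_theta": 500000.0}
cfg = params_to_config(meta_params)
print(json.dumps(cfg, indent=2))  # intermediate_size works out to 14336
```

For the 8B params above this reproduces the HF values (`hidden_size` 4096, `intermediate_size` 14336), but tokenizer files and weight layout still differ between the two releases.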
(llama.cpp) alex@ml-burken:~/test-run-llama-cpp/llama.cpp$ python convert-hf-to-gguf.py ../Meta-Llama-3-8B-Instruct --outfile ../llama-3-8b-instruct-converted.bin
INFO:hf-to-gguf:Loading model: Meta-Llama-3-8B-Instruct
Traceback (most recent call last):
File "/home/alex/test-run-llama-cpp/llama.cpp/convert-hf-to-gguf.py", line 2546, in <module>
main()
File "/home/alex/test-run-llama-cpp/llama.cpp/convert-hf-to-gguf.py", line 2521, in main
hparams = Model.load_hparams(dir_model)
File "/home/alex/test-run-llama-cpp/llama.cpp/convert-hf-to-gguf.py", line 351, in load_hparams
with open(dir_model / "config.json", "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '../Meta-Llama-3-8B-Instruct/config.json'
Llama 3 uses the GPT-2 vocab with the tiktoken encoder and decoder. The conversion scripts only implemented support for the HF releases.
I'm working on streamlining this entire process because converting has become cumbersome, and I would like a more fluid experience.
If I can get the initial stuff ironed out (it's proving challenging), then I'll see if I can get it in there if I have enough time.
If not, I'll hopefully have it set up so someone else can easily plug it in and play with it.
For now, it's best to use the hf-to-gguf script, as the official release isn't currently supported due to the complicated nature of how BPE is implemented.
Also, it looks like convert.py will be moved to examples to reduce confusion, since the majority of users are on Hugging Face. Not sure what the future of convert.py is, but it looks like it will be kept around, which I appreciate.
I spent 30 hours downloading the Meta versions. I can't use them? If I get a "config.json" from HF will that work with a model from Meta?
I have no idea if this will work, but this is what I would try: use the HF `config.json`, `tokenizer.json`, and `tokenizer_config.json` files. The weights would need to be named like `model-00001-of-00004.bin` or `.safetensors`, so for the Meta checkpoints the file extension would need to be `bin`, since they're zip files.
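To avoid the `FileNotFoundError` from the traceback further up, one could sanity-check the model folder before converting. A sketch only: of the file names below, only `config.json` is confirmed required by the traceback in this thread; the rest follows the suggestion above and is an assumption.

```python
from pathlib import Path

# Sketch: check a model folder for the files this thread suggests
# convert-hf-to-gguf.py wants before attempting a conversion.
EXPECTED = ["config.json", "tokenizer.json", "tokenizer_config.json"]

def missing_pieces(model_dir: str) -> list[str]:
    d = Path(model_dir)
    missing = [name for name in EXPECTED if not (d / name).is_file()]
    # HF-style shards end in .safetensors or .bin; the Meta release ships
    # .pth checkpoints instead, which the script does not pick up.
    if not (list(d.glob("*.safetensors")) or list(d.glob("*.bin"))):
        missing.append("HF-style weight shards (*.safetensors or *.bin)")
    return missing
```

Running this on a raw Meta download would report everything as missing, which matches the errors seen in this thread.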
I downloaded the llama3 8B Instruct weights directly from the Meta repository (not Huggingface) https://llama.meta.com/llama-downloads. I then tried to run the convert script using the command suggestions that I found in the comments at https://github.com/ggerganov/llama.cpp/pull/6745 and https://github.com/ggerganov/llama.cpp/issues/6819.
The `tokenizer.model` in the Meta download contains this. It's definitely not Protobuf; not sure whether it's BPE.
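For what it's worth, Meta's Llama 3 `tokenizer.model` is a tiktoken-style BPE file: plain text, one `<base64-encoded token> <rank>` pair per line, whereas Llama 2 shipped a SentencePiece protobuf. A quick heuristic check (my own sketch, not a llama.cpp API):

```python
import base64

# Heuristic sketch: tiktoken BPE files are text lines of "<base64> <rank>",
# while SentencePiece models are binary protobuf. Not an official check.
def looks_like_tiktoken(path: str) -> bool:
    try:
        with open(path, "rb") as f:
            fields = f.readline().split()
        if len(fields) != 2:
            return False
        base64.b64decode(fields[0], validate=True)  # token bytes
        int(fields[1].decode())                     # merge rank
        return True
    except (ValueError, OSError):
        return False
```

If this returns `True`, the file is the tiktoken format that the pre-HF convert scripts didn't support at the time.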
I'm running llama.cpp at current master, which is commit 29c60d8cdd. I skimmed the discussion in https://github.com/ggerganov/llama.cpp/pull/6745 and https://github.com/ggerganov/llama.cpp/pull/6920 for a solution, couldn't find one, and downloaded the Hugging Face version of Llama 3 8B Instruct instead, which converted without issues. Here are a few of the commands that I tried to run: