ggerganov / llama.cpp

LLM inference in C/C++
MIT License

convert.py still fails on llama3 8B-Instruct downloaded directly from Meta (Huggingface works) #7339

Open aleloi opened 2 months ago

aleloi commented 2 months ago

I downloaded the llama3 8B Instruct weights directly from the Meta repository (not Huggingface) https://llama.meta.com/llama-downloads. I then tried to run the convert script using the command suggestions that I found in the comments at https://github.com/ggerganov/llama.cpp/pull/6745 and https://github.com/ggerganov/llama.cpp/issues/6819.

The tokenizer.model in the Meta download contains the following. It's definitely not Protobuf; I'm not sure whether it's BPE (a short decode sketch follows the listing):

IQ== 0
Ig== 1
Iw== 2
JA== 3
JQ== 4
Jg== 5
Jw== 6
KA== 7
KQ== 8
Kg== 9

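Each line looks like a base64-encoded token followed by its rank, i.e. the plain-text tiktoken format rather than a SentencePiece ModelProto. A minimal decode sketch for the first few entries (the path is just where my Meta download happens to live):

import base64

# Path to the Meta download (adjust as needed).
path = "../Meta-Llama-3-8B-Instruct/tokenizer.model"

with open(path, encoding="utf-8") as f:
    for line in f.readlines()[:10]:
        token_b64, rank = line.split()
        # e.g. "IQ==" -> b'!', "Ig==" -> b'"', matching the listing above
        print(rank, base64.b64decode(token_b64))
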
I'm running llama.cpp at current master, which is commit 29c60d8cdd. I skimmed the discussion in https://github.com/ggerganov/llama.cpp/pull/6745 and https://github.com/ggerganov/llama.cpp/pull/6920 for a solution but couldn't find one, so I downloaded the Huggingface version of llama3 8B Instruct instead, which converted without issues. Here are a few of the commands that I tried to run:

python convert.py ../Meta-Llama-3-8B-Instruct/ --outfile /models/meta-llama/ggml-meta-llama-3-8b-f16.gguf  --outtype f16

INFO:convert:Loading model file ../Meta-Llama-3-8B-Instruct/consolidated.00.pth
INFO:convert:model parameters count : 8030261248 (8B)
INFO:convert:params = Params(n_vocab=128256, n_embd=4096, n_layer=32, n_ctx=4096, n_ff=14336, n_head=32, n_head_kv=8, n_experts=None, n_experts_used=None, f_norm_eps=1e-05, rope_scaling_type=None, f_rope_freq_base=500000.0, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=<GGMLFileType.MostlyF16: 1>, path_model=PosixPath('../Meta-Llama-3-8B-Instruct'))
Traceback (most recent call last):
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert.py", line 1714, in <module>
    main()
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert.py", line 1671, in main
    vocab, special_vocab = vocab_factory.load_vocab(vocab_types, model_parent_path)
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert.py", line 1522, in load_vocab
    vocab = self._create_vocab_by_path(vocab_types)
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert.py", line 1507, in _create_vocab_by_path
    vocab = cls(self.path)
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert.py", line 506, in __init__
    self.sentencepiece_tokenizer.LoadFromFile(str(fname_tokenizer))
  File "/home/alex/.pyenv/versions/llama.cpp/lib/python3.10/site-packages/sentencepiece/__init__.py", line 316, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: could not parse ModelProto from ../Meta-Llama-3-8B-Instruct/tokenizer.model
(llama.cpp) alex@ml-burken:~/test-run-llama-cpp/llama.cpp$ python convert.py ../Meta-Llama-3-8B-Instruct/ --outfile /models/meta-llama/ggml-meta-llama-3-8b-f16.gguf --vocab-type bpe --outtype f16
INFO:convert:Loading model file ../Meta-Llama-3-8B-Instruct/consolidated.00.pth
INFO:convert:model parameters count : 8030261248 (8B)
INFO:convert:params = Params(n_vocab=128256, n_embd=4096, n_layer=32, n_ctx=4096, n_ff=14336, n_head=32, n_head_kv=8, n_experts=None, n_experts_used=None, f_norm_eps=1e-05, rope_scaling_type=None, f_rope_freq_base=500000.0, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=<GGMLFileType.MostlyF16: 1>, path_model=PosixPath('../Meta-Llama-3-8B-Instruct'))
Traceback (most recent call last):
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert.py", line 1714, in <module>
    main()
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert.py", line 1671, in main
    vocab, special_vocab = vocab_factory.load_vocab(vocab_types, model_parent_path)
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert.py", line 1522, in load_vocab
    vocab = self._create_vocab_by_path(vocab_types)
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert.py", line 1512, in _create_vocab_by_path
    raise FileNotFoundError(f"Could not find a tokenizer matching any of {vocab_types}")
FileNotFoundError: Could not find a tokenizer matching any of ['bpe']
giannisanni commented 2 months ago

I have a similar problem. I merged two llama3 8b models with mergekit and now want to convert them to GGUF.

This is the output I got:

(.venv) PS C:\Users\gsanr\PycharmProjects\llama.cpp> python convert.py penny-dolphin-einstean-llama3 --outfile penny-dolphin-einstein-llama3.gguf --outtype f16
Loading model file penny-dolphin-einstean-llama3\model-00001-of-00004.safetensors
Loading model file penny-dolphin-einstean-llama3\model-00001-of-00004.safetensors
Loading model file penny-dolphin-einstean-llama3\model-00002-of-00004.safetensors
Loading model file penny-dolphin-einstean-llama3\model-00003-of-00004.safetensors
Loading model file penny-dolphin-einstean-llama3\model-00004-of-00004.safetensors
params = Params(n_vocab=128258, n_embd=4096, n_layer=32, n_ctx=8192, n_ff=14336, n_head=32, n_head_kv=8, n_experts=None, n_experts_used=None, f_norm_eps=1e-05, rope_scaling_type=None, f_rope_freq_base=500000.0, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=<GGMLFileType.MostlyF16: 1>, path_model=WindowsPath('penny-dolphin-einstean-llama3'))
Traceback (most recent call last):
  File "C:\Users\gsanr\PycharmProjects\llama.cpp\convert.py", line 1555, in <module>
    main()
  File "C:\Users\gsanr\PycharmProjects\llama.cpp\convert.py", line 1522, in main
    vocab, special_vocab = vocab_factory.load_vocab(vocab_types, model_parent_path)
  File "C:\Users\gsanr\PycharmProjects\llama.cpp\convert.py", line 1424, in load_vocab
    vocab = self._create_vocab_by_path(vocab_types)
  File "C:\Users\gsanr\PycharmProjects\llama.cpp\convert.py", line 1409, in _create_vocab_by_path
    vocab = cls(self.path)
  File "C:\Users\gsanr\PycharmProjects\llama.cpp\convert.py", line 533, in __init__
    raise TypeError('Llama 3 must be converted with BpeVocab')
TypeError: Llama 3 must be converted with BpeVocab

sdmorrey commented 2 months ago

Could it be related to this issue? https://github.com/ggerganov/llama.cpp/issues/7289

jukofyork commented 2 months ago

Have you tried using convert-hf-to-gguf.py instead?

aleloi commented 2 months ago

convert-hf-to-gguf.py expects a config.json file in the model folder. The hf version has one that looks like this:

{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.40.0.dev0",
  "use_cache": true,
  "vocab_size": 128256
}

The Meta version doesn't have one, but it has a params.json that looks like this and seems to specify similar parameters. It doesn't list "architectures" though, which is a required key for the convert-hf script (a rough sketch of how the fields correspond follows the traceback below):

{
    "dim": 4096,
    "n_layers": 32,
    "n_heads": 32,
    "n_kv_heads": 8,
    "vocab_size": 128256,
    "multiple_of": 1024,
    "ffn_dim_multiplier": 1.3,
    "norm_eps": 1e-05,
    "rope_theta": 500000.0
}
(llama.cpp) alex@ml-burken:~/test-run-llama-cpp/llama.cpp$ python convert-hf-to-gguf.py  ../Meta-Llama-3-8B-Instruct --outfile  ../llama-3-8b-instruct-converted.bin
INFO:hf-to-gguf:Loading model: Meta-Llama-3-8B-Instruct
Traceback (most recent call last):
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert-hf-to-gguf.py", line 2546, in <module>
    main()
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert-hf-to-gguf.py", line 2521, in main
    hparams = Model.load_hparams(dir_model)
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert-hf-to-gguf.py", line 351, in load_hparams
    with open(dir_model / "config.json", "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '../Meta-Llama-3-8B-Instruct/config.json'
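
For reference, here is a rough sketch of how the params.json fields seem to map onto the config.json keys. The intermediate_size has to be derived from dim, ffn_dim_multiplier and multiple_of, and the keys that params.json simply doesn't carry (architectures, token ids, max_position_embeddings, torch_dtype) are guesses copied from the HF config above. Even with such a config.json, convert-hf-to-gguf.py would presumably still expect HF-layout weight files and tokenizer files, so this alone is probably not enough:

import json
from pathlib import Path

# Hypothetical path -- adjust to the local Meta download.
meta_dir = Path("../Meta-Llama-3-8B-Instruct")
params = json.loads((meta_dir / "params.json").read_text())

# Llama FFN sizing: start from 4*dim, scale by 2/3 and ffn_dim_multiplier,
# then round up to a multiple of multiple_of.
# For the 8B model: 4*4096 = 16384 -> 10922 -> 14198 -> 14336,
# which matches "intermediate_size" in the HF config above.
hidden = int(2 * (4 * params["dim"]) / 3)
hidden = int(params.get("ffn_dim_multiplier", 1.0) * hidden)
multiple_of = params["multiple_of"]
intermediate_size = multiple_of * ((hidden + multiple_of - 1) // multiple_of)

# Keys not present in params.json are guesses copied from the HF config
# shown above -- verify them before relying on this file.
config = {
    "architectures": ["LlamaForCausalLM"],
    "model_type": "llama",
    "hidden_size": params["dim"],
    "intermediate_size": intermediate_size,
    "num_hidden_layers": params["n_layers"],
    "num_attention_heads": params["n_heads"],
    "num_key_value_heads": params["n_kv_heads"],
    "vocab_size": params["vocab_size"],
    "rms_norm_eps": params["norm_eps"],
    "rope_theta": params["rope_theta"],
    "max_position_embeddings": 8192,
    "bos_token_id": 128000,
    "eos_token_id": 128009,
    "torch_dtype": "bfloat16",
}
print(json.dumps(config, indent=2))
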
teleprint-me commented 2 months ago

Llama 3 uses the gpt-2 vocab and tiktoken encoder and decoder. The conversion scripts only implemented support for the HF releases.

I'm working on streamlining this entire process, because converting has become cumbersome and I would like a more fluid experience.

If I can get the initial work ironed out (it's proving challenging), then I'll see whether I can fit support for the Meta release in there, time permitting.

If not, I'll hopefully have it set up so that someone else can easily plug it in and get going.

For now, it's best to use the hf-to-gguf script, as the official release isn't currently supported due to the complicated nature of how BPE is implemented.

Also, it looks like convert.py will be moved to examples to reduce confusion, since the majority of users are on Huggingface. I'm not sure what the future of convert.py is, but it looks like it will still be kept around, which I appreciate.
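
For what it's worth, here is a minimal sketch of what loading the Meta tokenizer.model with tiktoken looks like. The path, split regex and special tokens are assumptions based on Meta's reference code, not anything llama.cpp does:

from pathlib import Path

import tiktoken
from tiktoken.load import load_tiktoken_bpe  # pip install tiktoken

# Hypothetical path to the Meta download.
model_path = Path("../Meta-Llama-3-8B-Instruct/tokenizer.model")

# Each line of tokenizer.model is "<base64 token> <rank>";
# load_tiktoken_bpe parses it into a dict of token bytes -> rank.
mergeable_ranks = load_tiktoken_bpe(str(model_path))

enc = tiktoken.Encoding(
    name="llama3",
    # GPT-2/cl100k-style split regex; the exact pattern and the special
    # tokens (<|begin_of_text|> etc.) live in Meta's reference code and
    # are simplified/omitted here.
    pat_str=r"(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+",
    mergeable_ranks=mergeable_ranks,
    special_tokens={},
)

print(enc.encode("Hello, world!"))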

differentprogramming commented 1 month ago

Llama 3 uses the gpt-2 vocab and tiktoken encoder and decoder. The conversion scripts only implemented support for the HF releases. […]

I spent 30 hours downloading the Meta versions. I can't use them? If I get a "config.json" from HF, will that work with a model from Meta?

teleprint-me commented 1 month ago

I have no idea if this will work, but this is what I would try: