
Bug: TypeError when YAML license field in README.md is a list during GGUF conversion #9819

Open · gakugaku opened this issue 2 weeks ago

gakugaku commented 2 weeks ago

What happened?

When converting a model to GGUF format with `convert_hf_to_gguf.py`, a `TypeError` occurs if the `license` field in the README.md YAML header is a list of strings rather than a single string.

Reproduction

Convert a model that has multiple licenses, such as tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.1.

Cause of Error

The `license` field is declared as a `str` in `metadata.py`:

https://github.com/ggerganov/llama.cpp/blob/c81f3bbb051f8b736e117dfc78c99d7c4e0450f6/gguf-py/gguf/metadata.py#L38

When the KV data is written out, the writer tries to pack the value as a string, but since it is actually a `list[str]`, a `TypeError` is raised.
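For illustration, here is a minimal sketch of the failure mode (not the actual gguf-py code; the variable names are made up): the value is only encoded when it is a `str`, so a list passes through unchanged and the byte concatenation fails with the same message seen in the traceback below.

```python
# Minimal sketch of the failure mode (illustrative only, not the real
# gguf-py implementation; names here are hypothetical).
val = ["llama3.1", "gemma"]  # license value taken from the model card YAML

# A str would be encoded to bytes; a list slips through unchanged.
encoded_val = val.encode("utf-8") if isinstance(val, str) else val

kv_data = bytearray()
kv_data += encoded_val  # TypeError: can't concat list to bytearray
```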

Workaround

As a temporary workaround, I modified the `GGUFWriter.add_license` method in `gguf_writer.py`:

https://github.com/ggerganov/llama.cpp/blob/c81f3bbb051f8b736e117dfc78c99d7c4e0450f6/gguf-py/gguf/gguf_writer.py#L523-L524

I forced the list to be joined into a single string:

```python
def add_license(self, license: str) -> None:
    # The model card may declare multiple licenses; join them so the value
    # can always be written as a single GGUF string.
    if isinstance(license, list):
        license = ", ".join(license)

    self.add_string(Keys.General.LICENSE, license)
```
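With this change, a multi-license model card is written as one comma-separated string (a pragmatic choice; keeping only the first license or storing a structured list would be alternatives). A hypothetical usage, assuming an already constructed `GGUFWriter` instance named `writer`:

```python
# Hypothetical usage, assuming `writer` is an existing GGUFWriter instance.
writer.add_license(["llama3.1", "gemma"])  # stored as the string "llama3.1, gemma"
writer.add_license("apache-2.0")           # plain strings still work unchanged
```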

Name and Version

```shell
$ ./llama-cli --version
version: 3441 (081fe431) built with cc (Ubuntu 13.2.0-23ubuntu4) 13.2.0 for x86_64-linux-gnu

$ git rev-parse HEAD
841713e1e487bdb82fd106a52ad998c5f87b59e9
```

What operating system are you seeing the problem on?

Linux

Relevant log output


```shell
INFO:hf-to-gguf:Loading model: Llama-3.1-Swallow-70B-Instruct-v0.1
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:rope_freqs.weight, torch.float32 --> F32, shape = {64}
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00030.safetensors'
INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> BF16, shape = {8192, 128256}
INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.bfloat16 --> BF16, shape = {28672, 8192}
INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {8192, 28672}
INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.bfloat16 --> BF16, shape = {8192, 28672}
INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.0.attn_k.weight, torch.bfloat16 --> BF16, shape = {8192, 1024}
...
...
INFO:hf-to-gguf:blk.79.ffn_norm.weight, torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.79.attn_k.weight, torch.bfloat16 --> BF16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.79.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.79.attn_q.weight, torch.bfloat16 --> BF16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.79.attn_v.weight, torch.bfloat16 --> BF16, shape = {8192, 1024}
INFO:hf-to-gguf:output_norm.weight, torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00030-of-00030.safetensors'
INFO:hf-to-gguf:output.weight, torch.bfloat16 --> BF16, shape = {8192, 128256}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 8192
INFO:hf-to-gguf:gguf: embedding length = 8192
INFO:hf-to-gguf:gguf: feed forward length = 28672
INFO:hf-to-gguf:gguf: head count = 64
INFO:hf-to-gguf:gguf: key-value head count = 8
INFO:hf-to-gguf:gguf: rope theta = 500000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 32
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Adding 280147 merge(s).
INFO:gguf.vocab:Setting special token type bos to 128000
INFO:gguf.vocab:Setting special token type eos to 128009
INFO:gguf.vocab:Setting special token type pad to 128004
INFO:gguf.vocab:Setting chat_template to {% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|> '+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|> ' }}{% endif %}
INFO:hf-to-gguf:Set model quantization version
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:/model/Llama-3.1-Swallow-70B-Instruct-v0.1-BF16.gguf: n_tensors = 724, total_size = 141.1G
model_card {'language': ['en', 'ja'], 'library_name': 'transformers', 'pipeline_tag': 'text-generation', 'license': ['llama3.1', 'gemma'], 'model_type': 'llama', 'datasets': ['lmsys/lmsys-chat-1m', 'argilla/magpie-ultra-v0.1']}
hf_params {'_name_or_path': '/bb/llm/gaf51275/hf-checkpoints/Meta-Llama-3.1-70B-Instruct', 'architectures': ['LlamaForCausalLM'], 'attention_bias': False, 'attention_dropout': 0.0, 'bos_token_id': 128000, 'eos_token_id': [128001, 128008, 128009], 'hidden_act': 'silu', 'hidden_size': 8192, 'initializer_range': 0.02, 'intermediate_size': 28672, 'max_position_embeddings': 8192, 'mlp_bias': False, 'model_type': 'llama', 'num_attention_heads': 64, 'num_hidden_layers': 80, 'num_key_value_heads': 8, 'pretraining_tp': 1, 'rms_norm_eps': 1e-05, 'rope_scaling': {'factor': 8.0, 'high_freq_factor': 4.0, 'low_freq_factor': 1.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}, 'rope_theta': 500000.0, 'tie_word_embeddings': False, 'torch_dtype': 'bfloat16', 'transformers_version': '4.44.2', 'use_cache': True, 'vocab_size': 128256}
metadata heuristics Metadata(name='Llama 3.1 Swallow 70B Instruct v0.1', author=None, version='v0.1', organization=None, finetune='Instruct', basename='llama', description=None, quantized_by=None, size_label='70B', url=None, doi=None, uuid=None, repo_url=None, source_url=None, source_doi=None, source_uuid=None, source_repo_url=None, license=['llama3.1', 'gemma'], license_name=None, license_link=None, base_models=None, tags=['text-generation'], languages=['en', 'ja'], datasets=['lmsys/lmsys-chat-1m', 'argilla/magpie-ultra-v0.1'])
Traceback (most recent call last):
  File "/repo/llama.cpp/convert_hf_to_gguf.py", line 4688, in <module>
    main()
  File "/repo/llama.cpp/convert_hf_to_gguf.py", line 4682, in main
    model_instance.write()
  File "/repo/llama.cpp/convert_hf_to_gguf.py", line 517, in write
    self.gguf_writer.write_kv_data_to_file()
  File "/repo/llama.cpp/gguf-py/gguf/gguf_writer.py", line 253, in write_kv_data_to_file
    kv_bytes += self._pack_val(val.value, val.type, add_vtype=True)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/repo/llama.cpp/gguf-py/gguf/gguf_writer.py", line 904, in _pack_val
    kv_data += encoded_val
TypeError: can't concat list to bytearray
```

gakugaku commented 6 hours ago

Related PR: #9807