ggerganov / llama.cpp

LLM inference in C/C++

Problem with converting Mistral-7B-v0.3 and Mistral-7B-Instruct-v0.3 to GGUF #7486

Open · MoonRide303 opened this issue 1 month ago

MoonRide303 commented 1 month ago

I was trying to convert the recently released Mistral v0.3 models (Mistral-7B-v0.3, Mistral-7B-Instruct-v0.3) with llama.cpp b2972, using:

python convert-hf-to-gguf.py --outtype f16 ..\Mistral-7B-Instruct-v0.3\ --outfile Mistral-7B-Instruct-v0.3-F16.gguf

In both cases I end up with:

INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00003.safetensors'
INFO:hf-to-gguf:token_embd.weight,           torch.bfloat16 --> F16, shape = {4096, 32768}
Traceback (most recent call last):
  File "D:\repos-git\llama.cpp\convert-hf-to-gguf.py", line 2585, in <module>
    main()
  File "D:\repos-git\llama.cpp\convert-hf-to-gguf.py", line 2579, in main
    model_instance.write()
  File "D:\repos-git\llama.cpp\convert-hf-to-gguf.py", line 328, in write
    self.write_tensors()
  File "D:\repos-git\llama.cpp\convert-hf-to-gguf.py", line 1336, in write_tensors
    super().write_tensors()
  File "D:\repos-git\llama.cpp\convert-hf-to-gguf.py", line 325, in write_tensors
    self.gguf_writer.add_tensor(new_name, data, raw_dtype=data_qtype)
  File "D:\repos-git\llama.cpp\gguf-py\gguf\gguf_writer.py", line 257, in add_tensor
    self.add_tensor_info(name, shape, tensor.dtype, tensor.nbytes, raw_dtype = raw_dtype)
  File "D:\repos-git\llama.cpp\gguf-py\gguf\gguf_writer.py", line 206, in add_tensor_info
    raise ValueError(f'Duplicated tensor name {name}')
ValueError: Duplicated tensor name token_embd.weight
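
For context, the error comes from a guard in the GGUF writer that refuses to register the same tensor name twice. A minimal sketch of that check, reconstructed from the traceback (the real GGUFWriter in gguf-py tracks far more state than this):

class GGUFWriterSketch:
    def __init__(self) -> None:
        # names of all tensors registered so far
        self.tensor_names: set[str] = set()

    def add_tensor_info(self, name: str) -> None:
        if name in self.tensor_names:
            raise ValueError(f'Duplicated tensor name {name}')
        self.tensor_names.add(name)

So if the converter feeds the writer the same tensor from two different files, the first repeated name (here token_embd.weight) aborts the conversion.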

Mistral v0.2 conversion works fine. Running the Q6_K quant (provided by @MaziyarPanahi) from the Mistral-7B-Instruct-v0.3-GGUF repo works fine, too.

arch-btw commented 1 month ago

I'm also having this issue with the v3 tokenizer.

arch-btw commented 1 month ago

@MoonRide303 I solved it by using consolidated.safetensors - it's in the Mistral repo. Make sure to delete the other safetensors first.
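
A sketch of that cleanup in Python, if you would rather move the split shards aside than delete them (paths are illustrative, adjust to your setup):

from pathlib import Path

# Keep only consolidated.safetensors in the model folder; move the split
# model-0000X-of-0000Y.safetensors shards to a sibling directory so
# convert-hf-to-gguf.py no longer sees two copies of every tensor.
model_dir = Path(r"..\Mistral-7B-Instruct-v0.3")
shard_backup = model_dir.parent / "Mistral-7B-Instruct-v0.3-shards"
shard_backup.mkdir(exist_ok=True)

for shard in model_dir.glob("model-*-of-*.safetensors"):
    shard.rename(shard_backup / shard.name)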

thesven commented 1 month ago

Thanks @arch-btw, that seemed to do the trick.

flatsiedatsie commented 1 month ago

I just saw the same error when trying to convert a GPT2LMHeadModel (striki-ai/william-shakespeare-poetry).

MoonRide303 commented 1 month ago

> @MoonRide303 I solved it by using consolidated.safetensors - it's in the Mistral repo. Make sure to delete the other safetensors first.

Yeah, having both the consolidated and the split model files in the same folder causes the problem - I moved consolidated.safetensors to a different folder, and the conversion started working for me, too.

But it looks like it should also be corrected on the llama.cpp side - the model.safetensors.index.json file (which defines which model files should be used) is currently being ignored.
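
A minimal sketch of what honoring the index could look like (the helper below is hypothetical, not llama.cpp's actual code; the fallback mirrors today's glob-everything behaviour):

import json
from pathlib import Path

def select_safetensors_parts(dir_model: Path) -> list[Path]:
    index_file = dir_model / "model.safetensors.index.json"
    if index_file.is_file():
        with open(index_file, encoding="utf-8") as f:
            weight_map = json.load(f)["weight_map"]
        # weight_map maps tensor name -> shard file; its value set is the
        # authoritative shard list, so a stray consolidated.safetensors
        # sitting in the same folder is simply ignored.
        return [dir_model / name for name in sorted(set(weight_map.values()))]
    # No index file: fall back to every *.safetensors in the folder.
    return sorted(dir_model.glob("*.safetensors"))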