ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Feature Request: Support Jina V3 arch #9585

abhishekbhakat closed this issue 4 days ago

abhishekbhakat commented 1 month ago


Feature Description

I was trying to convert https://huggingface.co/jinaai/jina-embeddings-v3 to GGUF and it seems like it doesn't support it yet:

INFO:hf-to-gguf:Loading model: jina-embeddings-v3
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model part 'model.safetensors'
Traceback (most recent call last):
  File "/Volumes/AI/llama.cpp/convert_hf_to_gguf.py", line 4330, in <module>
    main()
  File "/Volumes/AI/llama.cpp/convert_hf_to_gguf.py", line 4324, in main
    model_instance.write()
  File "/Volumes/AI/llama.cpp/convert_hf_to_gguf.py", line 425, in write
    self.prepare_tensors()
  File "/Volumes/AI/llama.cpp/convert_hf_to_gguf.py", line 294, in prepare_tensors
    for new_name, data in ((n, d.squeeze().numpy()) for n, d in self.modify_tensors(data_torch, name, bid)):
  File "/Volumes/AI/llama.cpp/convert_hf_to_gguf.py", line 2704, in modify_tensors
    return super().modify_tensors(data_torch, name, bid)
  File "/Volumes/AI/llama.cpp/convert_hf_to_gguf.py", line 2568, in modify_tensors
    return [(self.map_tensor_name(name), data_torch)]
  File "/Volumes/AI/llama.cpp/convert_hf_to_gguf.py", line 214, in map_tensor_name
    raise ValueError(f"Can not map tensor {name!r}")
ValueError: Can not map tensor 'roberta.emb_ln.bias'

Motivation

Jina V3 has been one of the top-performing embedding models, and we can expect to see more models like it in the future.

Possible Implementation

No response

abhishekbhakat commented 1 month ago

#6826 added support for Jina V2. Perhaps something similar is needed here.

enthermo commented 1 month ago

Jina V2 is based on 'JinaBERT'; V3 is based on Jina-XLM-RoBERTa.

abhishekbhakat commented 1 month ago

Okay, so the script already supports XLMRobertaModel. But I believe the Flash (flash-attention) implementation uses different layer names and structures than the standard XLMRobertaModel.

ggerganov commented 1 month ago

Maybe a fix similar to the one in #9510 would be needed:


    def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iterable[tuple[str, Tensor]]:
        # if name starts with "roberta.", remove the prefix
        # e.g. https://huggingface.co/BAAI/bge-reranker-v2-m3/tree/main
        if name.startswith("roberta."):
            name = name[8:]

        return super().modify_tensors(data_torch, name, bid)

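For illustration, the prefix-stripping step on its own maps the failing tensor name back to the standard layout. This is a standalone sketch; `strip_roberta_prefix` is a hypothetical helper, not part of convert_hf_to_gguf.py:

```python
def strip_roberta_prefix(name: str) -> str:
    # hypothetical helper mirroring the fix above: drop a leading
    # "roberta." so the standard XLM-RoBERTa tensor mapping applies
    prefix = "roberta."
    return name[len(prefix):] if name.startswith(prefix) else name

# the tensor that made the converter fail now maps to a known name
print(strip_roberta_prefix("roberta.emb_ln.bias"))  # emb_ln.bias
```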
abhishekbhakat commented 1 month ago

That might mitigate the immediate error, but it will run into another one later.

For example, XLMRobertaModel looks for sentencepiece.bpe.model in the hf repo. But this repo only has a tokenizer.json.
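If it helps, the vocab can be read straight from tokenizer.json instead of sentencepiece.bpe.model. A minimal sketch, assuming the usual Hugging Face Unigram layout where model.vocab is a list of [piece, score] pairs (`load_unigram_vocab` is a hypothetical helper, not existing converter code):

```python
import json

def load_unigram_vocab(tokenizer_json: str) -> list[tuple[str, float]]:
    # parse a tokenizer.json string and return (piece, score) pairs;
    # assumes the Unigram serialization used by XLM-RoBERTa-style tokenizers
    data = json.loads(tokenizer_json)
    if data["model"]["type"] != "Unigram":
        raise ValueError("expected a Unigram tokenizer.json")
    return [(piece, float(score)) for piece, score in data["model"]["vocab"]]

# tiny inline example standing in for the real tokenizer.json file
example = json.dumps({"model": {"type": "Unigram",
                                "vocab": [["<s>", 0.0], ["\u2581hello", -8.1]]}})
print(load_unigram_vocab(example))
```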

I made a very naive attempt that rewrote much of set_vocab(), but in the end I got stuck on GGUFWriter receiving a boolean in an array that expects <GGUFValueType.STRING: 8> 🥲.
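For what it's worth, that error likely means a mixed-type list: one stray bool in a list of token strings trips the writer's type check. A stdlib-only sketch of the kind of pre-flight check that would localize it (`find_non_strings` is hypothetical, not a gguf API):

```python
def find_non_strings(values: list) -> list[tuple[int, object]]:
    # hypothetical pre-flight check before handing a token list to
    # GGUFWriter: report the index and value of any non-str element
    return [(i, v) for i, v in enumerate(values) if not isinstance(v, str)]

tokens = ["<s>", "</s>", True, "\u2581hello"]  # a stray bool like the one above
print(find_non_strings(tokens))  # [(2, True)]
```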

github-actions[bot] commented 4 days ago

This issue was closed because it has been inactive for 14 days since being marked as stale.