ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Feature Request: support embedding stella_en_400M and stella_en_400M.gguf conversion #9202

Open raymond-infinitecode opened 2 months ago

raymond-infinitecode commented 2 months ago

Feature Description

Requesting support for stella_en_400M. I noticed a GGUF of the larger sibling model already exists (https://ollama.com/Losspost/stella_en_1.5b_v5), but I couldn't convert stella_en_400M myself.

Model Download: https://hf.rst.im/dunzhang/stella_en_400M_v5

```
D:\llama.cpp>python convert_hf_to_gguf.py d:/llama.cpp/stella_en_400M_v5 --outfile stella_en_400M.gguf --outtype q8_0
INFO:hf-to-gguf:Loading model: stella_en_400M_v5
ERROR:hf-to-gguf:Model NewModel is not supported
```
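For context on the error: `convert_hf_to_gguf.py` dispatches on the `architectures` field of the model's `config.json`, and stella_en_400M_v5 reports `NewModel`, a name with no registered converter. A minimal sketch of that registry pattern follows; the class and function names here are illustrative, not llama.cpp's actual ones.

```python
# Minimal sketch of an architecture-registry pattern like the one in
# convert_hf_to_gguf.py: converter classes register under a HF
# architecture name, and lookup fails fast for unknown names.
# (All names below are illustrative placeholders.)

_registry: dict[str, type] = {}

def register(arch: str):
    """Decorator that maps an architecture name to a converter class."""
    def wrap(cls: type) -> type:
        _registry[arch] = cls
        return cls
    return wrap

@register("BertModel")
class BertConverter:
    pass

def get_converter(arch: str) -> type:
    """Look up the converter for an architecture, or raise if unsupported."""
    try:
        return _registry[arch]
    except KeyError:
        raise NotImplementedError(f"Model {arch} is not supported") from None
```

Supporting a new architecture therefore means adding a registered converter class that knows how to map that model's tensors and config to GGUF, not just rerunning the script.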

Motivation

To have a better embedding model available in llama.cpp.

Possible Implementation

No response

0xDEADFED5 commented 2 months ago

related models, i believe:

https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5
https://huggingface.co/Alibaba-NLP/gte-base-en-v1.5

anishjain123 commented 1 month ago

Please! These are the SoTA for performance:resource ratio! @ggerganov we want to be able to have robust local retrieval models!

sammcj commented 1 month ago

Has anyone had any luck creating a GGUF version of stella_en_400M_v5? I've had a go but wasn't successful.

raymond-infinitecode commented 1 month ago

stella_en_400M_v5 is derived from the GTE architecture, which is not officially supported by llama.cpp / ollama; that's probably why nobody has managed to convert it.
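A quick way to confirm this before attempting a conversion is to check the `architectures` field of the model's `config.json` against the names the converter recognizes. A rough sketch, where the supported set is a hypothetical subset for illustration (the real list lives in the converter classes registered in `convert_hf_to_gguf.py`):

```python
import json
from pathlib import Path

# Hypothetical subset of supported architecture names, for illustration
# only; consult convert_hf_to_gguf.py for the authoritative list.
SUPPORTED = {"LlamaForCausalLM", "BertModel", "NomicBertModel"}

def check_model_dir(model_dir: str) -> tuple[str, bool]:
    """Return (architecture, is_supported) for a local HF model directory."""
    config = json.loads(Path(model_dir, "config.json").read_text())
    arch = config["architectures"][0]
    return arch, arch in SUPPORTED
```

For stella_en_400M_v5 this check would report `NewModel` as unsupported, matching the converter error above.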

devrimcavusoglu commented 2 weeks ago

The same issue occurs for Alibaba-NLP/gte-multilingual-base. Any updates on this? @ggerganov