OpenNMT / CTranslate2

Fast inference engine for Transformer models
https://opennmt.net/CTranslate2
MIT License

Converting "embedding" models and running them on ctranslate2 #1448

Closed. BBC-Esq closed this issue 10 months ago.

BBC-Esq commented 1 year ago

I'm having a lot of trouble figuring out how to use "embedding" models like "bge-large-en" or "instructor-xl": (1) converting them to the CTranslate2 format, and (2) creating embeddings with them using ctranslate2. If someone could just tell me the proper ctranslate2 "class" to use for each, I'd greatly appreciate it; I can figure out the rest (i.e., the methods and parameters) from there...

guillaumekln commented 1 year ago

The model "instructor-xl" uses the "T5EncoderModel" architecture which is not yet supported by CTranslate2. So this model cannot be converted and executed by CTranslate2.

On the other hand, "bge-large-en" is a "BertModel" which is supported. The easiest way to run this model is probably with this custom SentenceTransformer wrapper:

https://gist.github.com/guillaumekln/fb125fc3eb108d1a304b7432486e712f

This wrapper automatically converts and runs the model with CTranslate2.

Simply replace the model name "sentence-transformers/LaBSE" with "BAAI/bge-large-en" and the example script should work directly.
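
For reference, a minimal sketch of what such a wrapper does under the hood, using the ctranslate2.Encoder class. The output directory name is illustrative, and the direct NumPy conversion assumes the model runs on CPU:

```python
import ctranslate2
import numpy as np
import transformers

# One-time conversion on the command line:
#   ct2-transformers-converter --model BAAI/bge-large-en --output_dir bge-large-en-ct2

tokenizer = transformers.AutoTokenizer.from_pretrained("BAAI/bge-large-en")
encoder = ctranslate2.Encoder("bge-large-en-ct2", device="cpu")

texts = ["CTranslate2 is a fast inference engine."]
ids = tokenizer(texts).input_ids
tokens = [tokenizer.convert_ids_to_tokens(seq) for seq in ids]

output = encoder.forward_batch(tokens)
hidden = np.array(output.last_hidden_state)  # shape (batch, time, depth); CPU only
embeddings = hidden[:, 0]                    # [CLS] pooling, as the bge models use
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)  # normalize for cosine
```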

BBC-Esq commented 1 year ago

Thanks! But is there any chance of future support for T5? I thought it was supported based on this:

[screenshot of the supported-models list]

Is the "T5" referenced there not the same thing? I'm confused because the foregoing, and this as well, cite "T5...":

[screenshot]

BBC-Esq commented 1 year ago

Maybe the T5 in the picture above refers to T5 as an "inference" LLM and not an "embedding" model. Sorry if I'm showing my ignorance... I'm not a programmer by trade and this is a hobby.

ArtanisTheOne commented 1 year ago

> Maybe the T5 in the picture above refers to T5 as an "inference" LLM and not an "embedding" model

Yeah, I think full T5 encoder-decoder models are supported, but the encoder on its own isn't yet (and encoder-only models are the same thing as embedding models).
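
For comparison, the supported encoder-decoder path looks like this (a sketch adapted from the CTranslate2 Transformers guide; t5-small and the directory name are just examples):

```python
import ctranslate2
import transformers

# One-time conversion of a full (encoder-decoder) T5 model, which is supported:
#   ct2-transformers-converter --model t5-small --output_dir t5-small-ct2

tokenizer = transformers.AutoTokenizer.from_pretrained("t5-small")
translator = ctranslate2.Translator("t5-small-ct2")

text = "translate English to German: The house is wonderful."
tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(text))

results = translator.translate_batch([tokens])
output_ids = tokenizer.convert_tokens_to_ids(results[0].hypotheses[0])
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```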

BBC-Esq commented 1 year ago

That helps me understand...

BBC-Esq commented 1 year ago

Here is the relevant portion of the script I'm trying to redo to rely on ctranslate2. Currently, it chooses HuggingFaceInstructEmbeddings for models like "instructor-xl" and HuggingFaceEmbeddings for other embedding models... if CTranslate2 is everything I think it is, I'd like to use it for every kind of embedding model!

```python
import logging
import os
import shutil

from langchain.embeddings import HuggingFaceEmbeddings, HuggingFaceInstructEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

# SOURCE_DIRECTORY, PERSIST_DIRECTORY, EMBEDDING_MODEL_NAME, CHROMA_SETTINGS
# and the load_documents/split_documents helpers are defined elsewhere in the script.

def main():
    device_type = "cuda"  # Change to "cpu" if needed

    logging.info(f"Loading documents from {SOURCE_DIRECTORY}")
    documents = load_documents(SOURCE_DIRECTORY)
    text_documents, _ = split_documents(documents)
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=500)
    texts = text_splitter.split_documents(text_documents)

    logging.info(f"Loaded {len(documents)} documents from {SOURCE_DIRECTORY}")
    logging.info(f"Split into {len(texts)} chunks of text")

    if "instructor" in EMBEDDING_MODEL_NAME:
        embeddings = HuggingFaceInstructEmbeddings(
            model_name=EMBEDDING_MODEL_NAME,
            model_kwargs={"device": device_type},
        )
    else:
        embeddings = HuggingFaceEmbeddings(model_name=EMBEDDING_MODEL_NAME)

    # Delete the contents of PERSIST_DIRECTORY before creating the vector database
    if os.path.exists(PERSIST_DIRECTORY):
        shutil.rmtree(PERSIST_DIRECTORY)
        os.makedirs(PERSIST_DIRECTORY)

    db = Chroma.from_documents(
        texts,
        embeddings,
        persist_directory=PERSIST_DIRECTORY,
        client_settings=CHROMA_SETTINGS,
    )
    db.persist()
    db = None
```
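
One way to get there would be a small LangChain Embeddings wrapper around ctranslate2.Encoder that replaces HuggingFaceEmbeddings in the script above. This is a hypothetical sketch, not an existing API: the CT2Embeddings class and its constructor arguments are made-up names.

```python
from typing import List

import ctranslate2
import numpy as np
import transformers
from langchain.embeddings.base import Embeddings

class CT2Embeddings(Embeddings):  # hypothetical name, not part of any library
    """Serve a converted BERT-style model through LangChain's Embeddings interface."""

    def __init__(self, model_dir: str, tokenizer_name: str):
        self.tokenizer = transformers.AutoTokenizer.from_pretrained(tokenizer_name)
        # CPU keeps the NumPy copy below simple; GPU output would need an extra copy.
        self.encoder = ctranslate2.Encoder(model_dir, device="cpu")

    def _encode(self, texts: List[str]) -> List[List[float]]:
        ids = self.tokenizer(texts).input_ids
        tokens = [self.tokenizer.convert_ids_to_tokens(seq) for seq in ids]
        output = self.encoder.forward_batch(tokens)
        hidden = np.array(output.last_hidden_state)  # (batch, time, depth)
        pooled = hidden[:, 0]  # [CLS] pooling, as the bge models use
        pooled /= np.linalg.norm(pooled, axis=1, keepdims=True)  # normalize for cosine search
        return pooled.tolist()

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return self._encode(texts)

    def embed_query(self, text: str) -> List[float]:
        return self._encode([text])[0]

# Drop-in usage in place of HuggingFaceEmbeddings:
#   embeddings = CT2Embeddings("bge-large-en-ct2", "BAAI/bge-large-en")
```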
BBC-Esq commented 1 year ago

And then I have another script that handles the interaction between the vector database (created with the previous script) and the LLM I'm connecting to via localhost. So basically, I want to use ctranslate2 for both aspects of the program, which, overall, is designed to query your documents:

```python
def connect_to_local_chatgpt(prompt):
    formatted_prompt = f"{prefix}{prompt}{suffix}"
    response = openai.ChatCompletion.create(
        model="local model",
        temperature=0.1,
        messages=[{"role": "user", "content": formatted_prompt}],
    )
    return response.choices[0].message["content"]

def ask_local_chatgpt(query, embed_model_name=EMBEDDING_MODEL_NAME,
                      persist_directory=PERSIST_DIRECTORY,
                      client_settings=CHROMA_SETTINGS):
    if "instructor" in EMBEDDING_MODEL_NAME:
        embeddings = HuggingFaceInstructEmbeddings(
            model_name=EMBEDDING_MODEL_NAME,
            model_kwargs={"device": "cuda"},
            encode_kwargs={"normalize_embeddings": True},
        )
    else:
        embeddings = HuggingFaceEmbeddings(
            model_name=EMBEDDING_MODEL_NAME,
            model_kwargs={"device": "cuda"},
            encode_kwargs={"normalize_embeddings": True},
        )

    db = Chroma(
        persist_directory=persist_directory,
        embedding_function=embeddings,
        client_settings=client_settings,
    )
    retriever = db.as_retriever()
    relevant_contexts = retriever.get_relevant_documents(query)
    contexts = [document.page_content for document in relevant_contexts]
    augmented_query = "\n\n---\n\n".join(contexts) + "\n\n-----\n\n" + query
    response_json = connect_to_local_chatgpt(augmented_query)
    return {"answer": response_json, "sources": relevant_contexts}

def interact_with_chat(user_input):
    global last_response
    response = ask_local_chatgpt(user_input)
    answer = response["answer"]
    last_response = answer
    return answer

def get_last_response():
    return last_response
```
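
For the generation side, CTranslate2's Generator class could stand in for the OpenAI-compatible localhost call. A minimal sketch, using gpt2 purely as a stand-in for whatever local model is being served:

```python
import ctranslate2
import transformers

# One-time conversion of a supported decoder-only model, e.g.:
#   ct2-transformers-converter --model gpt2 --output_dir gpt2-ct2

tokenizer = transformers.AutoTokenizer.from_pretrained("gpt2")
generator = ctranslate2.Generator("gpt2-ct2", device="cpu")

prompt = "What does this document say about damages?"
prompt_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))

results = generator.generate_batch(
    [prompt_tokens],
    max_length=256,
    sampling_temperature=0.1,
    include_prompt_in_result=False,  # return only the completion
)
print(tokenizer.decode(results[0].sequences_ids[0]))
```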

aamir-s18 commented 1 year ago

One comment here: they are not using the plain T5 model in the forward pass. See here.

BBC-Esq commented 1 year ago

> The model "instructor-xl" uses the "T5EncoderModel" architecture which is not yet supported by CTranslate2. So this model cannot be converted and executed by CTranslate2.
>
> On the other hand, "bge-large-en" is a "BertModel" which is supported. The easiest way to run this model is probably with this custom SentenceTransformer wrapper:
>
> https://gist.github.com/guillaumekln/fb125fc3eb108d1a304b7432486e712f
>
> This wrapper automatically converts and runs the model with CTranslate2.
>
> Simply replace the model name "sentence-transformers/LaBSE" with "BAAI/bge-large-en" and the example script should work directly.

How difficult would it be to enable ctranslate2 to work with the instructor embedding models? They're hands down the best... Maybe it's something I could help with?

BBC-Esq commented 10 months ago

Closed for lack of interest.