Closed by BBC-Esq 10 months ago
The model "instructor-xl" uses the "T5EncoderModel" architecture which is not yet supported by CTranslate2. So this model cannot be converted and executed by CTranslate2.
On the other hand, "bge-large-en" is a "BertModel" which is supported. The easiest way to run this model is probably with this custom SentenceTransformer wrapper:
https://gist.github.com/guillaumekln/fb125fc3eb108d1a304b7432486e712f
This wrapper automatically converts and runs the model with CTranslate2.
Simply replace the model name "sentence-transformers/LaBSE" with "BAAI/bge-large-en" and the example script should work directly.
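Under the hood, the wrapper does roughly two things: convert the Hugging Face checkpoint to the CTranslate2 format once, then run the converted encoder and pool the token embeddings. A hedged sketch of that flow follows; the `TransformersConverter` and `Encoder` names are from CTranslate2's Python API, but the pooling helper and the exact tokenization details are my own illustration, not the gist's code.

```python
import numpy as np

def mean_pool(last_hidden_state, lengths):
    """Average token vectors per sequence, ignoring padding positions.

    last_hidden_state: array of shape (batch, max_len, dim)
    lengths: true (unpadded) token count of each sequence
    """
    pooled = []
    for states, n in zip(last_hidden_state, lengths):
        pooled.append(states[:n].mean(axis=0))
    return np.stack(pooled)

# The CTranslate2 side (not executed here -- it requires downloading the
# model; shown as a sketch only):
#
#   import ctranslate2, transformers
#   ctranslate2.converters.TransformersConverter(
#       "BAAI/bge-large-en").convert("bge-large-en-ct2")
#   tokenizer = transformers.AutoTokenizer.from_pretrained("BAAI/bge-large-en")
#   encoder = ctranslate2.Encoder("bge-large-en-ct2", device="cpu")
#   tokens = [tokenizer.tokenize(t) for t in texts]  # plus special tokens
#   output = encoder.forward_batch(tokens)
#   embeddings = mean_pool(np.array(output.last_hidden_state),
#                          [len(t) for t in tokens])
```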
Thanks! But any possible future support for T5? I thought it was supported based on this?
Is the "T5" referenced here not the same thing? I'm confused because the foregoing, and this as well, cite "T5..."
Maybe the T5 in the picture above refers to T5 for "inference" LLMs and not "embedding" models. Sorry if I'm showing my ignorance; I'm not a programmer by trade and this is a hobby.
> Maybe the T5 in the picture above refers to T5 for "inference" LLMs and not "embedding" models
Yeah, I think T5 encoder-decoder models are supported, but the encoder by itself isn't yet (encoder-only models and embedding models are the same thing).
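A hedged illustration of that distinction: which side of the line a checkpoint falls on is recorded in the `architectures` field of its `config.json`. The helper and the supported-architecture subset below are my own sketch (CTranslate2's converter simply raises an error for unsupported architectures), grounded in what this thread states: "BertModel" and the T5 encoder-decoder are convertible, while the encoder-only "T5EncoderModel" is not.

```python
import json

# Illustrative subset only; per this thread, BertModel and the T5
# encoder-decoder convert, while the encoder-only T5EncoderModel does not.
CONVERTIBLE = {"BertModel", "T5ForConditionalGeneration"}

def is_ct2_convertible(config_json: str) -> bool:
    """Check a model's config.json (as text) against the subset above."""
    architectures = json.loads(config_json).get("architectures", [])
    return all(arch in CONVERTIBLE for arch in architectures)

print(is_ct2_convertible('{"architectures": ["BertModel"]}'))       # True
print(is_ct2_convertible('{"architectures": ["T5EncoderModel"]}'))  # False
```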
That helps me understand...
Here is the relevant portion of the script I'm trying to redo to rely on CTranslate2. Currently, it chooses `HuggingFaceInstructEmbeddings` for models like "instructor-xl" and `HuggingFaceEmbeddings` for other embedding models. If CTranslate2 is everything I think it is, I'd like to use it for every kind of embedding model!

```python
import logging
import os
import shutil

from langchain.embeddings import HuggingFaceEmbeddings, HuggingFaceInstructEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

# SOURCE_DIRECTORY, PERSIST_DIRECTORY, EMBEDDING_MODEL_NAME, CHROMA_SETTINGS,
# load_documents, and split_documents come from elsewhere in my project.

def main():
    device_type = "cuda"  # Change to 'cpu' if needed

    logging.info(f"Loading documents from {SOURCE_DIRECTORY}")
    documents = load_documents(SOURCE_DIRECTORY)
    text_documents, _ = split_documents(documents)
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=500)
    texts = text_splitter.split_documents(text_documents)
    logging.info(f"Loaded {len(documents)} documents from {SOURCE_DIRECTORY}")
    logging.info(f"Split into {len(texts)} chunks of text")

    if "instructor" in EMBEDDING_MODEL_NAME:
        embeddings = HuggingFaceInstructEmbeddings(
            model_name=EMBEDDING_MODEL_NAME,
            model_kwargs={"device": device_type},
        )
    else:
        embeddings = HuggingFaceEmbeddings(model_name=EMBEDDING_MODEL_NAME)

    # Delete contents of the PERSIST_DIRECTORY before creating the vector database
    if os.path.exists(PERSIST_DIRECTORY):
        shutil.rmtree(PERSIST_DIRECTORY)
    os.makedirs(PERSIST_DIRECTORY)

    db = Chroma.from_documents(
        texts,
        embeddings,
        persist_directory=PERSIST_DIRECTORY,
        client_settings=CHROMA_SETTINGS,
    )
    db.persist()
    db = None
```
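If the gist's CTranslate2-backed SentenceTransformer were used, one way to slot it into the script above is a thin adapter that duck-types LangChain's embedding interface (`embed_documents`/`embed_query`, which is what Chroma calls). This is an untested sketch of my own; the class name and the `encode_fn` parameter are hypothetical, not part of any library.

```python
class CT2SentenceEmbeddings:
    """Minimal adapter exposing the interface Chroma expects.

    encode_fn is any callable mapping a list of strings to a list of
    vectors -- e.g. the .encode() method of a CTranslate2-backed
    SentenceTransformer (hypothetical wiring, shown for illustration).
    """

    def __init__(self, encode_fn):
        self.encode_fn = encode_fn

    def embed_documents(self, texts):
        # Coerce each vector to a plain list of floats for the vector store.
        return [list(map(float, vec)) for vec in self.encode_fn(texts)]

    def embed_query(self, text):
        return self.embed_documents([text])[0]
```

With that in place, `embeddings = CT2SentenceEmbeddings(model.encode)` could replace the `HuggingFaceEmbeddings(...)` branch, assuming `model` is the wrapper from the gist.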
And then I have another script that facilitates the interaction between the vector database (created with the previous script) and the LLM I'm connecting to via localhost. So basically, I want to use CTranslate2 for both aspects of the program, which, overall, is designed to query your documents:

```python
def connect_to_local_chatgpt(prompt):
    formatted_prompt = f"{prefix}{prompt}{suffix}"
    response = openai.ChatCompletion.create(
        model="local model",
        temperature=0.1,
        messages=[{"role": "user", "content": formatted_prompt}]
    )
    return response.choices[0].message["content"]

def ask_local_chatgpt(query, embed_model_name=EMBEDDING_MODEL_NAME,
                      persist_directory=PERSIST_DIRECTORY,
                      client_settings=CHROMA_SETTINGS):
    if "instructor" in EMBEDDING_MODEL_NAME:
        embeddings = HuggingFaceInstructEmbeddings(
            model_name=EMBEDDING_MODEL_NAME,
            model_kwargs={"device": "cuda"},
            encode_kwargs={'normalize_embeddings': True}
        )
    else:
        embeddings = HuggingFaceEmbeddings(
            model_name=EMBEDDING_MODEL_NAME,
            model_kwargs={'device': "cuda"},
            encode_kwargs={'normalize_embeddings': True}
        )
    db = Chroma(
        persist_directory=persist_directory,
        embedding_function=embeddings,
        client_settings=client_settings,
    )
    retriever = db.as_retriever()
    relevant_contexts = retriever.get_relevant_documents(query)
    contexts = [document.page_content for document in relevant_contexts]
    augmented_query = "\n\n---\n\n".join(contexts) + "\n\n-----\n\n" + query
    response_json = connect_to_local_chatgpt(augmented_query)
    return {"answer": response_json, "sources": relevant_contexts}

def interact_with_chat(user_input):
    global last_response
    response = ask_local_chatgpt(user_input)
    answer = response['answer']
    last_response = answer
    return answer

def get_last_response():
    global last_response
    return last_response
```
One comment here: they are not using the plain T5 model in the forward pass. See here.
> The model "instructor-xl" uses the "T5EncoderModel" architecture which is not yet supported by CTranslate2. So this model cannot be converted and executed by CTranslate2.
> On the other hand, "bge-large-en" is a "BertModel" which is supported. The easiest way to run this model is probably with this custom SentenceTransformer wrapper:
> https://gist.github.com/guillaumekln/fb125fc3eb108d1a304b7432486e712f
> This wrapper automatically converts and runs the model with CTranslate2.
> Simply replace the model name "sentence-transformers/LaBSE" with "BAAI/bge-large-en" and the example script should work directly.
How difficult would it be to enable CTranslate2 to work with the Instructor embedding models? They're hands down the best. Maybe it's something I could help with?
Closed for lack of interest.
I'm having extreme trouble figuring out how to use embedding models like "bge-large-en" or "instructor-xl" by (1) converting them to the CTranslate2 format and (2) creating embeddings with them using CTranslate2. If someone could just tell me the proper CTranslate2 class to use for each, I'd greatly appreciate it; I can figure out the rest (i.e., the methods and parameters) if someone could clue me in on the proper classes to use.