Closed by BBC-Esq 10 months ago
The model "instructor-xl" uses the "T5EncoderModel" architecture which is not yet supported by CTranslate2. So this model cannot be converted and executed by CTranslate2.
On the other hand, "bge-large-en" is a "BertModel" which is supported. The easiest way to run this model is probably with this custom SentenceTransformer wrapper:
https://gist.github.com/guillaumekln/fb125fc3eb108d1a304b7432486e712f
This wrapper automatically converts and runs the model with CTranslate2.
Simply replace the model name "sentence-transformers/LaBSE" with "BAAI/bge-large-en" and the example script should work directly.
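Under the hood, the wrapper does roughly two things: convert the Hugging Face checkpoint to the CTranslate2 format once, then run the converted encoder and pool the token embeddings. A hedged sketch of that flow follows; the `TransformersConverter` and `Encoder` names are from CTranslate2's Python API, but the pooling helper and the exact tokenization details are my own illustration, not the gist's code.

```python
import numpy as np

def mean_pool(last_hidden_state, lengths):
    """Average token vectors per sequence, ignoring padding positions.

    last_hidden_state: array of shape (batch, max_len, dim)
    lengths: true (unpadded) token count of each sequence
    """
    pooled = []
    for states, n in zip(last_hidden_state, lengths):
        pooled.append(states[:n].mean(axis=0))
    return np.stack(pooled)

# The CTranslate2 side (not executed here -- it requires downloading the
# model; shown as a sketch only):
#
#   import ctranslate2, transformers
#   ctranslate2.converters.TransformersConverter(
#       "BAAI/bge-large-en").convert("bge-large-en-ct2")
#   tokenizer = transformers.AutoTokenizer.from_pretrained("BAAI/bge-large-en")
#   encoder = ctranslate2.Encoder("bge-large-en-ct2", device="cpu")
#   tokens = [tokenizer.tokenize(t) for t in texts]  # plus special tokens
#   output = encoder.forward_batch(tokens)
#   embeddings = mean_pool(np.array(output.last_hidden_state),
#                          [len(t) for t in tokens])
```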
Thanks! But any possible future support for T5? I thought it was supported based on this?
Is the "T5" referenced here not the same thing? I'm confused because the foregoing, and this as well, cite "T5..."
Maybe the T5 in the picture above refers to T5 for "inference" LLMs and not "embedding" models. Sorry if I'm showing my ignorance; I'm not a programmer by trade and this is a hobby.
> Maybe the T5 in the picture above refers to T5 for "inference" LLMs and not "embedding" models
Yeah, I think T5 encoder-decoder models are supported, but the encoder by itself isn't yet (encoder-only models and embedding models are the same thing).
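A hedged illustration of that distinction: which side of the line a checkpoint falls on is recorded in the `architectures` field of its `config.json`. The helper and the supported-architecture subset below are my own sketch (CTranslate2's converter simply raises an error for unsupported architectures), grounded in what this thread states: "BertModel" and the T5 encoder-decoder are convertible, while the encoder-only "T5EncoderModel" is not.

```python
import json

# Illustrative subset only; per this thread, BertModel and the T5
# encoder-decoder convert, while the encoder-only T5EncoderModel does not.
CONVERTIBLE = {"BertModel", "T5ForConditionalGeneration"}

def is_ct2_convertible(config_json: str) -> bool:
    """Check a model's config.json (as text) against the subset above."""
    architectures = json.loads(config_json).get("architectures", [])
    return all(arch in CONVERTIBLE for arch in architectures)

print(is_ct2_convertible('{"architectures": ["BertModel"]}'))       # True
print(is_ct2_convertible('{"architectures": ["T5EncoderModel"]}'))  # False
```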
That helps me understand...
Here is the relevant portion of the script I'm trying to redo to rely on CTranslate2. Currently, it chooses `HuggingFaceInstructEmbeddings` for models like "instructor-xl" and `HuggingFaceEmbeddings` for other embedding models. If CTranslate2 is everything I think it is, I'd like to use it for every kind of embedding model!

```python
import logging
import os
import shutil

from langchain.embeddings import HuggingFaceEmbeddings, HuggingFaceInstructEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

# SOURCE_DIRECTORY, PERSIST_DIRECTORY, EMBEDDING_MODEL_NAME, CHROMA_SETTINGS,
# load_documents, and split_documents come from elsewhere in my project.

def main():
    device_type = "cuda"  # Change to 'cpu' if needed

    logging.info(f"Loading documents from {SOURCE_DIRECTORY}")
    documents = load_documents(SOURCE_DIRECTORY)
    text_documents, _ = split_documents(documents)
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=500)
    texts = text_splitter.split_documents(text_documents)
    logging.info(f"Loaded {len(documents)} documents from {SOURCE_DIRECTORY}")
    logging.info(f"Split into {len(texts)} chunks of text")

    if "instructor" in EMBEDDING_MODEL_NAME:
        embeddings = HuggingFaceInstructEmbeddings(
            model_name=EMBEDDING_MODEL_NAME,
            model_kwargs={"device": device_type},
        )
    else:
        embeddings = HuggingFaceEmbeddings(model_name=EMBEDDING_MODEL_NAME)

    # Delete contents of the PERSIST_DIRECTORY before creating the vector database
    if os.path.exists(PERSIST_DIRECTORY):
        shutil.rmtree(PERSIST_DIRECTORY)
    os.makedirs(PERSIST_DIRECTORY)

    db = Chroma.from_documents(
        texts,
        embeddings,
        persist_directory=PERSIST_DIRECTORY,
        client_settings=CHROMA_SETTINGS,
    )
    db.persist()
    db = None
```
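If the gist's CTranslate2-backed SentenceTransformer were used, one way to slot it into the script above is a thin adapter that duck-types LangChain's embedding interface (`embed_documents`/`embed_query`, which is what Chroma calls). This is an untested sketch of my own; the class name and the `encode_fn` parameter are hypothetical, not part of any library.

```python
class CT2SentenceEmbeddings:
    """Minimal adapter exposing the interface Chroma expects.

    encode_fn is any callable mapping a list of strings to a list of
    vectors -- e.g. the .encode() method of a CTranslate2-backed
    SentenceTransformer (hypothetical wiring, shown for illustration).
    """

    def __init__(self, encode_fn):
        self.encode_fn = encode_fn

    def embed_documents(self, texts):
        # Coerce each vector to a plain list of floats for the vector store.
        return [list(map(float, vec)) for vec in self.encode_fn(texts)]

    def embed_query(self, text):
        return self.embed_documents([text])[0]
```

With that in place, `embeddings = CT2SentenceEmbeddings(model.encode)` could replace the `HuggingFaceEmbeddings(...)` branch, assuming `model` is the wrapper from the gist.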
And then I have another script that facilitates the interaction between the vector database (created with the previous script) and the LLM I'm connecting to via localhost. So basically, I want to use CTranslate2 for both aspects of the program, which, overall, is designed to query your documents:

```python
def connect_to_local_chatgpt(prompt):
    formatted_prompt = f"{prefix}{prompt}{suffix}"
    response = openai.ChatCompletion.create(
        model="local model",
        temperature=0.1,
        messages=[{"role": "user", "content": formatted_prompt}]
    )
    return response.choices[0].message["content"]

def ask_local_chatgpt(query, embed_model_name=EMBEDDING_MODEL_NAME,
                      persist_directory=PERSIST_DIRECTORY,
                      client_settings=CHROMA_SETTINGS):
    if "instructor" in EMBEDDING_MODEL_NAME:
        embeddings = HuggingFaceInstructEmbeddings(
            model_name=EMBEDDING_MODEL_NAME,
            model_kwargs={"device": "cuda"},
            encode_kwargs={'normalize_embeddings': True}
        )
    else:
        embeddings = HuggingFaceEmbeddings(
            model_name=EMBEDDING_MODEL_NAME,
            model_kwargs={'device': "cuda"},
            encode_kwargs={'normalize_embeddings': True}
        )
    db = Chroma(
        persist_directory=persist_directory,
        embedding_function=embeddings,
        client_settings=client_settings,
    )
    retriever = db.as_retriever()
    relevant_contexts = retriever.get_relevant_documents(query)
    contexts = [document.page_content for document in relevant_contexts]
    augmented_query = "\n\n---\n\n".join(contexts) + "\n\n-----\n\n" + query
    response_json = connect_to_local_chatgpt(augmented_query)
    return {"answer": response_json, "sources": relevant_contexts}

def interact_with_chat(user_input):
    global last_response
    response = ask_local_chatgpt(user_input)
    answer = response['answer']
    last_response = answer
    return answer

def get_last_response():
    global last_response
    return last_response
```
One comment here: they are not using the plain T5 model in the forward pass. See here.
> The model "instructor-xl" uses the "T5EncoderModel" architecture which is not yet supported by CTranslate2. So this model cannot be converted and executed by CTranslate2.
> On the other hand, "bge-large-en" is a "BertModel" which is supported. The easiest way to run this model is probably with this custom SentenceTransformer wrapper:
> https://gist.github.com/guillaumekln/fb125fc3eb108d1a304b7432486e712f
> This wrapper automatically converts and runs the model with CTranslate2.
> Simply replace the model name "sentence-transformers/LaBSE" with "BAAI/bge-large-en" and the example script should work directly.
How difficult would it be to enable CTranslate2 to work with the Instructor embedding models? They're hands down the best. Maybe it's something I could help with?
Closed for lack of interest.
I'm having extreme trouble figuring out how to use embedding models like "bge-large-en" or "instructor-xl" by (1) converting them to the CTranslate2 format and (2) creating embeddings with them using CTranslate2. If someone could just tell me the proper CTranslate2 class to use for each, I'd greatly appreciate it; I can figure out the rest (i.e., the methods and parameters) if someone could clue me in on the proper classes to use.