Closed deramos closed 5 months ago
@deramos, we've seen these types of errors with the newer macOS Sonoma. Could you downgrade your onnxruntime dependency to 1.16.3 and let me know if that works?
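One way to confirm which onnxruntime version is actually installed in a given environment (for example inside a container) is to ask Python's package metadata. This is a minimal sketch using only the standard library; the helper names `installed_version` and `is_pinned` are made up for illustration and are not part of chromadb or onnxruntime.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_pinned(package, wanted):
    """Return True only if `package` is installed at exactly the version `wanted`."""
    return installed_version(package) == wanted


if __name__ == "__main__":
    print("onnxruntime:", installed_version("onnxruntime"))
    print("pinned to 1.16.3:", is_pinned("onnxruntime", "1.16.3"))
```

Running this inside the same interpreter that runs chromadb shows whether the downgrade took effect there, as opposed to in some other environment.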
Hi @tazarov. Thanks for your reply.
I am using a chromadb Docker image. I built a custom image FROM chroma:0.4.25.dev93 so I could control the version of onnxruntime, which is now 1.16.3, but the error still occurs.
Do you know of any image that ships with that version of onnxruntime out of the box? Is there another way to solve this problem for Docker-based chromadb?
@deramos, while I think about how this can be solved, you can use sentence transformers with the same model as the default embedding function, which is https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
sentence_transformer_ef = embedding_functions.SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")
@tazarov alright, will do
@tazarov. Thanks
Generating the embedding before saving the documents works. I will be taking that route.
for news in news_articles:
    collection.add(
        documents=[news['processed_content']],
        # The embedding function takes a list of texts and returns one vector per text.
        embeddings=sentence_transformer_ef([news['processed_content']]),
        metadatas=[{'entities': json.dumps(news['entities']), 'summary': news['news_summary'], 'source': news['source']}],
        ids=[str(news['_id'])]
    )
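The loop above embeds and adds one article per call. The same fields could instead be collected for a single batched `collection.add(...)`, which is sketched below. The helper name `build_add_kwargs` is hypothetical, and the sketch assumes `embed` is any callable that maps a list of strings to a list of vectors (one per string), which is how the sentence transformer embedding function is expected to be called.

```python
import json


def build_add_kwargs(news_articles, embed):
    """Collect keyword arguments for one batched collection.add(...) call.

    `embed` is assumed to map a list of strings to a list of vectors,
    one vector per input string.
    """
    documents = [news["processed_content"] for news in news_articles]
    embeddings = embed(documents)
    if len(embeddings) != len(documents):
        raise ValueError("embedding function must return one vector per document")
    return {
        "documents": documents,
        "embeddings": embeddings,
        "metadatas": [
            {
                "entities": json.dumps(news["entities"]),
                "summary": news["news_summary"],
                "source": news["source"],
            }
            for news in news_articles
        ],
        "ids": [str(news["_id"]) for news in news_articles],
    }
```

With the names from the thread, usage would look like `collection.add(**build_add_kwargs(news_articles, sentence_transformer_ef))`. For very large article lists it may still be worth splitting the input into chunks before calling it.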
But please let me know if you find a solution to the issue. I am curious to know what caused it. 🙏
FWIW, I ran into the ONNX issue above on a Mac (Ventura 13.5) using the quick start/tutorials. I mucked around with it a bit and got this working using the demo script:
import chromadb
from chromadb.utils import embedding_functions

# setup Chroma in-memory, for easy prototyping. Can add persistence easily!
client = chromadb.Client()

# Create collection. get_collection, get_or_create_collection, delete_collection also available!
sentence_transformer_ef = embedding_functions.SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")
# Attach the embedding function so query_texts below is embedded with the same model
# instead of the default ONNX-based one.
collection = client.create_collection("all-my-documents", embedding_function=sentence_transformer_ef)

# Add docs to the collection. Can also update and delete. Row-based API coming soon!
documents = ["This is document1", "This is document2"]
metadatas = [{"source": "notion"}, {"source": "google-docs"}]
ids = ["doc1", "doc2"]
for doc, meta, doc_id in zip(documents, metadatas, ids):
    collection.add(
        documents=[doc],
        # The embedding function takes a list of texts and returns one vector per text.
        embeddings=sentence_transformer_ef([doc]),
        metadatas=[meta],
        ids=[doc_id],
    )

# Query/search 2 most similar results. You can also .get by id
results = collection.query(
    query_texts=["This is a query document"],
    n_results=2,
    # where={"metadata_field": "is_equal_to_this"}, # optional filter
    # where_document={"$contains":"search_string"} # optional filter
)

# Print the results
for key, value in results.items():
    print(f"{key}: {value}")
What happened?
I was trying to store news articles and their summaries in chromadb when I encountered this issue.
Nothing fancy was done in the code snippet.
System Spec
At first I thought it was because I don't have an Nvidia GPU, but on reflection, if it were GPU-related, the error message would be different.
Please help 🙏
Versions
Relevant log output
No response