Open CeeArEx opened 10 months ago
Can you share the code for your embedding function? Likely the signature is just wrong - we can help debug.
I just put a request inside it, because I can send the text to my server and it returns the embedding vectors.
It looks like this:
url = 'http://localhost:1234'
myobj = {'text': input}
x = requests.post(url, json=myobj)
input is my text, and I just want to return x (my numbers).
I'm relatively new to this field, sorry for that.
No worries. Could you share the Python code you are extending EmbeddingFunction with?
You are missing the (). It should be:
`
collection = client.create_collection(name="testing", embedding_function=EmbeddingFunction())
`
If I do this:
... = client.create_collection(name='testing', embedding_function=EmbeddingFunction())
I get this error:
TypeError: Protocols cannot be instantiated
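For anyone landing here with the same error: it can be reproduced with the standard library alone. A class declared directly as `Protocol[...]` cannot be instantiated, while an ordinary subclass of it can. The names `EmbeddingFunctionLike` and `ConstantEmbedding` below are made up for illustration and are not part of chromadb.

```python
from typing import List, Protocol, TypeVar

D = TypeVar("D", contravariant=True)


class EmbeddingFunctionLike(Protocol[D]):
    """Hypothetical stand-in for a protocol that describes an embedding function."""

    def __call__(self, input: D) -> List[List[float]]:
        ...


# Instantiating the protocol class itself fails.
try:
    EmbeddingFunctionLike()
    error_message = ""
except TypeError as exc:
    error_message = str(exc)


class ConstantEmbedding(EmbeddingFunctionLike[List[str]]):
    """A regular subclass (Protocol is not in its direct bases), so it can be instantiated."""

    def __call__(self, input: List[str]) -> List[List[float]]:
        # One dummy vector per input text.
        return [[0.0, 0.0, 0.0] for _ in input]


vectors = ConstantEmbedding()(["first text", "second text"])
```

So the class passed to `create_collection` has to be a subclass with its own name, not a class that is itself declared as a `Protocol`.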
Just the line I mentioned above:
embeddings = requests.post("http://localhost:1234", json=input)
I use this (see link below) in the background and just use requests to send my text to the endpoint.
requests
You need to extend EmbeddingFunction, like this:
import chromadb
from chromadb import Documents, Embeddings, EmbeddingFunction
from typing import Optional, Sequence, Union, TypeVar, List, Dict, Any, Tuple, cast
Embeddable = Union[Documents]
D = TypeVar("D", bound=Embeddable, contravariant=True)
class CustomEmbeddingFunction(EmbeddingFunction):
    def __call__(self, input: D) -> Embeddings:
        # Placeholder values; compute the real embeddings for `input` here.
        embeddings = [1, 2, 3]
        return embeddings
client = chromadb.Client()
collection = client.create_collection(name="testing", embedding_function=CustomEmbeddingFunction())
Also, give your custom embedding function a different name, so it doesn't shadow chromadb's EmbeddingFunction.
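For the original use case (sending each text to your own server), the body of `__call__` could look roughly like the sketch below. It uses only the standard library so it runs anywhere. The URL, the `{"text": ...}` request body and the `"embedding"` key in the reply are assumptions about your server, and `RemoteEmbedder` and `http_post_json` are made-up names.

```python
import json
from typing import Callable, Dict, List
from urllib import request


def http_post_json(url: str, payload: Dict) -> Dict:
    """POST `payload` as JSON and decode the JSON reply."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


class RemoteEmbedder:
    """Hypothetical embedding function that asks an HTTP server for vectors."""

    def __init__(
        self,
        url: str = "http://localhost:1234",  # assumed endpoint; note the scheme
        post: Callable[[str, Dict], Dict] = http_post_json,
    ) -> None:
        self.url = url
        self.post = post

    def __call__(self, input: List[str]) -> List[List[float]]:
        vectors = []
        for text in input:
            reply = self.post(self.url, {"text": text})
            # Assumed reply shape: {"embedding": [numbers...]}
            vectors.append([float(x) for x in reply["embedding"]])
        return vectors


# Offline check with a fake transport instead of a real server.
def fake_post(url: str, payload: Dict) -> Dict:
    return {"embedding": [len(payload["text"]), 0, 1]}


result = RemoteEmbedder(post=fake_post)(["ab", "cde"])
```

To use it with a collection, derive the class from chromadb's EmbeddingFunction as in the example above. The important part is that `__call__` takes `input` and returns one vector per text.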
That seems to work. Thank you very much. :)
I created my own embedding function as suggested above:
ImageDType = Union[np.uint, np.int_, np.float_]
Image = NDArray[ImageDType]
Images = List[Image]
Embeddable = Union[Documents, Images]
D = TypeVar("D", bound=Embeddable, contravariant=True)
class CustomEmbeddingFunction(EmbeddingFunction):
    def __call__(self, input: D) -> Embeddings:
        embeddings = HuggingFaceEmbeddings(
            model_name="bert-base-multilingual-uncased"
        )
        return embeddings
I am assigning it to the collection:
chroma_collection = chroma_client.get_or_create_collection(name=f"BasicRag", embedding_function=CustomEmbeddingFunction())
I am adding entries to the db:
chroma_collection.add(documents=products_list, ids=ids)
I can see that they are added, but when I search the collection I only get an empty result. What am I doing wrong?
query = "products for heavy duty use"
embeddings = HuggingFaceEmbeddings( model_name="bert-base-multilingual-uncased" )
text_embeddings = embeddings.embed_query(query)
results = chroma_collection.query( query_embeddings=[text_embeddings], n_results=10, include=["documents"] )
Your embedding function is wrong: your __call__ method returns the embedding model itself, but it should return the embeddings of the input.
By the way, you shouldn't create the embedding model inside the __call__ method, because it is then reloaded on every call, which wastes resources.
Here is an example:
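import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTModel
from chromadb import Documents, EmbeddingFunction, Embeddings
class VitEmbeddingFunction(EmbeddingFunction):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Load the processor and model once, not on every call.
        self.processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
        self.model = ViTModel.from_pretrained("google/vit-base-patch16-224")
    def __call__(self, input: Documents) -> Embeddings:
        # Each entry of `input` is a path to an image file.
        images = [Image.open(path) for path in input]
        inputs = self.processor(images, return_tensors="pt")
        with torch.no_grad():
            outputs = self.model(**inputs)
        last_hidden_state = outputs.last_hidden_state
        # Use the first token's vector as the embedding of each image.
        return last_hidden_state[:, 0, :].numpy().tolist()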
In the newest version of chromadb, the parameter of the __call__ method must be named input.
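Applied to the text case from the question, the same two points (build the model once, return the vectors for the input) might look like the sketch below. It assumes the model object has an `embed_documents` method, as langchain's `HuggingFaceEmbeddings` does. `TextEmbeddingFunction` and `FakeModel` are made-up names, and `FakeModel` only stands in for the real model so the snippet runs without downloading anything.

```python
from typing import List, Protocol


class TextEmbedder(Protocol):
    """Anything with an embed_documents method that maps texts to vectors."""

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        ...


class TextEmbeddingFunction:
    """Hypothetical embedding function that reuses a single model instance."""

    def __init__(self, model: TextEmbedder) -> None:
        # The model is created once by the caller and reused on every call.
        self.model = model

    def __call__(self, input: List[str]) -> List[List[float]]:
        # Return the embeddings of the input, not the model itself.
        return self.model.embed_documents(list(input))


class FakeModel:
    """Stand-in for a real model; returns a tiny deterministic vector per text."""

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [[float(len(text)), 1.0] for text in texts]


embed = TextEmbeddingFunction(FakeModel())
vectors = embed(["heavy duty", "light"])
```

In real use you would derive the class from chromadb's EmbeddingFunction and pass in `HuggingFaceEmbeddings(model_name="bert-base-multilingual-uncased")` instead of `FakeModel()`. The query should then be embedded with the same model that embedded the stored documents.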
What happened?
I'm just trying to use my own embedding function. This is what I have:
from chromadb import Documents, EmbeddingFunction, Embeddings
from typing_extensions import Literal, TypedDict, Protocol
from typing import Optional, Sequence, Union, TypeVar, List, Dict, Any, Tuple, cast
Embeddable = Union[Documents]
D = TypeVar("D", bound=Embeddable, contravariant=True)
class EmbeddingFunction(Protocol[D]):
    def __call__(self, input: D) -> Embeddings:
        embeddings = [1,2,3]
        return embeddings
collection = client.create_collection(name="testing", embedding_function=EmbeddingFunction)
But I always get this error:
ValueError: Expected EmbeddingFunction.__call__ to have the following signature: odict_keys(['self', 'input']), got odict_keys(['self', 'args', 'kwargs'])
Please see https://docs.trychroma.com/embeddings for details of the EmbeddingFunction interface.
Please note the recent change to the EmbeddingFunction interface: https://docs.trychroma.com/migration#migration-to-0416---november-7-2023
I looked at the migration page, but it didn't help: https://docs.trychroma.com/migration
Maybe someone can help me. I searched a lot, reinstalled, checked the version ... but nothing worked for me.
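A likely source of the `['self', 'args', 'kwargs']` part of the message is that the class itself was passed instead of an instance. The sketch below imitates such a signature check with `inspect`. It is only my reading of the message, not chromadb's actual code, and `MyEmbedding` and `call_parameter_names` are made-up names.

```python
import inspect
from typing import List


class MyEmbedding:
    """Hypothetical embedding function with the expected (self, input) signature."""

    def __call__(self, input: List[str]) -> List[List[float]]:
        return [[0.0] for _ in input]


def call_parameter_names(obj) -> List[str]:
    """Parameter names of __call__ as defined on the type of `obj`."""
    return list(inspect.signature(type(obj).__call__).parameters)


# Passing an instance: the check sees the __call__ defined in the class.
instance_params = call_parameter_names(MyEmbedding())

# Passing the class: its type is the metaclass, whose __call__ is the
# generic constructor call, so `input` is not among the parameters.
class_params = call_parameter_names(MyEmbedding)
```

If that is what happens here, passing `embedding_function=MyEmbedding()` (with the parentheses) gives the check the signature it expects.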
Versions
chromadb = 0.4.18
Relevant log output
No response