chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0

[Bug]: Cannot return the results in a contigious 2D array. Probably ef or M is too small #997

Open bodybreaker opened 1 year ago

bodybreaker commented 1 year ago

What happened?

In my program, I insert about 10k pieces of data into chromaDB. Below is the pseudocode:

Loop range(10k) :

  1. query something from chromadb
  2. insert something into chromadb (one by one)
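
The loop described above could look roughly like the following Python sketch. The original code was not posted, so the function name `query_then_insert`, the `run_id` prefix, and the `kind` metadata field used in the filter are illustrative assumptions. `collection` is expected to be a Chroma collection; only its `query` and `add` methods are called.

```python
from typing import Any, Sequence


def query_then_insert(
    collection: Any,
    embeddings: Sequence[Sequence[float]],
    run_id: str = "run",
) -> int:
    """Query the collection, then insert one record, once per embedding.

    `collection` is assumed to be a chromadb Collection. Returns the
    number of records inserted.
    """
    inserted = 0
    for i, embedding in enumerate(embeddings):
        vector = list(embedding)
        # Step 1: query with a metadata filter (hypothetical field "kind").
        collection.query(
            query_embeddings=[vector],
            n_results=1,
            where={"kind": "item"},
        )
        # Step 2: insert a single record; ids are prefixed per run so a
        # second run does not reuse the ids of the first.
        collection.add(
            ids=[f"{run_id}-{i}"],
            embeddings=[vector],
            metadatas=[{"kind": "item"}],
        )
        inserted += 1
    return inserted
```

Calling this twice against the same persistent collection, with a different `run_id` each time, would mirror the "first run, second run" pattern in the report.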

The first run always appears to succeed, but on the second run I sometimes get the error "Cannot return the results in a contigious 2D array. Probably ef or M is too small".

It's weird, because it does not happen on the first run, but after that it always happens.

Versions

0.4.4

Relevant log output

Cannot return the results in a contigious 2D array. Probably ef or M is too small

bodybreaker commented 1 year ago

I'm creating the collection with this metadata: metadata={"hnsw:M": 128, "hnsw:construction_ef": 128, "hnsw:search_ef": 128}
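
For readers who want to reproduce this setup, here is a minimal sketch of passing those HNSW settings at collection creation. The helper names `hnsw_metadata` and `create_tuned_collection` are hypothetical. `client` is assumed to be a chromadb client (for example one returned by `chromadb.PersistentClient`), and only its `create_collection` method is called.

```python
from typing import Any, Dict


def hnsw_metadata(
    m: int = 128, construction_ef: int = 128, search_ef: int = 128
) -> Dict[str, int]:
    """Build the collection metadata quoted in the comment above."""
    return {
        "hnsw:M": m,
        "hnsw:construction_ef": construction_ef,
        "hnsw:search_ef": search_ef,
    }


def create_tuned_collection(client: Any, name: str, **hnsw: int) -> Any:
    """Create a collection with HNSW settings passed as metadata.

    `client` is assumed to be a chromadb client object.
    """
    return client.create_collection(name=name, metadata=hnsw_metadata(**hnsw))
```

A call such as `create_tuned_collection(client, "my_collection", search_ef=256)` would keep the other two values at 128; the collection name here is a placeholder.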

HammadB commented 1 year ago

@bodybreaker Are you using filters on your graph? If you query without the filters do you still see this error?

HammadB commented 1 year ago

I think we should add a better error message for this and point to the troubleshooting guide. But also we can improve the indexing to support filtering better.

lenhhoxung86 commented 1 year ago

I get the same error message, but it happens when n_results is large. Specifically, I have N=300000 documents, and if I set n_results=N (or slightly less than N), it throws this error:

RuntimeError: Cannot return the results in a contigious 2D array. Probably ef or M is too small

Why do I want to get back N results? Because I have multiple embeddings per entity, and I want to compute a single distance for each entity from the distances of its elements.
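
The aggregation step described here can be sketched as follows. This assumes each record carries its entity in a metadata field (hypothetically named `entity_id`) and that the per-entity distance is the minimum over its elements; the comment does not say which reduction is intended, so `reduce` is a parameter. The inputs correspond to the `distances` and `metadatas` lists for a single query in a Chroma query result.

```python
from typing import Any, Callable, Dict, Iterable, List, Mapping, Sequence


def entity_distances(
    distances: Sequence[float],
    metadatas: Sequence[Mapping[str, Any]],
    key: str = "entity_id",
    reduce: Callable[[Iterable[float]], float] = min,
) -> Dict[Any, float]:
    """Collapse element-level distances into one distance per entity."""
    grouped: Dict[Any, List[float]] = {}
    for dist, meta in zip(distances, metadatas):
        # Group every element distance under the entity it belongs to.
        grouped.setdefault(meta[key], []).append(dist)
    return {entity: reduce(values) for entity, values in grouped.items()}
```

This only covers the aggregation; it does not avoid the error when n_results approaches N.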

bodybreaker commented 1 year ago

"@bodybreaker Are you using filters on your graph? If you query without the filters do you still see this error?"

@HammadB Yes, I'm using filters. I got no error without filters.
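
Since the error was only seen with filters, one possible workaround is to query without `where` and apply the filter in Python afterwards. The sketch below is an assumption, not a fix confirmed by the maintainers: the function name `query_then_filter`, the `keep` predicate, and the `oversample` factor are hypothetical, and the function can return fewer than `n_results` rows when few neighbours pass the filter.

```python
from typing import Any, Callable, List, Mapping, Sequence, Tuple


def query_then_filter(
    collection: Any,
    query_embedding: Sequence[float],
    n_results: int,
    keep: Callable[[Mapping[str, Any]], bool],
    oversample: int = 4,
) -> List[Tuple[str, float]]:
    """Query without `where`, then filter the results by metadata.

    `collection` is assumed to be a chromadb Collection. Returns up to
    `n_results` (id, distance) pairs in the order the query returned them.
    """
    result = collection.query(
        query_embeddings=[list(query_embedding)],
        n_results=n_results * oversample,  # fetch extra rows to filter from
        include=["metadatas", "distances"],
    )
    # Results are nested per query; index 0 is the single query sent above.
    rows = zip(result["ids"][0], result["distances"][0], result["metadatas"][0])
    kept = [(id_, dist) for id_, dist, meta in rows if keep(meta or {})]
    return kept[:n_results]
```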

galtay commented 1 year ago

@HammadB thanks for investigating. I'm experiencing this error as well (my queries are filtered, haven't tried it w/o filtering). You mention the troubleshooting guide, but I'm not seeing any advice there that seems relevant to this problem. Is there something we should try to improve this? (https://docs.trychroma.com/troubleshooting)

P.S. I'm using the default values in the LangChain wrapper for Chroma.

jzombie commented 1 year ago

I had this issue several times and realized that one of my query_texts array elements was an empty string.

Edit: I thought that had solved my issue, but it did not fix it completely; now I am catching the error and handling it in a way I probably shouldn't be.
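
The two steps mentioned here, dropping empty query strings and catching the error, might be combined as in the sketch below. The comment does not say how the error was handled, so retrying with a halved `n_results` is purely an assumption, as is the function name `safe_query`. `collection` is assumed to be a Chroma collection, and the error is matched on the message text quoted in this thread.

```python
from typing import Any, List, Optional, Sequence

HNSW_ERROR = "Cannot return the results in a contigious 2D array"


def safe_query(
    collection: Any,
    query_texts: Sequence[str],
    n_results: int = 10,
    where: Optional[dict] = None,
) -> Optional[dict]:
    """Query with non-empty texts, shrinking n_results if the HNSW error occurs.

    Returns None when there is nothing to query or every retry failed.
    """
    # Drop empty and whitespace-only strings before querying.
    texts: List[str] = [t for t in query_texts if t and t.strip()]
    if not texts:
        return None
    while n_results >= 1:
        try:
            return collection.query(
                query_texts=texts, n_results=n_results, where=where
            )
        except RuntimeError as exc:
            if HNSW_ERROR not in str(exc):
                raise  # unrelated error: do not swallow it
            n_results //= 2  # retry with fewer requested results
    return None
```

Swallowing the error this way hides the underlying index problem, so it is better suited to keeping a pipeline alive than to diagnosing the cause.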

ZWMG commented 11 months ago

I deployed Chroma using Docker and also ran into this problem today. I tried the solutions suggested in the issues, but they did not solve it. Restarting the Chroma container did. So I still need to check whether Chroma has a memory leak or is not releasing resources in time.

wwwzzz17 commented 9 months ago

Same here. I use PersistentClient for the client and set persistent_dir=./chromadb/ on my disk. I tried increasing the values of M and ef, but it did not work. However, when I deleted the chromadb folder, it worked, which is similar to PokWill restarting the container if the data is not mounted outside the container.