Open mickey-lyx opened 1 year ago
I have the problem too
@tazarov Hi, could you please look at this problem? Thank you for your time!
@mickey-lyx, thanks for reporting this. I'll take a look at this soon. At a glance, the code looks fine, and the actual result seems to be fine - you have 61 docs once you remove 47 from the starting 107. All in all, this seems like a warning, not an actual bug. I will have a look and let you know.
@tazarov Really appreciate it. The result is right. I'm just wondering why there appear to be warnings about deleting nonexisting embeddings. Is it because the embeddings were deleted multiple times?
I have the same issue, and running queries on the db triggers this warning every time. I selected items with a where filter (no IDs were given) and removed them one by one:
my_collection.delete(
    where={"file_id": str(file_id)}
)
Since then the warning is shown every time I query it.
I'm having the same issue. This seems to occur even when an empty list is passed as ids to Collection.delete.
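For the empty-list case, one possible workaround is to skip the call entirely when there is nothing to delete. The sketch below is only an illustration: `delete_if_any` is a hypothetical helper name, not part of Chroma, and it assumes nothing about the collection beyond a `delete(ids=...)` method. Whether this actually avoids the warning in a given Chroma version has not been verified in this thread.

```python
def delete_if_any(collection, ids):
    """Call collection.delete only when at least one ID was requested.

    `collection` is any object with a delete(ids=...) method, for example
    a chromadb Collection. Returns the number of IDs passed on to delete.
    """
    ids = list(ids)
    if not ids:
        # Nothing to remove, so do not touch the collection at all.
        return 0
    collection.delete(ids=ids)
    return len(ids)
```

Usage would look like `delete_if_any(my_collection, stale_ids)` in place of a direct `my_collection.delete(ids=stale_ids)`.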
We'd love to get this fixed - is anyone able to help post a minimal repro?
@jeffchuber
import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

def main():
    client = chromadb.PersistentClient(path="./chroma_db")
    collection = client.get_or_create_collection(name="test", embedding_function=OpenAIEmbeddingFunction())
    num_1 = 47
    num_2 = 70
    texts_1 = [f"text_1.{i}" for i in range(num_1)]
    ids_1 = [f"1.{i}" for i in range(num_1)]
    texts_2 = [f"text_2.{i}" for i in range(num_2)]
    ids_2 = [f"2.{i}" for i in range(num_2)]
    collection.add(ids=ids_1, documents=texts_1)
    collection.add(ids=ids_2, documents=texts_2)
    print("count before", collection.count())
    collection.delete(ids_1)
    print("count after", collection.count())

if __name__ == '__main__':
    main()
I'm seeing similar warnings, but I'm unsure whether I should be concerned, since they are only warnings. It would be good to get some insight into why this occurs: after uploading a few PDF files, the server keeps logging these lines even while the FastAPI app is idle.
112-49d5-a776-2c02c03897e8:77661df1-86bc-4f33-9119-a90d77f7c24e
chroma | 2023-09-16 15:22:31 WARNING chromadb.segment.impl.vector.brute_force_index Delete of nonexisting embedding ID: c441314d-7112-49d5-a776-2c02c03897e8:484a228b-de38-4674-8f14-078f4f218afd
chroma | 2023-09-16 15:22:31 WARNING chromadb.segment.impl.vector.brute_force_index Delete of nonexisting embedding ID: c441314d-7112-49d5-a776-2c02c03897e8:51c75801-6ecd-4490-941e-8ee6f2229476
chroma | 2023-09-16 15:22:31 WARNING chromadb.segment.impl.vector.brute_force_index Delete of nonexisting embedding ID: c441314d-7112-49d5-a776-2c02c03897e8:282cb350-257b-49ef-ae55-ab3997099d58
chroma | 2023-09-16 15:22:31 WARNING chromadb.segment.impl.vector.brute_force_index Delete of nonexisting embedding ID: c441314d-7112-49d5-a776-2c02c03897e8:fe9d8119-b72a-44c1-9bc5-f5c173621a4b
chroma | 2023-09-16 15:22:31 WARNING chromadb.segment.impl.vector.brute_force_index Delete of nonexisting embedding ID: c441314d-7112-49d5-a776-2c02c03897e8:c92f759d-f0e7-46e9-9156-e5c47e917de7
chroma | 2023-09-16 15:22:31 WARNING chromadb.segment.impl.vector.brute_force_index Delete of nonexisting embedding ID: c441314d-7112-49d5-a776-2c02c03897e8:5be4bf1c-7c02-4815-9c25-de4463b0231f
chroma | 2023-09-16 15:22:31 WARNING chromadb.segment.impl.vector.brute_force_index Delete of nonexisting embedding ID: c441314d-7112-49d5-a776-2c02c03897e8:32500766-ceb7-4b12-8e8d-04b34306f30f
chroma | 2023-09-16 15:22:31 WARNING chromadb.segment.impl.vector.brute_force_index Delete of nonexisting embedding ID: c441314d-7112-49d5-a776-2c02c03897e8:7e5d60fd-cb8a-4ecf-adf3-8d86694458e8
chroma | 2023-09-16 15:22:31 WARNING chromadb.segment.impl.vector.brute_force_index Delete of nonexisting embedding ID: c441314d-7112-49d5-a776-2c02c03897e8:5cfbdc44-cc08-4749-8d5d-d628f6aa4676
chroma | 2023-09-16 15:22:31 WARNING chromadb.segment.impl.vector.brute_force_index Delete of nonexisting embedding ID: c441314d-7112-49d5-
package versions
chromadb==0.4.10 langchain==0.0.225
Running chroma client server with the latest Docker version
chroma:
  container_name: chroma
  image: ghcr.io/chroma-core/chroma:latest
  volumes:
    - index_data:/chroma/chroma
  environment:
    - IS_PERSISTENT=true
    - CHROMA_SERVER_HTTP_PORT=8000
  restart: unless-stopped
  ports:
    - '8000:8000'
  networks:
    - mynetwork
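If the warning turns out to be harmless for your use case, as suggested earlier in the thread, one option is to filter it out of the logs with Python's standard `logging` module. This is a sketch under assumptions: it treats `chromadb.segment.impl.vector.brute_force_index` (taken from the log lines above) as the logger name, and the class name `DropNonexistingDeleteWarnings` is made up for illustration. It only affects the Python process it runs in, so it would not change the output of a separate Chroma server container.

```python
import logging

# Assumed logger name, copied from the log output above.
CHROMA_INDEX_LOGGER = "chromadb.segment.impl.vector.brute_force_index"


class DropNonexistingDeleteWarnings(logging.Filter):
    """Drop records whose message reports a delete of a nonexisting ID."""

    def filter(self, record):
        # Returning False tells the logging module to discard the record.
        return "Delete of nonexisting embedding ID" not in record.getMessage()


# Attach the filter to the logger that emits the warning.
logging.getLogger(CHROMA_INDEX_LOGGER).addFilter(DropNonexistingDeleteWarnings())
```

All other messages from the same logger still pass through, so unrelated warnings are not hidden.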
I am having this exact issue too
@jeffchuber, @chrispangg, @timothymugayi, @mickey-lyx, As I mentioned above, the issue is benign. Chroma keeps a temporary index of embeddings and flushes it to disk once it reaches a certain threshold. In your example the threshold (100) is reached, so the temporary index is flushed and cleared, and subsequent entries are appended to it. When a delete comes right after the add, Chroma attempts to remove every requested embedding from the temporary index, including IDs that are no longer in it, which leads to the warning you see. I have made a fix that checks whether the IDs to be removed are part of the temporary index; if they are not, Chroma will not attempt the deletion.
PR's on the way.
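To make the explanation above concrete, here is a toy model of a temporary index that is flushed at a threshold. It is not Chroma's code: `ToyTempIndex`, `delete_naive`, `delete_checked`, and `run_repro` are hypothetical names, the per-record flush is an assumption, and the model only mirrors the behaviour described in the comment, using the 47 + 70 IDs from the repro script.

```python
class ToyTempIndex:
    """Toy stand-in for a temporary index that flushes at a threshold."""

    def __init__(self, flush_threshold=100):
        self.flush_threshold = flush_threshold
        self.pending = set()   # IDs still held in the temporary index
        self.flushed = set()   # IDs already moved to the "on disk" index
        self.warnings = []     # warning messages emitted by delete_naive

    def add(self, ids):
        for id_ in ids:
            self.pending.add(id_)
            if len(self.pending) >= self.flush_threshold:
                # Threshold reached: flush everything and clear the temp index.
                self.flushed |= self.pending
                self.pending.clear()

    def count(self):
        return len(self.pending) + len(self.flushed)

    def delete_naive(self, ids):
        # Tries to remove every ID from the temp index, even flushed ones.
        for id_ in ids:
            if id_ in self.pending:
                self.pending.remove(id_)
            else:
                self.warnings.append(f"Delete of nonexisting embedding ID: {id_}")
            self.flushed.discard(id_)

    def delete_checked(self, ids):
        # Only touches the temp index for IDs that are actually in it.
        for id_ in ids:
            if id_ in self.pending:
                self.pending.remove(id_)
            self.flushed.discard(id_)


def run_repro(delete_method_name):
    """Replay the repro (47 + 70 adds, then delete the first 47)."""
    index = ToyTempIndex(flush_threshold=100)
    ids_1 = [f"1.{i}" for i in range(47)]
    ids_2 = [f"2.{i}" for i in range(70)]
    index.add(ids_1)
    index.add(ids_2)
    getattr(index, delete_method_name)(ids_1)
    return index.count(), len(index.warnings)
```

In this model, `run_repro("delete_naive")` leaves the correct number of records but records one warning per deleted ID, because all of `ids_1` were flushed before the delete; `run_repro("delete_checked")` reaches the same count with no warnings.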
@HammadB I think we can close this now.
I think this issue is still present. I've just stumbled upon it in my application. I'm using the latest version of Chroma (0.4.24), so the fix from #1150 should presumably already be included.
I updated to chromadb==0.5.0, but I still have this problem. I run the updates in a thread:
t = threading.Thread(target=mydb.add_collection_from_file, args=[local_f], daemon=True)
t.start()
@running-frog, @s-peryt, we have a bug in the HNSW binary index that, under certain conditions, can result in the above errors. There is a PR, #2062, that should resolve this.
What happened?
Hi there, I tried to upload two PDF files to a persistent collection and then delete one of them, but I received warning messages: "Delete of nonexisting embedding ID". The warning only appears when I upload multiple files and delete one of them. Here are my test files and code.
alphabet-2023-q1-10q.pdf Apple Inc.-10K.pdf
Versions
chromadb==0.4.5 langchain==0.0.264 python==3.10.12 MacOS==13.3.1
Relevant log output