chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0
14.67k stars 1.22k forks source link

[Bug]: Deleting non-existing ID returns that ID in the result #2841

Open cincuranet opened 6 hours ago

cincuranet commented 6 hours ago

What happened?

Pseudo code:

  1. Add(ids: [id1])
  2. Delete(ids: [id1, id2]) (no where, no where_document)

In Chroma's log I see Delete of nonexisting embedding ID: .... But the response contains both id1 and id2.

Found in https://github.com/ssone95/ChromaDB.Client/pull/31.

cc @atroyn

Versions

0.5.5

Relevant log output

WARNING:  [24-09-2024 09:03:03] Delete of nonexisting embedding ID: id2
INFO:     [24-09-2024 09:03:03] 192.168.66.102:64200 - "POST /api/v1/collections/d5932ebf-8ef3-46c8-b4d5-5ac336495994/delete HTTP/1.1" 200
tazarov commented 3 hours ago

@cincuranet, thanks for reporting this. collection.delete() does not return any response as such:

https://github.com/chroma-core/chroma/blob/408445fe4e965b4ee863dca03bf98d5b9d13ea94/chromadb/api/models/Collection.py#L310-L316

However looking at the server API and the segment API I can tell that we return the IDs that were originally passed, which is wrong:

https://github.com/chroma-core/chroma/blob/408445fe4e965b4ee863dca03bf98d5b9d13ea94/chromadb/api/segment.py#L555-L561

https://github.com/chroma-core/chroma/blob/408445fe4e965b4ee863dca03bf98d5b9d13ea94/chromadb/api/segment.py#L603-L604

https://github.com/chroma-core/chroma/blob/408445fe4e965b4ee863dca03bf98d5b9d13ea94/chromadb/api/segment.py#L620

All in all this is an inconsistent behavior that indeed does feel misleading (not that using the Chroma python/JS clients will expose you this inconsistency).

Given the current implementation in single-node Chroma this can be challenging to fix due to the how the WAL works. But let me have a look if something simple solution can be found.

Great job on the C# standalone client by the way 💯

HammadB commented 2 hours ago

Agree this is misleading, the easiest correct + uniform solution would be to make all APIs return None

cincuranet commented 2 hours ago

thanks for reporting this. collection.delete() does not return any response as such:

@tazarov The JS SDK returns A promise that resolves to the IDs of the deleted items. though.

tazarov commented 2 hours ago

@cincuranet, maybe I spoke before I checked :). Indeed as @HammadB, suggested to simplest will be to return None on delete (an possibly other similar ops that may have incongruent behavior across APIs).

cincuranet commented 2 hours ago

Sure, that works. update operation returns void/None as well. Once you make a decision report here and I'll follow with C# implementation.