Open ifadams opened 2 months ago
Migrated from internal from @s-gobriel
I think it is important to explain the delete functionality from the VCL side.
Deleting descriptors by ID works with the following in mind.
The index engines each handle deletion differently:
• IndexIVF: stores the descriptor IDs explicitly with the index. As a result, the IDs of the other descriptors do not change after a delete operation.
• IndexFlat (other FAISS indices that we do not support in VDMS, such as IndexPQ, behave the same way): supports the remove_id function, which deletes the descriptor in question. However, this index does not store the IDs explicitly, so a delete shifts the ID of every vector whose ID is greater than the deleted one down by 1.
• IndexFLINNG: no delete operation is supported, because deletion is not supported for hash tables.
The logic of the VDMS client or the user application needs to be modified to account for the behavior explained above, so that the correct vectors are presented to the application after a delete operation.
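A minimal sketch of the client-side mapping described above, in plain Python with no FAISS or VDMS calls. The `FlatIdMapper` class and its method names are hypothetical and exist only for illustration. It models an index where a vector's internal ID is simply its position, which is the IndexFlat behavior described above, and keeps a stable application-level ID for each vector.

```python
class FlatIdMapper:
    """Maps shifting internal IDs of a flat index to stable application IDs.

    Hypothetical helper for illustration; not part of VDMS, VCL, or FAISS.
    """

    def __init__(self):
        # _stable_ids[i] is the application ID of the vector at internal position i.
        self._stable_ids = []
        self._next_stable_id = 0

    def add(self):
        """Record a newly added vector and return its stable application ID."""
        stable_id = self._next_stable_id
        self._next_stable_id += 1
        self._stable_ids.append(stable_id)
        return stable_id

    def remove(self, stable_id):
        """Record a delete and return the internal ID to remove from the index."""
        internal_id = self._stable_ids.index(stable_id)  # ValueError if unknown
        # Dropping the entry shifts every later position down by 1,
        # mirroring what the flat index does to its own IDs.
        del self._stable_ids[internal_id]
        return internal_id

    def to_stable(self, internal_id):
        """Translate an internal ID returned by a search into the stable ID."""
        return self._stable_ids[internal_id]


mapper = FlatIdMapper()
ids = [mapper.add() for _ in range(5)]   # stable IDs 0..4 at positions 0..4
removed_at = mapper.remove(2)            # delete the vector with stable ID 2
after_delete = [mapper.to_stable(i) for i in range(4)]
```

After the delete, internal positions 2 and 3 refer to the vectors the application knows as 3 and 4, so search results have to be translated through the mapper before they are handed back to the application.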
Hope this is clear.
BTW, related to the delete functionality, duplicate detection is a trickier issue that can only be handled by the application.
Active discussions underway, updates on diagnosis here:
What's going on is a mismatch between the behavior of the KNN, PMGD, and client expectations.
Currently, we allow an "_expiration" field to be included as part of a descriptor. This field sets a timer for automatic deletion (if turned on), which will automatically delete the PMGD graph nodes affiliated with that descriptor.
A KNN search returns the nearest neighbors, and the IDs are used internally to increase the specificity of the query.
However, the index the KNN is running over does not always support deletion, and currently the descriptor is not deleted from the index internally. So it's possible that a KNN search returns a "deleted" ID, and since it does not match an existing ID in the graph database, we return nothing.
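To illustrate the mismatch, here is a small sketch in plain Python of how stale IDs could be filtered out of a KNN result. It is not how VDMS currently behaves, nor a fix the maintainers have agreed on. The function name `live_neighbors` and the sample IDs are hypothetical; `live_ids` stands in for the set of descriptor IDs that still have a node in the graph database.

```python
def live_neighbors(knn_ids, live_ids, k):
    """Return up to k IDs from knn_ids (nearest first) that still exist in live_ids."""
    result = []
    for descriptor_id in knn_ids:
        if descriptor_id in live_ids:
            result.append(descriptor_id)
            if len(result) == k:
                break
    return result


# The index still holds IDs 7 and 9, but their graph nodes have expired.
knn_ids = [7, 3, 9, 1, 4]      # over-fetched KNN result, nearest first
live_ids = {1, 3, 4, 8}        # IDs that still exist in the graph
top2 = live_neighbors(knn_ids, live_ids, 2)
all_stale = live_neighbors([7, 9], live_ids, 2)
```

The second call shows the reported symptom: when every ID the KNN returns has already been deleted from the graph, nothing comes back. Asking the index for more than k neighbors and filtering afterwards is one way an application could work around this, at the cost of a larger search.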
Describe the bug
As stated in Wiki: Deletion Capabilities, the _deletion query allows a user to delete the content within VDMS that is associated with a find query (FindImage, FindEntity, FindDescriptor). Currently, descriptor deletion is NOT fully supported.
To Reproduce
Steps to reproduce the behavior (as shown in attached document):