Semafind / semadb

No fuss multi-index hybrid vector database / search engine
https://semadb.com/
Apache License 2.0
9 stars 2 forks source link

Left over edges in delete points #8

Closed nuric closed 8 months ago

nuric commented 8 months ago

The DeletePoints operation potentially leaves behind edges to deleted points. Initially we doubled downed on the assumption that more often than not there would be bidirectional edges between points. This is, however, not the case which leads to active edges to points that do not exist. During search that throws an error. There are three approaches:

  1. Scan all edges and delete the ones that point to a deleted point
  2. Prune optimistically of the neighbours of the deleted points, then during getPointNeighbours to check if the neighbour exists
  3. A midway where we mark the deleted points, ignore them during search and only do a full prune when it reaches say 10% of total size.

We are going with 1 for now to achieve correctness. The sharding process means no single shard will be too large to cause a huge performance hit. Each shard scan can be done in parallel too. In the future we can implement 3 by keeping track of the number of deleted points and only doing a full prune when it reaches a certain threshold.