breezykermo / oak


Scope single/multi node index conditions experiment #23

Closed breezykermo closed 3 weeks ago

breezykermo commented 4 weeks ago

Neither of these architectures determines a particular index; rather, each specifies how vectors should be distributed, routed, and then aggregated:

| Name | Storage | Distribution | Routing | Aggregation |
| --- | --- | --- | --- | --- |
| SsdReplicated | SSD | All indices replicated on all nodes | Each request sent to one node | Result forwarded direct from router to client. |
| DramRandomPartitions | DRAM | Indices built independently on each node | All requests sent to all nodes | All combined, then top K selected and sent to client. |
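To make the routing and aggregation columns concrete, here is a minimal Python sketch of the two request paths. It is not oak's implementation. The function names (`node_search`, `ssd_replicated`, `dram_random_partitions`) are hypothetical, each node is modeled as an in-process dict of `id -> vector`, and an exact brute-force scan stands in for whichever index is actually under test.

```python
import heapq
import random


def node_search(vectors, query, k):
    """Hypothetical per-node search: exact top-k by squared L2 distance.

    A brute-force scan stands in for the real index (assumption).
    Returns a list of (distance, vector_id) tuples, nearest first.
    """
    def dist(v):
        return sum((a - b) ** 2 for a, b in zip(v, query))

    return heapq.nsmallest(k, ((dist(v), vid) for vid, v in vectors.items()))


def ssd_replicated(nodes, query, k, rng=random):
    """SsdReplicated: every node holds the full dataset.

    The router sends the request to one node and forwards that node's
    result unchanged; no merge step is needed.
    """
    node = rng.choice(nodes)
    return node_search(node, query, k)


def dram_random_partitions(nodes, query, k):
    """DramRandomPartitions: each node holds a disjoint partition.

    The router sends the request to all nodes, combines their local
    top-k lists, and keeps the global top-k.
    """
    combined = []
    for node in nodes:
        combined.extend(node_search(node, query, k))
    return heapq.nsmallest(k, combined)
```

With an exact per-node search, both paths return the same neighbours for the same dataset, so any difference between them in this toy model is purely in cost: one node's work per query versus every node's work plus a merge. With approximate indexes built independently per partition, recall may also differ between the two.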

Thus we can test each architecture above with a range of different indexes, such as:

We should also work out what we want to measure in each of these experiments. In #18, for example, we are considering a throughput/latency graph for each architecture/index pair. But it could also be interesting to measure other aspects of the approach, such as:

breezykermo commented 3 weeks ago

As Deepti noted during our meeting yesterday, if we are interested in comparing the 'standalone' setting against the 'distributed' setting, we should compare the SoTA for each of those settings, not the unoptimized baseline implementations that we have above.

In other words: