Scope single/multi node index conditions experiment

Neither of these architectures determines a particular index; rather, each specifies how vectors should be distributed, routed, and then aggregated:

| Name | Storage | Distribution | Routing | Aggregation |
| --- | --- | --- | --- | --- |
| SsdReplicated | SSD | All indices replicated on all nodes | Each request sent to one node | Result forwarded direct from router to client. |
| DramRandomPartitions | DRAM | Indices built independently on each node | All requests sent to all nodes | All combined, then top K selected and sent to client. |
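
To make the routing and aggregation columns concrete, here is a minimal sketch of the two strategies. Everything in it is an assumption for illustration rather than code from this repo: a node is modelled as a plain function `(query, k) -> [(distance, id)]`, `make_exhaustive_node` is a hypothetical brute-force stand-in for a real index, and distance is squared Euclidean.

```python
import heapq
import random

def make_exhaustive_node(vectors):
    """Hypothetical node: brute-force search over a dict {id: vector}."""
    def search(query, k):
        scored = (
            (sum((a - b) ** 2 for a, b in zip(vec, query)), vid)
            for vid, vec in vectors.items()
        )
        # Smallest squared distance first; ties broken by id.
        return heapq.nsmallest(k, scored)
    return search

def route_replicated(nodes, query, k, rng=random):
    """SsdReplicated: every node holds all indices, so one node answers alone."""
    node = rng.choice(nodes)
    return node(query, k)

def route_partitioned(nodes, query, k):
    """DramRandomPartitions: ask every node, combine, keep the global top K."""
    partials = [hit for node in nodes for hit in node(query, k)]
    return heapq.nsmallest(k, partials)
```

With exhaustive per-node search, asking each partition for its local top K and merging gives the same result as searching the full dataset. With approximate indexes that equivalence is no longer guaranteed, which is part of what the experiment would measure.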

Thus we can test each architecture above with a range of different indexes, such as:

- Exhaustive (baseline)
- HNSW
- LSH
- Hybrid indexes
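
As a rough sketch of the resulting experiment matrix, the architecture/index pairs could be enumerated as below. The string labels and the `experiment_grid` helper are placeholders chosen for illustration, not identifiers from the codebase.

```python
from itertools import product

# Labels are illustrative; they mirror the table and the list above.
ARCHITECTURES = ["SsdReplicated", "DramRandomPartitions"]
INDEXES = ["Exhaustive", "HNSW", "LSH", "Hybrid"]

def experiment_grid(architectures=ARCHITECTURES, indexes=INDEXES):
    """Return one config dict per architecture/index pair."""
    return [
        {"architecture": arch, "index": index}
        for arch, index in product(architectures, indexes)
    ]
```

Two architectures and four index types give eight pairs, each of which would get its own set of measurements.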

We should also decide what we want to measure in each of these experiments. In #18 we are considering a throughput/latency graph for each architecture/index pair, for example. Other aspects of the approach that could be interesting to measure include:

- [Per architecture/index pair] Maximum dataset size (either for a fixed, standardized machine, or parameterized over machine RAM).
- [Per index] Index creation time as a function of the number of vectors.
- [Per architecture/index pair] Latency/recall graph, as shown in #18.
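
One point on a latency/recall graph could be collected along these lines. This is only a sketch under assumptions: `search` is a hypothetical callable `(query, k) -> list of ids`, `ground_truth` holds the exact neighbour ids per query (for example from the exhaustive baseline), and recall is taken as the fraction of the exact top K that the index returned.

```python
import time
from statistics import mean

def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k ids that appear in the approximate top-k."""
    exact = set(exact_ids[:k])
    if not exact:
        return 0.0
    return len(exact & set(approx_ids[:k])) / len(exact)

def measure(search, queries, ground_truth, k):
    """Run each query once; report mean latency (seconds) and mean recall.

    Assumes `queries` is non-empty and aligned with `ground_truth`.
    """
    latencies, recalls = [], []
    for query, exact_ids in zip(queries, ground_truth):
        start = time.perf_counter()
        ids = search(query, k)
        latencies.append(time.perf_counter() - start)
        recalls.append(recall_at_k(ids, exact_ids, k))
    return {"mean_latency_s": mean(latencies), "mean_recall": mean(recalls)}
```

Sweeping an index parameter and calling `measure` at each setting would trace out one curve per architecture/index pair. Index creation time could be timed with `time.perf_counter()` in the same way.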

As Deepti noted during our meeting yesterday, if we are interested in comparing the 'standalone' setting against the 'distributed' setting, we should compare against the state of the art (SoTA) for each of those settings, not the unoptimized baseline implementations that we have above.

In other words:

SsdReplicated should be supplemented (or replaced) by:
- DiskANN, a system that uses hybrid RAM/SSD storage on a single machine.
DramRandomPartitions should be supplemented (or replaced) by either:
- Balanced Graph Partitioning, a distributed system with sharding and routing based on a partition of the index.
- SPANN, a system that can be distributed and partitioned and has superior performance to Pyramid.
We should also consider benchmarking a heterogeneous single-node system, to further motivate any argument about when one should scale horizontally as opposed to vertically (or not at all), such as:
- FusionANNS, which runs on a single machine with a GPU.
