After looking at the balanced graph partitioning paper and code more closely, @MithiJ and I have realized that there are several extra assumptions baked into that approach. For example, it uses KaMinPar in order to partition the HNSW or LSH graph (the index), and compares this partitioning approach to k-means clustering, balanced k-means clustering, and the Pyramid paper clustering method.
We should consider the benefit to our project of implementing each of these partitioning strategies in our benchmarking suite, as well as other partitioning strategies that are slightly more intelligent than random.
We are also assuming that once we have partitioned a dataset, we can reconstruct an index on each of the nodes without any direct reference to the partitioning strategy. For example, if we use Pyramid partitioning to create 10 partitions for a 10-node experiment, each node will first build an index (HNSW or LSH) from only the partition file without any direct knowledge of what partitioning strategy was used to create the file.
After looking at the balanced graph partitioning paper and code more closely, @MithiJ and I have realized that there are several extra assumptions baked into that approach. For example, it uses KaMinPar in order to partition the HNSW or LSH graph (the index), and compares this partitioning approach to k-means clustering, balanced k-means clustering, and the Pyramid paper clustering method.
We should consider the benefit to our project of implementing each of these partitioning strategies in our benchmarking suite, as well as other partitioning strategies that are slightly more intelligent than random.
We are also assuming that once we have partitioned a dataset, we can reconstruct an index on each of the nodes without any direct reference to the partitioning strategy. For example, if we use Pyramid partitioning to create 10 partitions for a 10-node experiment, each node will first build an index (HNSW or LSH) from only the partition file without any direct knowledge of what partitioning strategy was used to create the file.