Implementation: benchmarking on disk and in memory

Does the file format have implications for performance?
Should we use something equivalent to numpy.memmap?
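
For reference, here is a minimal sketch of what the memmap approach looks like in NumPy, next to a full load into RAM. It assumes a headerless, row-major float32 file, which is an assumption for illustration and not necessarily the format we will settle on. The helper names (`write_raw_vectors`, `open_on_disk`, `load_in_memory`) and the dimension of 96 are placeholders.

```python
import os
import tempfile

import numpy as np


def write_raw_vectors(path, vectors):
    """Write vectors as a headerless, row-major float32 file (assumed layout)."""
    np.ascontiguousarray(vectors, dtype=np.float32).tofile(path)


def open_on_disk(path, dim):
    """Map the file without reading it up front; pages are loaded on access."""
    n = os.path.getsize(path) // (4 * dim)  # 4 bytes per float32
    return np.memmap(path, dtype=np.float32, mode="r", shape=(n, dim))


def load_in_memory(path, dim):
    """Read the whole file into RAM as an ordinary ndarray."""
    return np.fromfile(path, dtype=np.float32).reshape(-1, dim)


def demo():
    rng = np.random.default_rng(0)
    vectors = rng.random((1000, 96), dtype=np.float32)  # synthetic stand-in data
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vectors.f32")
        write_raw_vectors(path, vectors)
        mapped = open_on_disk(path, 96)
        loaded = load_in_memory(path, 96)
        same = bool(np.array_equal(mapped, loaded))
        shape = mapped.shape
        del mapped  # release the mapping before the directory is removed
    return shape, same
```

Both paths expose the same array interface, so the rest of the benchmark code would not need to know which one it was given. A format with per-vector headers would need a different mapping (for example a strided view), which is one way the file format could affect performance.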

In addition to loading and retrieving vectors from SSD, we should also be able to benchmark them in memory (for when we are running the distributed architectures). In place of the actual partitions, we can develop against Deep1M (or an even smaller subset of the data).
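
A rough sketch of how the two modes could share one timing harness is below. It uses synthetic random vectors as a stand-in for Deep1M, and the function names, sizes, and raw float32 layout are all hypothetical choices made for illustration.

```python
import os
import tempfile
import time

import numpy as np


def time_random_reads(vectors, ids):
    """Return (seconds, checksum) for fetching the rows in `ids` one at a time."""
    start = time.perf_counter()
    checksum = 0.0
    for i in ids:
        checksum += float(vectors[i].sum())  # touch the data so the read happens
    return time.perf_counter() - start, checksum


def compare_backends(n=2000, dim=96, n_reads=500, seed=0):
    """Time the same random reads against a memmap and an in-memory copy."""
    rng = np.random.default_rng(seed)
    data = rng.random((n, dim), dtype=np.float32)  # synthetic stand-in subset
    ids = rng.integers(0, n, size=n_reads)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "subset.f32")
        data.tofile(path)
        on_disk = np.memmap(path, dtype=np.float32, mode="r", shape=(n, dim))
        in_memory = np.array(on_disk)  # full copy held in RAM
        disk_s, disk_sum = time_random_reads(on_disk, ids)
        mem_s, mem_sum = time_random_reads(in_memory, ids)
        del on_disk  # release the mapping before the directory is removed
    return {
        "disk_s": disk_s,
        "memory_s": mem_s,
        "checksums_match": disk_sum == mem_sum,
    }
```

Note that a file that was just written is likely still in the OS page cache, so the "disk" timing here is a warm read. A real SSD benchmark would need to control for that, for example by using a dataset larger than RAM or by dropping caches between runs.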

