Does the file format have implications for performance?
Should we use something equivalent to numpy memmap?
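For reference, here is a minimal sketch of what a memmap-backed reader could look like. It assumes a headerless, row-major float32 file and a fixed dimension of 96; both are assumptions, and the real partition format may differ (for example, a per-vector header would need an offset). The helper names `write_raw_vectors` and `open_vectors_memmap` are hypothetical.

```python
import os
import tempfile

import numpy as np

DIM = 96            # assumed vector dimension
NUM_VECTORS = 1000  # small synthetic stand-in for real data


def write_raw_vectors(path, vectors):
    """Write vectors as a headerless, row-major float32 file (assumed layout)."""
    np.ascontiguousarray(vectors, dtype=np.float32).tofile(path)


def open_vectors_memmap(path, dim):
    """Map the file read-only; data is paged in from disk only when accessed."""
    num_vectors = os.path.getsize(path) // (dim * 4)  # 4 bytes per float32
    return np.memmap(path, dtype=np.float32, mode="r", shape=(num_vectors, dim))


rng = np.random.default_rng(0)
vectors = rng.random((NUM_VECTORS, DIM), dtype=np.float32)

path = os.path.join(tempfile.mkdtemp(), "vectors.f32")
write_raw_vectors(path, vectors)

mapped = open_vectors_memmap(path, DIM)
ids = np.array([3, 500, 42])
fetched = np.asarray(mapped[ids])  # copies only the requested rows into RAM
```

With a fixed-width layout like this, the byte offset of vector `i` is simply `i * DIM * 4`, which is what makes random access by id cheap. A format with variable-length records or compression would not map this directly.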
In addition to loading and retrieving vectors from SSD, we should also be able to benchmark retrieval with the vectors held in memory (for when we are running the distributed architectures). In lieu of the actual partitions, we can develop against Deep1M (or an even smaller subset of the data).
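One possible shape for this is a single benchmark function that accepts any array-like backend, so the same lookups can run against a memmap-backed file and an in-memory copy. The sketch below uses synthetic float32 data as a stand-in for Deep1M; the dimension, batch size, file layout, and the `benchmark_lookups` helper are all assumptions for illustration.

```python
import os
import tempfile
import time

import numpy as np


def benchmark_lookups(vectors, id_batches):
    """Gather rows by id for each batch; return (mean seconds per batch, checksum)."""
    checksum = 0.0
    start = time.perf_counter()
    for ids in id_batches:
        batch = np.asarray(vectors[ids])  # forces the rows to be read
        checksum += float(batch[0, 0])    # touch the data so the read counts
    elapsed = time.perf_counter() - start
    return elapsed / len(id_batches), checksum


dim, n = 96, 10_000  # assumed dimension; small synthetic subset
rng = np.random.default_rng(0)
data = rng.random((n, dim), dtype=np.float32)

path = os.path.join(tempfile.mkdtemp(), "subset.f32")
data.tofile(path)  # headerless row-major float32 (assumed layout)

# Backend 1: file mapped from disk. Backend 2: full copy loaded into RAM.
on_disk = np.memmap(path, dtype=np.float32, mode="r", shape=(n, dim))
in_memory = np.fromfile(path, dtype=np.float32).reshape(n, dim)

# Sorted ids within each batch keep disk access roughly sequential.
batches = [np.sort(rng.integers(0, n, size=64)) for _ in range(100)]

disk_time, disk_sum = benchmark_lookups(on_disk, batches)
mem_time, mem_sum = benchmark_lookups(in_memory, batches)
print(f"memmap: {disk_time * 1e6:.1f} us/batch, in-memory: {mem_time * 1e6:.1f} us/batch")
```

Note that a file that was just written is usually still in the OS page cache, so the memmap timing here will not reflect real SSD latency. For meaningful disk numbers the cache would need to be dropped, or the dataset would need to be larger than available RAM.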