How are the sparse datasets stored?

Hi professor @maumueller ,

As you mentioned, Kosarak and MovieLens-10M are sparse datasets, and they are packed in a format similar to scipy's CSR format.

So when I use the function at https://github.com/erikbern/ann-benchmarks/blob/main/ann_benchmarks/distance.py#L104 to get the train and test objects, they are basically list objects rather than a single np.ndarray, right?

Then I found that the rows do not all have the same length, so the data is still stored in a compact form, right?

My guess is that each element in a row is the index of a non-zero element in the original sparse vector. Is that correct?

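import h5py
from ann_benchmarks.distance import dataset_transform

# h5_file is the path to the downloaded HDF5 dataset
f = h5py.File(h5_file, 'r')
train, test = dataset_transform(f)
print(type(train))
print(len(train))
# print every element of the first row on a single line
for i in train[0]:
    print(str(i), end=' ')
print()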
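
To make my guess concrete, here is a minimal sketch of the layout I am assuming: one flat array holding the non-zero indices of all rows back to back, plus the length of each row. The names `unpack_rows` and `to_dense` and the toy data are hypothetical and only illustrate the idea; I have not confirmed that this is how ann-benchmarks does it.

```python
from itertools import accumulate


def unpack_rows(flat_indices, row_lengths):
    """Split one flat index array into per-row lists (assumed CSR-like layout)."""
    # Running sum of the lengths gives the end offset of every row.
    ends = accumulate(row_lengths)
    return [sorted(flat_indices[end - n:end]) for end, n in zip(ends, row_lengths)]


def to_dense(row, dimension):
    """Expand a row of non-zero indices into a dense 0/1 vector."""
    dense = [0] * dimension
    for idx in row:
        dense[idx] = 1
    return dense


# Toy data: three sparse rows over a 5-dimensional space.
flat = [0, 3, 1, 2, 4, 3]
lengths = [2, 3, 1]

rows = unpack_rows(flat, lengths)
print(rows)                   # [[0, 3], [1, 2, 4], [3]]
print(to_dense(rows[1], 5))   # [0, 1, 1, 0, 1]
```

If this is right, rows of different lengths are expected, because each row only lists the positions that are set.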
