Open zlwu92 opened 8 months ago
Hi professor @maumueller,
As you mentioned, Kosarak and MovieLens-10M are sparse datasets, and they are packed in something like the scipy CSR format.
So when I use the function at https://github.com/erikbern/ann-benchmarks/blob/main/ann_benchmarks/distance.py#L104 to get the train and test objects, they are basically list objects (of np.ndarray), right?
Then I found that the rows do not all have the same length, so the data is still stored in a compact form, right?
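```python
import h5py
from ann_benchmarks.distance import dataset_transform

# h5_file is the path to the downloaded dataset file
f = h5py.File(h5_file, 'r')
train, test = dataset_transform(f)
print(type(train))
print(len(train))
for i in train[0]:
    print(str(i), end=' ')
print()
```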
My guess is that each element in a row is the index of a non-zero element in the original sparse vector. Is that correct?
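To make my guess concrete, here is a minimal sketch on toy data. It is not the ann-benchmarks code: the helper names `split_by_lengths` and `rows_to_dense` are hypothetical, and it assumes that each row holds the column indices of the non-zero entries of a binary vector.

```python
import numpy as np

def split_by_lengths(flat, lengths):
    """Cut one flat index array into per-row arrays (CSR-like layout)."""
    ends = np.cumsum(lengths)
    return [flat[end - n:end] for end, n in zip(ends, lengths)]

def rows_to_dense(rows, n_cols):
    """Expand per-row index arrays into a dense 0/1 matrix.

    Assumption: every stored value is the column index of a non-zero entry.
    """
    dense = np.zeros((len(rows), n_cols), dtype=np.int8)
    for i, idx in enumerate(rows):
        dense[i, idx] = 1
    return dense

# Toy data: three rows of different lengths stored back to back.
flat = np.array([0, 3, 1, 2, 3, 4])
lengths = [2, 1, 3]

rows = split_by_lengths(flat, lengths)
dense = rows_to_dense(rows, n_cols=5)
print([r.tolist() for r in rows])
print(dense)
```

If this reading is right, the differing row lengths I see in `train` are just the number of non-zero entries per vector.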