allegro / allRank

allRank is a framework for training learning-to-rank neural models based on PyTorch.
Apache License 2.0
865 stars 119 forks source link

np.unique() does not preserve order #33

Closed hosseinfani closed 3 years ago

hosseinfani commented 3 years ago

I think there is a bug in LibSVMDataset

When creating groups and then splitting the input X, the np.unique() does not preserve the order of the query_ids. Hence, the split won't be done correctly!

A better way would be:

self.query_ids = Counter(query_ids) groups = np.cumsum(list(self.query_ids.values()))

sadaharu-inugami commented 3 years ago

Thank you, it is a great find! It is indeed a bug in our code and fixing it improves the results in our papers by a large margin because of the way query_ids are arranged in MSLR-WEB30K. Fortunately, this bug has effect only on the train subsets of this dataset because the validation and test subsets are sorted by ascending query_ids, so the results in both papers are still comparable with related work.

The new numbers are as high as 52.3 NDCG@5 (NDCGLoss2++, 51.2 before) and 52.6 NDCG@5 (Ordinal loss, 51.9 before) on the validation subset of fold 1. Interestingly, pointwise loss functions are still very strong.

We will fix this issue next week as we also need to update the papers.

sadaharu-inugami commented 3 years ago

This issue has been fixed in https://github.com/allegro/allRank/pull/34.

hosseinfani commented 3 years ago

great! thanks.