allegro / allRank

allRank is a framework for training learning-to-rank neural models based on PyTorch.
Apache License 2.0

My experiments consistently underperform in comparison to the paper's reported results. #62

Open zhuqinghahaha opened 1 year ago

zhuqinghahaha commented 1 year ago

Although I followed the exact settings outlined in the reproducibility file, my experiments consistently yield inferior results compared to those reported in the paper. Any suggestions, recommendations, or additional details I might have overlooked?

**WEB30K: results reported in the paper**

| Loss | Self-attention NDCG@5 | Self-attention NDCG@10 | Self-attention NDCG@30 | MLP NDCG@5 | MLP NDCG@10 | MLP NDCG@30 |
|---|---|---|---|---|---|---|
| NDCGLoss 2++ | 52.65 ± 0.37 | 54.49 ± 0.27 | 59.80 ± 0.08 | 49.15 ± 0.44 | 51.22 ± 0.34 | 57.14 ± 0.23 |
| LambdaRank | 52.29 ± 0.31 | 54.08 ± 0.19 | 59.48 ± 0.12 | 48.77 ± 0.38 | 50.85 ± 0.28 | 56.72 ± 0.17 |

**WEB30K: my reproduction**

| Loss | Self-attention NDCG@5 | Self-attention NDCG@10 | Self-attention NDCG@30 | MLP NDCG@5 | MLP NDCG@10 | MLP NDCG@30 |
|---|---|---|---|---|---|---|
| NDCGLoss 2++ | 48.825 ± 0.025 | 50.587 ± 0.062 | 56.473 ± 0.012 | 48.084 ± 0.118 | 49.623 ± 0.106 | 55.497 ± 0.108 |
| LambdaRank | 48.015 ± 0.351 | 49.602 ± 0.147 | 55.466 ± 0.180 | 41.739 ± 0.341 | 43.562 ± 0.159 | 49.656 ± 0.134 |
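For anyone comparing numbers like these: the metric in both tables is NDCG@k. A minimal sketch of the standard exponential-gain definition is below; note that allRank's own implementation may differ in details such as padding-mask and tie handling, so this is for sanity-checking magnitudes only.

```python
import numpy as np

def dcg_at_k(relevances, k):
    """DCG@k with exponential gain (2^rel - 1) and log2 position discount."""
    rel = np.asarray(relevances, dtype=float)[:k]
    gains = 2.0 ** rel - 1.0
    discounts = np.log2(np.arange(2, rel.size + 2))  # positions 1..k -> log2(2..k+1)
    return float(np.sum(gains / discounts))

def ndcg_at_k(y_true, y_score, k):
    """NDCG@k: DCG of the predicted ordering divided by the ideal DCG."""
    order = np.argsort(y_score)[::-1]          # rank documents by descending score
    ideal = np.sort(np.asarray(y_true, dtype=float))[::-1]
    idcg = dcg_at_k(ideal, k)
    if idcg == 0.0:
        return 0.0                             # query with no relevant documents
    return dcg_at_k(np.asarray(y_true, dtype=float)[order], k) / idcg
```

With this convention, the paper's "52.65" corresponds to a mean NDCG@5 of 0.5265 over test queries, matching the 0–1 scale used in the validation table further down.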
sadaharu-inugami commented 1 year ago

Unfortunately, it's been quite some time since I wrote the reproducibility file. However, there was a previous issue detailing certain problems with reproducibility and the user was able to match our results: https://github.com/allegro/allRank/issues/23.

Did you preprocess each fold separately using the provided script?

zhuqinghahaha commented 1 year ago

Thank you for your reply! I successfully reproduced the results after applying feature normalization.
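For future readers hitting the same gap: the fix is to normalize features per fold, fitting the statistics on the training split only. A minimal sketch of plain per-feature z-scoring is below; this is an illustration of the idea, not allRank's actual preprocessing script, which may apply a different transform.

```python
import numpy as np

def standardize_features(X_train, X_test):
    """Z-score features using statistics fitted on the training split only,
    so no information from the test split leaks into preprocessing."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0.0] = 1.0  # guard against constant features
    return (X_train - mean) / std, (X_test - mean) / std
```

The key point is doing this separately for each of the five WEB30K folds; fitting normalization statistics on the full dataset (or skipping it entirely) is enough to produce the multi-point NDCG drop seen above.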

**WEB30K: reproduction after normalization (validation set)**

| Loss | Self-attention NDCG@5 | Self-attention NDCG@10 | Self-attention NDCG@30 | MLP NDCG@5 | MLP NDCG@10 | MLP NDCG@30 |
|---|---|---|---|---|---|---|
| NDCGLoss 2++ | 0.52359 | 0.54353 | 0.5982 | 0.48959 | 0.51058 | 0.57051 |
| LambdaRank | 0.51851 | 0.53841 | 0.59356 | 0.48601 | 0.50747 | 0.56734 |