allegro / allRank

allRank is a framework for training learning-to-rank neural models based on PyTorch.
Apache License 2.0

My experiments consistently underperform in comparison to the paper's reported results. #62

Open zhuqinghahaha opened 1 year ago

zhuqinghahaha commented 1 year ago

Although I followed the exact settings outlined in the reproducibility file, my experiments consistently yield inferior results compared to those reported in the paper. Any suggestions, recommendations, or additional details I might have overlooked?

**WEB30K: results reported in the paper**

| Loss | Self-attention NDCG@5 | Self-attention NDCG@10 | Self-attention NDCG@30 | MLP NDCG@5 | MLP NDCG@10 | MLP NDCG@30 |
|---|---|---|---|---|---|---|
| NDCGLoss 2++ | 52.65 ± 0.37 | 54.49 ± 0.27 | 59.80 ± 0.08 | 49.15 ± 0.44 | 51.22 ± 0.34 | 57.14 ± 0.23 |
| LambdaRank | 52.29 ± 0.31 | 54.08 ± 0.19 | 59.48 ± 0.12 | 48.77 ± 0.38 | 50.85 ± 0.28 | 56.72 ± 0.17 |

**WEB30K: my reproduction**

| Loss | Self-attention NDCG@5 | Self-attention NDCG@10 | Self-attention NDCG@30 | MLP NDCG@5 | MLP NDCG@10 | MLP NDCG@30 |
|---|---|---|---|---|---|---|
| NDCGLoss 2++ | 48.825 ± 0.025 | 50.587 ± 0.062 | 56.473 ± 0.012 | 48.084 ± 0.118 | 49.623 ± 0.106 | 55.497 ± 0.108 |
| LambdaRank | 48.015 ± 0.351 | 49.602 ± 0.147 | 55.466 ± 0.180 | 41.739 ± 0.341 | 43.562 ± 0.159 | 49.656 ± 0.134 |
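For anyone comparing numbers like these: the metric in both tables is NDCG@k. A minimal sketch of the standard exponential-gain definition is below; note that allRank's own implementation may differ in details such as padding-mask and tie handling, so this is for sanity-checking magnitudes only.

```python
import numpy as np

def dcg_at_k(relevances, k):
    """DCG@k with exponential gain (2^rel - 1) and log2 position discount."""
    rel = np.asarray(relevances, dtype=float)[:k]
    gains = 2.0 ** rel - 1.0
    discounts = np.log2(np.arange(2, rel.size + 2))  # positions 1..k -> log2(2..k+1)
    return float(np.sum(gains / discounts))

def ndcg_at_k(y_true, y_score, k):
    """NDCG@k: DCG of the predicted ordering divided by the ideal DCG."""
    order = np.argsort(y_score)[::-1]          # rank documents by descending score
    ideal = np.sort(np.asarray(y_true, dtype=float))[::-1]
    idcg = dcg_at_k(ideal, k)
    if idcg == 0.0:
        return 0.0                             # query with no relevant documents
    return dcg_at_k(np.asarray(y_true, dtype=float)[order], k) / idcg
```

With this convention, the paper's "52.65" corresponds to a mean NDCG@5 of 0.5265 over test queries, matching the 0–1 scale used in the validation table further down.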
sadaharu-inugami commented 1 year ago

Unfortunately, it's been quite some time since I wrote the reproducibility file. However, there was a previous issue detailing certain problems with reproducibility and the user was able to match our results: https://github.com/allegro/allRank/issues/23.

Did you preprocess each fold separately using the provided script?

zhuqinghahaha commented 1 year ago

Thank you for your reply! I successfully reproduced the results after applying feature normalization.
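For future readers hitting the same gap: the fix is to normalize features per fold, fitting the statistics on the training split only. A minimal sketch of plain per-feature z-scoring is below; this is an illustration of the idea, not allRank's actual preprocessing script, which may apply a different transform.

```python
import numpy as np

def standardize_features(X_train, X_test):
    """Z-score features using statistics fitted on the training split only,
    so no information from the test split leaks into preprocessing."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0.0] = 1.0  # guard against constant features
    return (X_train - mean) / std, (X_test - mean) / std
```

The key point is doing this separately for each of the five WEB30K folds; fitting normalization statistics on the full dataset (or skipping it entirely) is enough to produce the multi-point NDCG drop seen above.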

**WEB30K: reproduction after normalization (validation set)**

| Loss | Self-attention NDCG@5 | Self-attention NDCG@10 | Self-attention NDCG@30 | MLP NDCG@5 | MLP NDCG@10 | MLP NDCG@30 |
|---|---|---|---|---|---|---|
| NDCGLoss 2++ | 0.52359 | 0.54353 | 0.5982 | 0.48959 | 0.51058 | 0.57051 |
| LambdaRank | 0.51851 | 0.53841 | 0.59356 | 0.48601 | 0.50747 | 0.56734 |