andreeaiana / newsreclib

PyTorch-Lightning Library for Neural News Recommendation
https://newsreclib.readthedocs.io/en/latest/
MIT License

Why is the performance so different from other papers? #4

Open chiyuzhang94 opened 8 months ago

chiyuzhang94 commented 8 months ago

Hi Andreea,

I noticed that the model performance reported in your paper is very different from the performance in the original papers. For example, MINER (Li et al. 2022) achieved AUC = 69.61 on the MIND-small dataset, but your reported performance is only AUC = 51.2. Compared to other work that reproduced the MINER model, this performance is also much lower: for example, this paper reported that their reproduced MINER model achieved an AUC of 63.88. In general, most GeneralRec models in your Table 1 achieved AUC < 52.00, which differs substantially from the performance reported in other papers. Could you comment on this?

andreeaiana commented 8 months ago

Hi,

The data splits used in the other papers are most likely different from ours. Neither the MINER paper nor the one you referenced explicitly mentions which split of the MIND dataset was used, so I assume they used the test portion, for which the labels are not publicly available. In contrast, as explained in our paper (Section 2.5), we use the MINDdev portion of the dataset as our test split, and further split MINDtrain into training and validation portions.
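For illustration, such a split could look roughly like the following. This is a minimal sketch, not our exact implementation: the file paths, the 95/5 ratio, and the time-based strategy are all assumptions made for the example.

```python
# Minimal sketch (not the exact newsreclib code) of the split described above:
# MINDtrain is divided into train/validation portions, and MINDdev
# (which has public labels) serves as the test set.
import pandas as pd

COLUMNS = ["impression_id", "user_id", "time", "history", "impressions"]

train_full = pd.read_csv(
    "MINDsmall_train/behaviors.tsv", sep="\t", names=COLUMNS
)

# Parse timestamps and hold out the chronologically last 5% of impressions
# for validation (ratio and time-based strategy are illustrative assumptions).
train_full["time"] = pd.to_datetime(train_full["time"])
train_full = train_full.sort_values("time")
split_idx = int(len(train_full) * 0.95)
train_behaviors = train_full.iloc[:split_idx]
val_behaviors = train_full.iloc[split_idx:]

# MINDdev is used as the test split.
test_behaviors = pd.read_csv(
    "MINDsmall_dev/behaviors.tsv", sep="\t", names=COLUMNS
)
```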

chiyuzhang94 commented 8 months ago

Hi,

Yes, I understand that different data splits can lead to some variance, but a 10+ point AUC difference is too large. The dev and test portions come from the same dataset and should not exhibit such a dramatic shift. Have you verified the performance by running the official code from the original papers (e.g., MINER) on your data splits?
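For reference, as I understand it, MIND-style AUC is computed per impression and then averaged, so numbers obtained on MINDdev and on the hidden test set should be directly comparable. A rough sketch of that metric (the function name and example data are purely illustrative):

```python
# Sketch of MIND-style evaluation: AUC is computed per impression and then
# averaged over impressions. `scores_per_impression` and
# `labels_per_impression` are hypothetical per-impression prediction lists.
import numpy as np
from sklearn.metrics import roc_auc_score

def mind_auc(scores_per_impression, labels_per_impression):
    """Average per-impression AUC, skipping impressions with a single class."""
    aucs = []
    for scores, labels in zip(scores_per_impression, labels_per_impression):
        if len(set(labels)) < 2:  # AUC is undefined for one-class impressions
            continue
        aucs.append(roc_auc_score(labels, scores))
    return float(np.mean(aucs))

# Example: two impressions with candidate scores and click labels
print(mind_auc(
    [[0.9, 0.2, 0.4], [0.1, 0.8]],
    [[1, 0, 0], [0, 1]],
))
```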