HestiaSky / E4SRec


The performance gap compared to P5 #1

Closed · woriazzc closed this issue 8 months ago

woriazzc commented 9 months ago

Hi! This is great work, and thanks for publishing the code.

However, according to the results in your paper, the performance appears to be much worse than that of P5 ("Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5)"), especially on Yelp (H@5 0.0266 vs. 0.0574) and Sports (H@5 0.0281 vs. 0.0387). I have checked that the dataset statistics and the evaluation methods are the same in your paper and in P5, so I am really curious about the reason behind this large performance gap. Is there any difference in design choices or implementations?
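For reference, the H@5 numbers above denote Hit Ratio at cutoff 5. Below is a minimal sketch of how this metric is typically computed; the function and variable names are illustrative and are not taken from the E4SRec or P5 codebases.

```python
# Minimal sketch of the Hit@K metric (H@5 / HR@5) referenced above.
# `ranked_items` holds each user's recommendation list sorted by
# predicted score; `targets` holds the held-out ground-truth items.

def hit_at_k(ranked_items: list[list[int]], targets: list[int], k: int = 5) -> float:
    """Fraction of users whose held-out item appears in their top-k list."""
    hits = sum(1 for ranked, t in zip(ranked_items, targets) if t in ranked[:k])
    return hits / len(targets)

# Example: the first user's target (9) is in the top 5, the second's (1) is not,
# so HR@5 = 1/2 = 0.5.
print(hit_at_k([[3, 7, 1, 9, 4], [2, 8, 5, 6, 0]], [9, 1], k=5))
```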

HestiaSky commented 9 months ago

Thanks for your interest in our work. The original implementation of P5 is defective. Please refer to the later version of P5, called OpenP5 [1] (https://github.com/agiresearch/OpenP5/), and their follow-up work focusing on item IDs [2] (https://github.com/Wenyueh/LLM-RecSys-ID/). A fair comparison should be conducted under the same settings, including the data split, input features, and evaluation protocol.

[1] OpenP5: Benchmarking Foundation Models for Recommendation
[2] How to Index Item IDs for Recommendation Foundation Models
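For context on what "same settings" typically covers here: sequential recommendation papers in this line of work commonly use a leave-one-out data split (the most recent interaction for testing, the second most recent for validation). The sketch below only illustrates that convention; whether both codebases apply it identically is exactly what a fair comparison would need to verify.

```python
# Hedged sketch of the leave-one-out split commonly used in sequential
# recommendation evaluation. This is a general convention, not code from
# the E4SRec or P5 repositories.

def leave_one_out(user_seq: list[int]):
    """Split one user's chronologically ordered interactions."""
    train = user_seq[:-2]   # everything before the last two interactions
    valid = user_seq[-2]    # second-to-last item for validation
    test = user_seq[-1]     # most recent item for testing
    return train, valid, test

train, valid, test = leave_one_out([10, 4, 7, 22, 3])
print(train, valid, test)  # [10, 4, 7] 22 3
```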