GarciaLnk closed this issue 8 months ago
Thanks for your interest in our work! To reproduce our LRURec performance, please run hyperparameter searches with weight decay in [0, 1e-2] and dropout rate in [0.1, 0.2, 0.3, 0.4, 0.5]; an example can be found at the bottom of here. As for the environment and baseline issues, I will organize our implementation and update the repo within a few weeks :)
Best, Zhenrui
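The suggested search can be sketched as a simple grid loop. This is a generic illustration, not the repo's code: `grid_search` and `fake_train_and_eval` are hypothetical stand-ins for the real LRURec training and validation routine.

```python
import itertools

# Grid suggested in the comment above: weight decay in [0, 1e-2],
# dropout rate in [0.1, 0.2, 0.3, 0.4, 0.5].
WEIGHT_DECAYS = [0, 1e-2]
DROPOUT_RATES = [0.1, 0.2, 0.3, 0.4, 0.5]

def grid_search(train_and_eval):
    """Return the (weight_decay, dropout) pair with the highest validation
    Recall@20. `train_and_eval` stands in for the real LRURec training run."""
    best_config, best_recall = None, float("-inf")
    for wd, dr in itertools.product(WEIGHT_DECAYS, DROPOUT_RATES):
        recall = train_and_eval(wd, dr)
        if recall > best_recall:
            best_config, best_recall = (wd, dr), recall
    return best_config

# Toy stand-in whose optimum happens to be the setting reported later
# in this thread (weight decay 0.01, dropout 0.5).
def fake_train_and_eval(weight_decay, dropout):
    return weight_decay * 10 + dropout

print(grid_search(fake_train_and_eval))  # (0.01, 0.5)
```

With the real training routine plugged in, the loop is 10 runs total; selecting by validation Recall@20 matches the criterion mentioned later in this thread.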
Hello @Yueeeeeeee!
Congratulations on your work!
I was asking myself the same question about the implementations of the baseline methods used (NARM, SASRec, and BERT4Rec).
Could you point us to any open implementations you have used and the hyperparameters chosen?
Thank you very much for the help!
I used this approach: https://github.com/Yueeeeeeee/LlamaRec/issues/1#issuecomment-1949472178, and the retrieval model LRURec achieved good results.
But when using this LRURec to train the ranker, the ranker's performance is poor. Here are my results on the Beauty dataset:
{
"test_Recall@10": 0.08942324860472203,
"test_MRR@10": 0.0392304892349303,
"test_NDCG@10": 0.050945015045320945,
"test_Recall@5": 0.060988716312725656,
"test_MRR@5": 0.03545584686710355,
"test_NDCG@5": 0.041770513622367465,
"test_Recall@1": 0.021672936315532656,
"test_MRR@1": 0.021672936315532656,
"test_NDCG@1": 0.021672936315532656
}
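For reference, the cutoff metrics reported above (Recall@K, MRR@K, NDCG@K) can be computed per user from the model's ranked list. This is a generic sketch of the standard single-target definitions, not the repo's evaluation code:

```python
import math

def metrics_at_k(ranked_items, target, k):
    """Cutoff metrics for one user with a single held-out target item:
    Recall@k, MRR@k and NDCG@k over the model's ranked item list."""
    top_k = ranked_items[:k]
    if target not in top_k:
        return {"Recall": 0.0, "MRR": 0.0, "NDCG": 0.0}
    rank = top_k.index(target) + 1  # 1-based position of the hit
    return {
        "Recall": 1.0,                      # the target appears in the top k
        "MRR": 1.0 / rank,                  # reciprocal rank of the hit
        "NDCG": 1.0 / math.log2(rank + 1),  # one relevant item => DCG == NDCG
    }

print(metrics_at_k([3, 7, 1], target=7, k=5))
# rank 2 -> Recall 1.0, MRR 0.5, NDCG 1/log2(3) ≈ 0.63
```

The reported test numbers are these per-user values averaged over all test users.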
Results in the paper:
I'd appreciate any help to address this gap.
@Yueeeeeeee
Could you share the config you used to train on the Beauty dataset? Could you also share the LRURec config and results you used so I can reproduce the issue? Thanks!
I have the same question.
When using LlamaRec (LRURec + ranker), performance decreases after training the ranker compared to LRURec alone. Why does this happen?
I just used configurations provided in config.py. Could you please share the implementation details from your paper?
Thank you very much. @Yueeeeeeee
I have reproduced the numbers on ML-100k and am working on the Beauty dataset. I will share a new script once the experiments are done. Thanks again for your interest in our work!
Hi, just using the default config, I was able to achieve performance comparable to the paper:
{
"test_Recall@10": 0.09708042129722096,
"test_MRR@10": 0.04098382929298969,
"test_NDCG@10": 0.05406365780212878,
"test_Recall@5": 0.06528747864848244,
"test_MRR@5": 0.0367693631704106,
"test_NDCG@5": 0.04381075943432336
}
If you still cannot get a comparable performance, I can also update an improved negative sampling strategy that further improves the ranking performance:)
@Yueeeeeeee May I ask what hyperparameters you are using for LRURec: weight decay = ? dropout = ?
Thank you.
Sure, I select the model with the highest Recall@20; in my case it's 0.01 weight decay and 0.5 dropout :)
This is different from the default configuration you provided.
Is that so? What about args.rerank_best_metric?
args.best_metric = 'Recall@20'
args.rerank_best_metric = 'NDCG@10'
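So the retriever checkpoint is selected by validation Recall@20 and the ranker checkpoint by NDCG@10. A minimal sketch of that kind of metric-based checkpoint selection (the checkpoint names and numbers below are made up for illustration):

```python
def select_best(val_results, metric):
    """Return the checkpoint id whose validation metrics maximize `metric`.
    `val_results` maps checkpoint id -> dict of validation metrics."""
    return max(val_results, key=lambda ckpt: val_results[ckpt][metric])

# Hypothetical validation results for two checkpoints.
val_results = {
    "epoch_1": {"Recall@20": 0.10, "NDCG@10": 0.050},
    "epoch_2": {"Recall@20": 0.12, "NDCG@10": 0.045},
}

print(select_best(val_results, "Recall@20"))  # epoch_2 (retriever criterion)
print(select_best(val_results, "NDCG@10"))    # epoch_1 (ranker criterion)
```

Note the two criteria can disagree, which is why `args.best_metric` and `args.rerank_best_metric` are configured separately.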
It seems that you did not mention the negative sampling strategy in the paper. Is it a full sort?
I really need this, thank you very much! My email: hangkees@aliyun.com https://github.com/Yueeeeeeee/LlamaRec/issues/1#issuecomment-2179404731
Thanks for the response.
In my case, the model with the best Recall@10 and the best Recall@20 is the same, with weight decay 0.01 and dropout 0.5. I am testing the popularity-based sampling and will upload it to the repo in the next few days :)
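The repo's exact strategy isn't spelled out at this point in the thread, but popularity-based negative sampling generally means drawing negatives with probability proportional to item frequency while excluding the user's own interactions. A generic sketch under that assumption (all names hypothetical, not the repo's code):

```python
import random
from collections import Counter

def popularity_negatives(all_interactions, user_positives, num_negatives, rng):
    """Sample distinct negative items with probability proportional to item
    popularity. `all_interactions` is the flat list of interacted item ids
    across all users (repeats encode popularity); the user's own positives
    are excluded from the candidate pool."""
    counts = Counter(all_interactions)
    candidates = [item for item in counts if item not in user_positives]
    weights = [counts[item] for item in candidates]
    negatives = set()
    while len(negatives) < min(num_negatives, len(candidates)):
        negatives.add(rng.choices(candidates, weights=weights, k=1)[0])
    return negatives

# Item 1 is 10x more popular than items 2 and 3, so it is sampled most often.
interactions = [1] * 50 + [2] * 5 + [3] * 5
negs = popularity_negatives(interactions, user_positives={2},
                            num_negatives=2, rng=random.Random(0))
print(negs)  # a subset of {1, 3}; never contains the positive item 2
```

Compared to uniform sampling, popular items appear more often as negatives, which makes them harder for the ranker and tends to improve ranking metrics.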
I've been working on replicating the findings from the LlamaRec paper, but I've encountered several challenges that I'd like to share. If necessary, I can split these into separate issues:
1. **Dependency Installation**: The provided `requirements.txt` can't be used to create a conda environment, as it includes pip-only dependencies. While manually installing the latest versions is a workaround, it will not match the original development environment and may lead to issues in the future. Providing an `environment.yml` would be ideal.
2. **Baseline Code Absence**: There's no provided code or documentation for the baseline comparisons. Given that different implementations can yield varying results, having access to the baseline code used would really help with reproducibility.
3. **Metrics Discrepancy**: The metrics for the retriever are consistently lower than reported in the paper, which also affects the overall performance of LlamaRec negatively. These discrepancies on the LRURec metrics can be reproduced on Colab with this notebook. I've also included a comparative table below for reference:
I'd appreciate any help to address these issues.