HKUDS / LLMRec

[WSDM'2024 Oral] "LLMRec: Large Language Models with Graph Augmentation for Recommendation"
https://arxiv.org/abs/2311.00423
Apache License 2.0

Inquiries about Netflix dataset item size and the processing method #16

Open seamoon224 opened 6 months ago

seamoon224 commented 6 months ago

Thank you for sharing the code and data.

I have a question about the Netflix dataset described in your paper. The paper reports 17,366 items. However, across the train.json, val.json, and test.json files, the highest item ID I found is 17,363, and only 8,413 unique items appear. This seems inconsistent with the statistics cited in the paper.

Could you please provide some clarification on this discrepancy?
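
For reference, the counts above can be checked with something like the sketch below. It assumes each split file is a JSON object mapping a user ID to a list of item IDs; that layout is an assumption about the files, and `item_stats`, `load_splits`, and the data directory argument are hypothetical names used only for illustration.

```python
import json
from pathlib import Path


def item_stats(splits):
    """Return (number of unique item IDs, highest item ID) across all splits.

    Assumption: each split is a dict mapping a user ID to a list of item IDs.
    """
    items = set()
    for split in splits:
        for item_ids in split.values():
            # Normalize to int in case IDs were serialized as strings.
            items.update(int(i) for i in item_ids)
    return len(items), max(items, default=None)


def load_splits(data_dir, names=("train.json", "val.json", "test.json")):
    """Load the three split files from data_dir (path supplied by the caller)."""
    return [json.loads((Path(data_dir) / name).read_text()) for name in names]
```

Calling `item_stats(load_splits(data_dir))` on the directory holding the three files returns the two numbers being compared. If item IDs are zero-based, the highest ID plus one gives the size of the ID range, and the difference between that and the unique count is the number of IDs that never appear in any split.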

It would also be helpful to have a detailed description of your data processing steps. In addition, the provided datasets do not include user ratings. Do all interactions in the train.json, val.json, and test.json files indicate a positive user preference for the movie (i.e., a high rating, such as 4 or above)?

Thank you for your attention to these questions.