Questions about the Netflix dataset item count and processing method

Thank you for sharing the code and data.

I have a question about the Netflix dataset described in your paper. The paper reports 17,366 items. However, across the train.json, val.json, and test.json files, the highest item ID I found is 17,363, and only 8,413 unique items appear. This seems inconsistent with the statistics reported in the paper.
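
For reference, counts like these can be obtained with a short script. This is a minimal sketch, not the repository's own tooling: it assumes each split file is a JSON object mapping a user ID to a list of integer item IDs, and `item_stats` is a hypothetical helper name. If the files use a different layout, the loop would need adjusting.

```python
import json


def item_stats(paths):
    """Return (max_item_id, num_unique_items) across the given split files."""
    items = set()
    for path in paths:
        # Assumed layout (not confirmed by the repo):
        # {"<user_id>": [item_id, item_id, ...], ...}
        with open(path, encoding="utf-8") as f:
            interactions = json.load(f)
        for item_ids in interactions.values():
            items.update(int(i) for i in item_ids)
    if not items:
        raise ValueError("no item IDs found in the given files")
    return max(items), len(items)


# Example usage, run from the directory that holds the dataset files:
# max_id, n_unique = item_stats(["train.json", "val.json", "test.json"])
# print(f"max item ID: {max_id}, unique items: {n_unique}")
```

Note that if item IDs start at 0, a maximum ID of 17,363 would correspond to an ID space of 17,364, which still differs from the 17,366 items reported.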

Could you please provide some clarification on this discrepancy?

Could you also describe your data processing steps in more detail? In addition, the provided datasets do not include user ratings. Do all interactions in the train.json, val.json, and test.json files represent positive user preferences (i.e., movies the user rated highly, such as 4 or above)?

Thank you for your attention to these questions.

HKUDS / LLMRec