HKUDS / RLMRec

[WWW'2024] "RLMRec: Representation Learning with Large Language Models for Recommendation"
https://arxiv.org/abs/2310.15950
Apache License 2.0

Data preprocess #14

Open YibinLiu666 opened 6 days ago

YibinLiu666 commented 6 days ago

Hello, I am very interested in your work and would like to reproduce your results starting from the raw data so that I can do further research. Your paper mentions that interactions with a score below 3 are filtered out of the Amazon-book and Yelp datasets, and that k-core filtering is also applied to Steam. Could you provide the code for this part of the preprocessing?

Re-bin commented 1 day ago

Hi 👋!

Thanks for your interest in RLMRec! Because the pre-processing code is complex and spread across multiple files, a straightforward overview of the basic workflow may be more helpful. The steps are outlined below:

  1. Score Filtering: Begin by filtering out low-score interactions (implemented using a for loop).
  2. User Sampling (discussed in Issue 9): Next, uniformly sample a ratio of users, then remove the items that no longer have any interactions once the users are filtered. This reduces the dataset size (implemented with boolean vectors).
  3. K-Core Filtering: Finally, apply k-core filtering using the NetworkX library.
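
For readers who want a starting point, the three steps above could be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names (`filter_by_score`, `sample_users`, `k_core_filter`), the `(user, item, score)` tuple layout, the random seed, and the example values of `ratio` and `k` are all assumptions. The score threshold of 3 comes from the question above. Only `networkx.k_core`, `numpy.isin`, and `numpy.random.Generator.choice` are real library calls.

```python
import numpy as np
import networkx as nx


def filter_by_score(interactions, min_score=3):
    """Step 1: drop interactions whose score is below min_score (plain for loop)."""
    kept = []
    for user, item, score in interactions:
        if score >= min_score:
            kept.append((user, item))
    return kept


def sample_users(pairs, ratio, seed=0):
    """Step 2: uniformly sample a ratio of users and keep only their interactions.

    Items with no remaining interactions disappear automatically, because the
    data is stored as an edge list and a boolean mask selects the kept rows.
    """
    users = np.array([u for u, _ in pairs])
    items = np.array([i for _, i in pairs])
    unique_users = np.unique(users)
    rng = np.random.default_rng(seed)
    n_keep = max(1, int(len(unique_users) * ratio))
    kept_users = rng.choice(unique_users, size=n_keep, replace=False)
    mask = np.isin(users, kept_users)  # boolean vector over interactions
    return list(zip(users[mask].tolist(), items[mask].tolist()))


def k_core_filter(pairs, k):
    """Step 3: keep the k-core of the user-item bipartite graph.

    Nodes are tagged ("u", id) / ("i", id) so a user and an item that share
    an id do not collide. Duplicate interactions collapse into one edge.
    """
    graph = nx.Graph()
    graph.add_edges_from((("u", u), ("i", i)) for u, i in pairs)
    core = nx.k_core(graph, k=k)
    result = []
    for a, b in core.edges():
        # Undirected edges may come back in either orientation.
        user_node, item_node = (a, b) if a[0] == "u" else (b, a)
        result.append((user_node[1], item_node[1]))
    return sorted(result)


# Toy data, purely illustrative.
raw = [
    ("u1", "a", 5), ("u1", "b", 4),
    ("u2", "a", 3), ("u2", "b", 5),
    ("u3", "a", 2), ("u3", "c", 4),
]
positive = filter_by_score(raw, min_score=3)
sampled = sample_users(positive, ratio=1.0)  # ratio=1.0 keeps every user
final = k_core_filter(sampled, k=2)
```

In this sketch, the k-core step removes users and items iteratively until every remaining node has at least `k` interactions, which is why a graph library is convenient here: a single pass of degree filtering would not be enough.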

I hope the above answer is helpful to you :)

Best regards, Xubin