HKUDS / RLMRec

[WWW'2024] "RLMRec: Representation Learning with Large Language Models for Recommendation"
https://arxiv.org/abs/2310.15950
Apache License 2.0
321 stars 36 forks source link
collaborative-filtering graph-neural-networks large-language-models recommendation recommender-systems

RLMRec: Representation Learning with Large Language Models for Recommendation

This is the PyTorch implementation by @Re-bin for RLMRec model proposed in this paper:

Representation Learning with Large Language Models for Recommendation
Xubin Ren, Wei Wei, Lianghao Xia, Lixin Su, Suqi Cheng, Junfeng Wang, Dawei Yin, Chao Huang\ WWW2024*

* denotes corresponding author

RLMRec

In this paper, we propose a model-agnostic framework RLMRec that enhances existing recommenders with LLM-empowered representation learning. It proposes a paradigm that integrates representation learning with LLMs to capture intricate semantic aspects of user behaviors and preferences. RLMRec incorporates auxiliary textual signals, develops a user/item profiling paradigm empowered by LLMs, and aligns the semantic space of LLMs with the representation space of collaborative relational signals through a cross-view alignment framework.

📝 Environment

You can run the following command to download the codes faster:

git clone --depth 1 https://github.com/HKUDS/RLMRec.git

Then run the following commands to create a conda environment:

conda create -y -n rlmrec python=3.9
conda activate rlmrec
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
pip install torch-scatter -f https://data.pyg.org/whl/torch-1.13.1+cu117.html
pip install torch-sparse -f https://data.pyg.org/whl/torch-1.13.1+cu117.html
pip install pyyaml tqdm

😉 The codes are developed based on the SSLRec framework.

📚 Text-attributed Recommendation Dataset

We utilized three public datasets to evaluate RLMRec: Amazon-book, Yelp, and Steam.

Each user and item has a generated text description.

First of all, please download the data by running following commands.

 cd data/
 wget https://archive.org/download/rlmrec_data/data.zip
 unzip data.zip

You can also download our data from the [Google Drive].

Each dataset consists of a training set, a validation set, and a test set. During the training process, we utilize the validation set to determine when to stop the training in order to prevent overfitting.

- amazon(yelp/steam)
|--- trn_mat.pkl    # training set (sparse matrix)
|--- val_mat.pkl    # validation set (sparse matrix)
|--- tst_mat.pkl    # test set (sparse matrix)
|--- usr_prf.pkl    # text description of users
|--- itm_prf.pkl    # text description of items
|--- usr_emb_np.pkl # user text embeddings
|--- itm_emb_np.pkl # item text embeddings

User/Item Profile

😊 You can run the code python data/read_profile.py as an example to read the profiles as follows.

$ python data/read_profile.py
User 123's Profile:

PROFILE: Based on the kinds of books the user has purchased and reviewed, they are likely to enjoy historical
fiction with strong character development, exploration of family dynamics, and thought-provoking themes. The user 
also seems to enjoy slower-paced plots that delve deep into various perspectives. Books with unexpected twists, 
connections between unrelated characters, and beautifully descriptive language could also be a good fit for 
this reader.

REASONING: The user has purchased several historical fiction novels such as 'Prayers for Sale' and 'Fall of 
Giants' which indicate an interest in exploring the past. Furthermore, the books they have reviewed, like 'Help 
for the Haunted' and 'The Leftovers,' involve complex family relationships. Additionally, the user appreciates 
thought-provoking themes and character-driven narratives as shown in their review of 'The Signature of All 
Things' and 'The Leftovers.' The user also enjoys descriptive language, as demonstrated in their review of 
'Prayers for Sale.'

Semantic Representation

Mapping to Original Data

The original data of our dataset can be found from following links (thanks to their work):

We provide the mapping dictionary in JSON format in the data/mapper folder to map the user/item ID in our processed data to the original identification in original data (e.g., asin for items in Amazon-book).

🤗 Welcome to use our processed data to improve your research!

🚀 Examples to run the codes

The command to evaluate the backbone models and RLMRec is as follows.

Supported models/datasets:

Hypeparameters:

🔮 Profile Generation and Semantic Representation Encoding

Here we provide some examples with Yelp Data to generate user/item profiles and semantic representations.

Firstly, we need to complete the following three steps.

Then, here are the commands to generate the desired output with examples:

For semantic representation encoding, you can also try other text embedding models like Instructor or Contriever.

😀 The instructions we designed are saved in the {user/item}_system_prompt.txt files and also the generation/instruction folder. You can modify them according to your requirements and generate the desired output!

🌟 Citation

If you find this work is helpful to your research, please consider citing our paper:

@inproceedings{ren2024representation,
  title={Representation learning with large language models for recommendation},
  author={Ren, Xubin and Wei, Wei and Xia, Lianghao and Su, Lixin and Cheng, Suqi and Wang, Junfeng and Yin, Dawei and Huang, Chao},
  booktitle={Proceedings of the ACM on Web Conference 2024},
  pages={3464--3475},
  year={2024}
}

Thanks for your interest in our work!