agiresearch / OpenP5

OpenP5: An Open-Source Platform for Developing, Training, and Evaluating LLM-based Recommender Systems
Apache License 2.0

Introduction

This repo presents OpenP5, an open-source platform for developing, fine-tuning, and evaluating LLM-based recommender systems.

Paper: OpenP5: An Open-Source Platform for Developing, Training, and Evaluating LLM-based Recommender Systems
Paper link: https://arxiv.org/pdf/2203.13366.pdf

A related repository on how to create item IDs for recommendation foundation models is available here:

Paper: How to Index Item IDs for Recommendation Foundation Models
Paper link: https://arxiv.org/pdf/2305.06569.pdf
GitHub link: https://github.com/Wenyueh/LLM-RecSys-ID
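As a rough illustration of what item indexing means, the sketch below assigns IDs in the spirit of sequential indexing, one of the schemes studied in the paper above: items receive consecutive integers in the order they first appear as users' interaction histories are read. The function name `sequential_index`, the toy data, and the choice to visit users in ascending user-ID order are assumptions for illustration, not code from LLM-RecSys-ID or OpenP5.

```python
from typing import Dict, List


def sequential_index(user_sequences: Dict[int, List[str]], start: int = 1) -> Dict[str, int]:
    """Give items consecutive integer IDs in order of first appearance.

    Users are visited in ascending user-ID order (an assumption), and each
    user's items are read in interaction order, so items that tend to be
    consumed one after another end up with nearby IDs.
    """
    item_to_id: Dict[str, int] = {}
    next_id = start
    for user in sorted(user_sequences):
        for item in user_sequences[user]:
            if item not in item_to_id:
                item_to_id[item] = next_id
                next_id += 1
    return item_to_id


if __name__ == "__main__":
    histories = {
        2: ["banana", "date"],
        1: ["apple", "banana", "cherry"],
        3: ["apple", "egg"],
    }
    print(sequential_index(histories))
    # {'apple': 1, 'banana': 2, 'cherry': 3, 'date': 4, 'egg': 5}
```

Because user 1 is visited first, "apple", "banana", and "cherry" get IDs 1 to 3 even though user 2 appears first in the dictionary.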

News

- [2024.05.27] OpenP5 version 2.0 has been released!

- [2023.12.20] We have made the first release of the project on the release-1.0 branch, which is also available as release-1.0 in the project's Releases section. This branch is complete and ready to run, so you can quickly get experiments going with both T5 and LLaMA backbones. However, the two backbones are implemented in two separate Python files. We are currently refactoring the code so that the T5 and LLaMA backbones share the same codebase structure, and we will make the second release once that is finished.

- [2023.09.16] OpenP5 now supports both T5 and LLaMA-2 backbone LLMs.

- [2023.06.10] OpenP5 now supports 10 datasets and 3 item ID indexing methods for both sequential recommendation and straightforward recommendation tasks.

Environment

Environment requirements are listed in ./environment.txt.

Data Statistics

Statistics for the ten selected datasets are listed below:

Dataset        #Users   #Items  #Interactions  Sparsity
ML-1M           6,040    3,416        999,611    95.16%
Yelp          277,631  112,394      4,250,483    99.99%
LastFM          1,090    3,646         52,551    98.68%
Beauty         22,363   12,101        198,502    99.93%
ML-100K           943    1,349         99,287    92.20%
Clothing       39,387   23,033        278,677    99.97%
CDs            75,258   64,443      1,697,533    99.96%
Movies        123,960   50,052      1,697,533    99.97%
Taobao          6,104    4,192         46,337    99.82%
Electronics   192,403   63,001      1,689,188    99.99%
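The table does not spell out how sparsity is computed. Assuming the usual definition, sparsity = 1 - #Interactions / (#Users x #Items), the short check below reproduces the reported figures for three of the datasets. The helper `sparsity` is hypothetical and not part of the repository.

```python
def sparsity(n_users: int, n_items: int, n_interactions: int) -> float:
    """Percentage of empty cells in the user-item interaction matrix."""
    return 100.0 * (1.0 - n_interactions / (n_users * n_items))


# (#Users, #Items, #Interactions) copied from the dataset statistics table
STATS = {
    "ML-1M": (6_040, 3_416, 999_611),
    "LastFM": (1_090, 3_646, 52_551),
    "ML-100K": (943, 1_349, 99_287),
}

if __name__ == "__main__":
    for name, (users, items, interactions) in STATS.items():
        print(f"{name}: {sparsity(users, items, interactions):.2f}%")
    # ML-1M: 95.16%
    # LastFM: 98.68%
    # ML-100K: 92.20%
```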

Usage

Download the data from the Google Drive link and put it into the ./data folder.

Run the following command to generate all datasets:

sh generate_dataset.sh
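This README does not say how generate_dataset.sh splits each user's history. A common convention in sequential recommendation is leave-one-out: the most recent item is held out for testing, the second-to-last for validation, and the rest is used for training. The sketch below shows only that convention; the function `leave_one_out` and its handling of very short sequences are assumptions, not a description of what the script actually does.

```python
from typing import Dict, List, Tuple

Split = Tuple[Dict[int, List[str]], Dict[int, str], Dict[int, str]]


def leave_one_out(user_sequences: Dict[int, List[str]]) -> Split:
    """Split each chronologically ordered history into train / valid / test."""
    train: Dict[int, List[str]] = {}
    valid: Dict[int, str] = {}
    test: Dict[int, str] = {}
    for user, seq in user_sequences.items():
        if len(seq) < 3:
            # Too short to hold out two items: keep it for training only (assumed choice).
            train[user] = list(seq)
            continue
        train[user] = seq[:-2]  # all but the last two interactions
        valid[user] = seq[-2]   # second-to-last interaction
        test[user] = seq[-1]    # most recent interaction
    return train, valid, test


if __name__ == "__main__":
    histories = {1: ["i1", "i4", "i2", "i9"], 2: ["i3", "i5"]}
    train, valid, test = leave_one_out(histories)
    print(train)  # {1: ['i1', 'i4'], 2: ['i3', 'i5']}
    print(valid)  # {1: 'i2'}
    print(test)   # {1: 'i9'}
```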

Training commands are in the ./command folder. Run one of them, for example:

cd command
sh ML1M_t5_sequential.sh

Checkpoint

Evaluation commands are in the ./test_command folder. Run one of them, for example:

cd ./test_command
sh ML1M_t5_sequential.sh
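This README does not list which metrics the test scripts report. Top-k ranking metrics such as Hit@k and NDCG@k are commonly used for this kind of evaluation. The sketch below computes them for the case of a single held-out item per user, as a reference for interpreting results. The helpers `hit_at_k`, `ndcg_at_k`, and `evaluate` are hypothetical and not taken from the repository.

```python
import math
from typing import List, Sequence, Tuple


def hit_at_k(ranked: Sequence[str], target: str, k: int) -> float:
    """1.0 if the held-out item appears among the top-k predictions, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0


def ndcg_at_k(ranked: Sequence[str], target: str, k: int) -> float:
    """NDCG@k with exactly one relevant item, so the ideal DCG is 1."""
    top_k = list(ranked[:k])
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1  # 1-based position of the hit
    return 1.0 / math.log2(rank + 1)


def evaluate(rankings: List[List[str]], targets: List[str], k: int) -> Tuple[float, float]:
    """Average Hit@k and NDCG@k over users (one ranking and one target per user)."""
    n = len(targets)
    hit = sum(hit_at_k(r, t, k) for r, t in zip(rankings, targets)) / n
    ndcg = sum(ndcg_at_k(r, t, k) for r, t in zip(rankings, targets)) / n
    return hit, ndcg


if __name__ == "__main__":
    rankings = [["i3", "i7", "i1"], ["i2", "i9", "i4"]]
    targets = ["i7", "i5"]
    print(evaluate(rankings, targets, k=3))
```

With one relevant item per user, NDCG only rewards how high the hit is ranked, while Hit@k ignores position within the top k.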

Citation

@article{xu2024openp5,
  title={OpenP5: An Open-Source Platform for Developing, Training, and Evaluating LLM-based Recommender Systems},
  author={Xu, Shuyuan and Hua, Wenyue and Zhang, Yongfeng},
  journal={SIGIR},
  year={2024}
}
@article{hua2023index,
  title={How to Index Item IDs for Recommendation Foundation Models},
  author={Hua, Wenyue and Xu, Shuyuan and Ge, Yingqiang and Zhang, Yongfeng},
  journal={SIGIR-AP},
  year={2023}
}