This is our implementation for the paper:
Shaoyun Shi, Weizhi Ma, Min Zhang, Yongfeng Zhang, Xinxing Yu, Houzhi Shan, Yiqun Liu, and Shaoping Ma. 2020. Beyond User Embedding Matrix: Learning to Hash for Modeling Large-Scale Users in Recommendation. In SIGIR'20.
Please cite our paper if you use our code. Thanks!
Author: Shaoyun Shi (shisy13 AT gmail.com)
@inproceedings{shi2020prehash,
  title={Beyond User Embedding Matrix: Learning to Hash for Modeling Large-Scale Users in Recommendation},
  author={Shi, Shaoyun and Ma, Weizhi and Zhang, Min and Zhang, Yongfeng and Yu, Xinxing and Shan, Houzhi and Liu, Yiqun and Ma, Shaoping},
  booktitle={Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval},
  pages={319--328},
  year={2020},
  organization={ACM}
}
Python 3.7.6
Packages (see requirements.txt):
pathos==0.2.5
tqdm==4.42.1
numpy==1.18.1
torch==1.1.0
pandas==1.0.1
scikit_learn==0.23.1
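To set up the environment, installing the pinned packages is usually enough. A minimal sketch, assuming requirements.txt sits at the repository root:

> cd PreHash/
> pip install -r requirements.txt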
The processed datasets can be downloaded from Tsinghua Cloud or Google Drive. You should place the datasets in ./dataset/. The tree structure of directories should look like:
.
├── dataset
│   ├── Books-1-1
│   ├── Grocery-1-1
│   ├── Pet-1-1
│   ├── RecSys2017-1-1
│   └── VideoGames-1-1
└── src
    ├── data_loaders
    ├── data_processors
    ├── datasets
    ├── models
    ├── runners
    └── utils
Amazon Datasets: The original dataset can be found here.
RecSys2017 Dataset: The original dataset can be found here.
The code for processing the data can be found in ./src/datasets/.
Example running commands can be found in ./command/command.py, for example:
# PreHash-enhanced BiasedMF on the Grocery dataset
> cd PreHash/src/
> python main.py --model_name PreHash --dataset Grocery-1-1 --rank 1 --metrics ndcg@10,precision@1 --lr 0.001 --l2 1e-7 --train_sample_n 1 --hash_u_num 1024 --sparse_his 0 --max_his 10 --sup_his 1 --random_seed 2018 --gpu 0
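The same entry point should work for the other datasets listed above. As a sketch, the command below simply reuses the Grocery hyperparameters on Books-1-1; the tuned values may differ per dataset, so check ./command/command.py for the exact settings:

# PreHash-enhanced BiasedMF on the Books dataset (hyperparameters copied from the Grocery example; illustrative only)
> python main.py --model_name PreHash --dataset Books-1-1 --rank 1 --metrics ndcg@10,precision@1 --lr 0.001 --l2 1e-7 --train_sample_n 1 --hash_u_num 1024 --sparse_his 0 --max_his 10 --sup_his 1 --random_seed 2018 --gpu 0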