acbull / Unbiased_LambdaMart

Code for WWW'19 "Unbiased LambdaMART: An Unbiased Pairwise Learning-to-Rank Algorithm", which is based on LightGBM
MIT License

Unbiased LambdaMart

Unbiased LambdaMart is an unbiased version of the traditional LambdaMart. It jointly estimates the biases at clicked positions and the biases at unclicked positions, and learns an unbiased ranker using a pairwise loss function.
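To make the idea more concrete, here is a minimal Python sketch of an inverse-propensity-weighted pairwise loss for a single query. Each (clicked, unclicked) document pair contributes a logistic pairwise loss divided by the bias at the clicked document's position (`t_plus`) times the bias at the unclicked document's position (`t_minus`). The function name and the choice of logistic pair loss are illustrative assumptions. The bias values are passed in as fixed inputs, so the sketch reproduces neither the joint bias estimation nor the LambdaMART gradient computation inside LightGBM. See the paper for the actual procedure.

```python
import math
from typing import Sequence


def _softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def debiased_pairwise_loss(
    scores: Sequence[float],
    clicks: Sequence[int],
    t_plus: Sequence[float],
    t_minus: Sequence[float],
) -> float:
    """Bias-corrected pairwise loss for one query (illustrative sketch).

    scores[k] and clicks[k] describe the document shown at position k.
    t_plus[k]:  assumed bias estimate for a click at position k.
    t_minus[k]: assumed bias estimate for a non-click at position k.
    """
    total = 0.0
    for i, clicked_i in enumerate(clicks):
        if not clicked_i:
            continue
        for j, clicked_j in enumerate(clicks):
            if clicked_j:
                continue
            # Logistic loss log(1 + exp(-(s_i - s_j))) for "i should rank above j".
            pair_loss = _softplus(-(scores[i] - scores[j]))
            # Inverse-propensity weighting by the biases at both positions.
            total += pair_loss / (t_plus[i] * t_minus[j])
    return total
```

With every bias set to 1.0, the function reduces to an ordinary, uncorrected pairwise loss. A smaller bias at a position increases the weight of the pairs that involve it.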

The repository contains two parts: first, an implementation of Unbiased LambdaMart based on LightGBM; second, a simulated click dataset, together with its generation scripts, for evaluation.

For more details, see our WWW 2019 (also known as The Web Conference) paper, Unbiased LambdaMART: An Unbiased Pairwise Learning-to-Rank Algorithm.

Overview

Setup

First, compile Unbias_LightGBM (the original LightGBM extended with the implementation of Unbiased LambdaMart).

On Linux, LightGBM can be built using CMake and gcc or Clang.

Install CMake with sudo apt install cmake.

Run the following commands:

cd Unbias_LightGBM/
mkdir build ; cd build
cmake ..
make -j4

Note: glibc >= 2.14 is required. After compilation, a "lightgbm" executable is produced in the Unbias_LightGBM/ folder.

Example

We modified the original LightGBM lambdarank example to illustrate the usage.

After compiling, run the following commands:

cd Unbias_LightGBM
cp ./lightgbm ./examples/lambdarank/
cd ./examples/lambdarank/
./lightgbm config="train.conf"

In addition to the original XXX.train (which provides the features) and XXX.train.query (which specifies the query each document belongs to), our modified LambdaMart requires an XXX.train.rank file that provides the position information used for debiasing. Remember to supply this file when training on your own data.
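The three input files can be produced together from per-query data. The following is a minimal Python sketch. It writes the .query file in LightGBM's standard format (one line per query giving its number of documents). For the .rank file it ASSUMES one display position per line, aligned row by row with XXX.train. The function names and the example feature lines are hypothetical, and the exact position convention (for example 0- or 1-based) should be checked against the example data in examples/lambdarank/.

```python
from pathlib import Path
from typing import List, Sequence, Tuple

# One query: (display_position, feature_line) pairs, in the order they are written.
Query = Sequence[Tuple[int, str]]


def _lines(items: Sequence[str]) -> str:
    # One item per line, each terminated by a newline ("" when there are no items).
    return "".join(f"{item}\n" for item in items)


def build_ranking_files(queries: Sequence[Query]) -> Tuple[str, str, str]:
    """Return the contents of XXX.train, XXX.train.query and XXX.train.rank.

    ASSUMPTION: XXX.train.rank holds one display position per line, aligned
    row by row with XXX.train (hypothetical layout, verify before use).
    """
    features: List[str] = []
    group_sizes: List[str] = []
    positions: List[str] = []
    for query in queries:
        group_sizes.append(str(len(query)))  # .query: number of documents in this query
        for position, feature_line in query:
            features.append(feature_line)
            positions.append(str(position))
    return _lines(features), _lines(group_sizes), _lines(positions)


def write_ranking_files(prefix: str, queries: Sequence[Query]) -> None:
    # prefix is e.g. "XXX.train"; writes prefix, prefix.query and prefix.rank.
    train, query, rank = build_ranking_files(queries)
    Path(prefix).write_text(train)
    Path(prefix + ".query").write_text(query)
    Path(prefix + ".rank").write_text(rank)
```

Separating the pure `build_ranking_files` from the file-writing wrapper keeps the formatting logic easy to check before anything is written to disk.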

Evaluation

First, download the dataset ranked by an initial SVM ranker from HERE and unzip it into the evaluation directory. Alternatively, you can generate it from scratch by following the procedure of Qingyao Ai et al.

Next, generate the synthetic click dataset from a click model:

cd evaluation
mkdir test_data
cd scripts
python generate_data.py ../click_model/user_browsing_model_0.1_1_4_1.json

There are also other click model configurations in evaluation/click_model/; any of them can be used.
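For intuition about what a user browsing model (UBM) simulates, here is a minimal Python sketch of one common UBM formulation. The probability of clicking a document is the probability of examining its rank, which depends on the rank and on the distance to the previous click, multiplied by the document's attractiveness, which depends on its relevance label. The function and both callables are hypothetical. This sketch does not read the repo's JSON configuration files and is not the code in generate_data.py.

```python
import random
from typing import Callable, List, Sequence


def simulate_ubm_clicks(
    relevances: Sequence[int],
    examine_prob: Callable[[int, int], float],
    attract_prob: Callable[[int], float],
    rng: random.Random,
) -> List[int]:
    """Simulate one session of clicks on a ranked list (UBM-style sketch).

    relevances[k] is the relevance label of the document at rank k + 1.
    examine_prob(rank, distance): hypothetical examination probability, where
    distance = rank minus the rank of the previous click (equal to rank
    itself when there has been no click yet).
    attract_prob(relevance): hypothetical attractiveness probability.
    """
    clicks: List[int] = []
    last_click_rank = 0  # 0 means "no click so far"; ranks are 1-based
    for rank, relevance in enumerate(relevances, start=1):
        distance = rank - last_click_rank
        p_click = examine_prob(rank, distance) * attract_prob(relevance)
        clicked = rng.random() < p_click
        clicks.append(int(clicked))
        if clicked:
            last_click_rank = rank
    return clicks
```

For example, `examine_prob=lambda rank, d: 1.0 / rank` and `attract_prob=lambda rel: (2 ** rel - 1) / (2 ** 4 - 1)` (for labels 0 to 4) give a position-decaying, relevance-driven click pattern. Both choices are illustrative only.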

Finally, move the compiled lightgbm file into evaluation/configs, and then run:

./lightgbm config='train.conf'
./lightgbm config='test.conf'

This generates the test results (LightGBM_predict_result.txt) of the model trained on the synthetic click data. Next, evaluate them on the real data:

cd ../scripts
python eval.py ../configs/LightGBM_predict_result.txt  #or any other model output.
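Ranking quality in this setting is commonly measured with NDCG. Below is a minimal, standalone Python sketch of mean NDCG@k over queries, using gain 2^rel - 1 and a log2(position + 1) discount. It assumes the prediction file holds one score per line, aligned with the test data, and that relevance labels and per-query group sizes come from the test file and its .query file. These inputs and the function names are assumptions. The README does not state which metrics or conventions eval.py uses.

```python
import math
from typing import List, Sequence


def dcg_at_k(labels: Sequence[int], k: int) -> float:
    # Gain 2^rel - 1, discount log2(position + 1), positions are 1-based.
    return sum((2 ** rel - 1) / math.log2(pos + 1)
               for pos, rel in enumerate(labels[:k], start=1))


def ndcg_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    # Order labels by descending model score (ties keep file order).
    ranked = [rel for _, rel in sorted(zip(scores, labels), key=lambda p: -p[0])]
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    # Convention chosen here: queries with no relevant document score 0.
    return dcg_at_k(ranked, k) / ideal if ideal > 0 else 0.0


def mean_ndcg(scores: List[float], labels: List[int],
              group_sizes: List[int], k: int = 10) -> float:
    # Split the flat score/label lists into queries using the group sizes.
    values, start = [], 0
    for size in group_sizes:
        end = start + size
        values.append(ndcg_at_k(scores[start:end], labels[start:end], k))
        start = end
    return sum(values) / len(values) if values else 0.0
```

The scores could be loaded with `[float(line) for line in open("LightGBM_predict_result.txt")]`. Whether queries without relevant documents should count as 0, count as 1, or be skipped varies between evaluation tools.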

Citation

Please consider citing the following paper when using our code for your application.

@inproceedings{unbias_lambdamart,
  title={Unbiased LambdaMART: An Unbiased Pairwise Learning-to-Rank Algorithm},
  author={Ziniu Hu and Yang Wang and Qu Peng and Hang Li},
  booktitle={Proceedings of the 2019 World Wide Web Conference},
  year={2019}
}