This repo provides benchmark data processing and evaluation for the Multi-task and Multi-expert Learning Challenge from Noah's Ark Recommendation & Search Lab.
AliExpress Dataset: The AliExpress dataset collects user logs from real-world traffic on AliExpress. It covers two tasks: CTR prediction and CVR prediction.
To process the data, download the original data, update the corresponding data path, and run:
python process_aliexpress.py --dataset_name NL
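The command above handles a single dataset. As a rough sketch, the same command can be generated for all four dataset codes used in this README. That the script accepts ES, FR, and US through `--dataset_name` in the same way as NL is an assumption, and `build_commands` is a hypothetical helper, not part of the repo.

```python
# Dataset codes named in this README. That process_aliexpress.py accepts each
# of them via --dataset_name is an assumption based on the NL example.
DATASETS = ["NL", "ES", "FR", "US"]


def build_commands(datasets=DATASETS, script="process_aliexpress.py"):
    """Return one argv list per dataset, mirroring the documented command."""
    return [["python", script, "--dataset_name", name] for name in datasets]


for argv in build_commands():
    # Dry run: print each command instead of executing it.
    print(" ".join(argv))
```

To actually run the processing, each argv list could be passed to `subprocess.run(argv, check=True)` from the directory that contains the script.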
Evaluation metric: AUC score for CTR, CVR, and CTCVR.
We train baseline models on each of the four datasets (NL, ES, FR, US), with early stopping based on CVR. We run each experiment three times and report the results below. For details of the baseline implementation, please check this nice multi-task library.
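For reference, here is a minimal, dependency-free sketch of how the three AUC scores might be computed on a toy batch. It follows one common convention, assumed here rather than taken from this repo's evaluation code: CVR is scored on clicked impressions only, and CTCVR scores the product of the two predicted probabilities against the click-and-convert label. All names and numbers below are illustrative.

```python
def auc(labels, scores):
    """ROC AUC via the rank-sum formulation; tied scores share an average rank."""
    n = len(labels)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy data (made up): a conversion can only happen after a click.
click = [1, 0, 1, 1, 0, 1, 0, 0]
conversion = [1, 0, 0, 1, 0, 0, 0, 0]
p_ctr = [0.9, 0.3, 0.4, 0.7, 0.5, 0.8, 0.2, 0.1]
p_cvr = [0.6, 0.2, 0.7, 0.5, 0.1, 0.3, 0.4, 0.2]

# CTR: click label vs. predicted click probability, over all impressions.
ctr_auc = auc(click, p_ctr)

# CVR (assumed convention): restrict to clicked impressions.
clicked = [i for i, c in enumerate(click) if c == 1]
cvr_auc = auc([conversion[i] for i in clicked], [p_cvr[i] for i in clicked])

# CTCVR (assumed convention): click-and-convert label vs. pCTR * pCVR.
ctcvr_label = [c * v for c, v in zip(click, conversion)]
ctcvr_score = [a * b for a, b in zip(p_ctr, p_cvr)]
ctcvr_auc = auc(ctcvr_label, ctcvr_score)

print(ctr_auc, cvr_auc, ctcvr_auc)
```

If the evaluation code in this repo defines the CVR or CTCVR inputs differently, only the label and score lists passed to `auc` would need to change.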
Model Name | NL CTR | NL CVR | NL CTCVR | ES CTR | ES CVR | ES CTCVR | FR CTR | FR CVR | FR CTCVR | US CTR | US CVR | US CTCVR |
---|---|---|---|---|---|---|---|---|---|---|---|---|
DNN | 0.7203 | 0.7815 | 0.8556 | 0.7252 | 0.8141 | 0.8832 | 0.7174 | 0.8071 | 0.8702 | 0.7058 | 0.8068 | 0.8637 |
MMOE | 0.7195 | 0.7870 | 0.8574 | 0.7269 | 0.8268 | 0.8899 | 0.7226 | 0.8144 | 0.8748 | 0.7053 | 0.8069 | 0.8639 |
PLE | 0.7268 | 0.7843 | 0.8571 | 0.7268 | 0.8206 | 0.8861 | 0.7252 | 0.8084 | 0.8679 | 0.7092 | 0.8175 | 0.8699 |
ESMM | 0.7202 | 0.7827 | 0.8606 | 0.7263 | 0.8231 | 0.8891 | 0.7222 | 0.8078 | 0.8684 | 0.7035 | 0.8179 | 0.8712 |
AITM | 0.7256 | 0.7874 | 0.8586 | 0.7270 | 0.8221 | 0.8829 | 0.7216 | 0.8127 | 0.8710 | 0.7019 | 0.8219 | 0.8655 |
MMOE+MGDA | 0.7225 | 0.7846 | 0.8579 | 0.7264 | 0.8213 | 0.8862 | 0.7218 | 0.8101 | 0.8705 | 0.7051 | 0.8142 | 0.8668 |
MMOE+NashMTL | 0.7229 | 0.7852 | 0.8583 | 0.7267 | 0.8228 | 0.8868 | 0.7227 | 0.8107 | 0.8705 | 0.7050 | 0.8157 | 0.8675 |
PLE+MGDA | 0.7236 | 0.7848 | 0.8585 | 0.7266 | 0.8220 | 0.8862 | 0.7227 | 0.8099 | 0.8697 | 0.7049 | 0.8174 | 0.8682 |
PLE+NashMTL | 0.7230 | 0.7849 | 0.8588 | 0.7266 | 0.8223 | 0.8863 | 0.7222 | 0.8102 | 0.8700 | 0.7041 | 0.8174 | 0.8678 |
*Participants are encouraged to share the model checkpoints for us to verify the claimed results.
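The baselines above use early stopping based on CVR. A minimal sketch of that idea follows, assuming the monitored quantity is a validation CVR AUC checked once per epoch. The `EarlyStopper` class, the patience value, and the scores are all hypothetical, and the actual criterion in the baseline library may differ.

```python
class EarlyStopper:
    """Stop once the monitored score has not improved for `patience` checks."""

    def __init__(self, patience=2):
        self.patience = patience  # hypothetical value, not from the baselines
        self.best_score = float("-inf")
        self.best_epoch = -1
        self.bad_checks = 0

    def update(self, epoch, score):
        """Record a validation score; return True when training should stop."""
        if score > self.best_score:
            self.best_score = score
            self.best_epoch = epoch
            self.bad_checks = 0
            return False
        self.bad_checks += 1
        return self.bad_checks >= self.patience


# Made-up validation CVR AUC values, one per epoch, only to exercise the logic.
val_cvr_auc = [0.70, 0.74, 0.78, 0.77, 0.76, 0.79]

stopper = EarlyStopper(patience=2)
stopped_at = None
for epoch, score in enumerate(val_cvr_auc):
    if stopper.update(epoch, score):
        stopped_at = epoch
        break

print("stopped at epoch", stopped_at, "best epoch", stopper.best_epoch)
```

In a real training loop, the checkpoint saved at `stopper.best_epoch` would be the one to evaluate and share.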