
Multi-Task Multi-Expert Challenge

This repo provides benchmark data processing and evaluation code for the Multi-Task and Multi-Expert Learning Challenge from Noah's Ark Recommendation & Search Lab.

Data

AliExpress Dataset: The AliExpress dataset contains user logs collected from real-world traffic on AliExpress. It covers two tasks: click-through rate (CTR) prediction and conversion rate (CVR) prediction.

To process the data, download the original data, update the corresponding data path, and run:

```bash
python process_aliexpress.py --dataset_name NL
```
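
To preprocess all four country splits used in the baseline table below (NL, ES, FR, US) in one go, a loop such as the following could be used; this is a minimal sketch that assumes `process_aliexpress.py` accepts the other split names the same way it accepts `NL`:

```python
# Sketch: run the preprocessing script for every country split.
import subprocess

for split in ["NL", "ES", "FR", "US"]:
    subprocess.run(
        ["python", "process_aliexpress.py", "--dataset_name", split],
        check=True,  # raise if preprocessing a split fails
    )
```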

Evaluation

Models are evaluated by the AUC score on three tasks: CTR, CVR, and CTCVR (the joint click-and-conversion task).
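
As a rough illustration of how the three scores can be computed, the sketch below uses scikit-learn; the variable names, the clicked-only CVR evaluation, and the ESMM-style pCTR * pCVR construction for CTCVR are assumptions and may differ from this repo's actual evaluation script:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate(click, conversion, p_ctr, p_cvr):
    """Return (CTR AUC, CVR AUC, CTCVR AUC) for one dataset."""
    click, conversion = np.asarray(click), np.asarray(conversion)
    p_ctr, p_cvr = np.asarray(p_ctr), np.asarray(p_cvr)

    # CTR: was the impression clicked?
    ctr_auc = roc_auc_score(click, p_ctr)

    # CVR: conversion among clicked impressions (assumption; the repo's
    # script may instead score conversion over all impressions).
    clicked = click == 1
    cvr_auc = roc_auc_score(conversion[clicked], p_cvr[clicked])

    # CTCVR: joint click-and-convert label over all impressions, scored
    # with the product of the two predicted probabilities.
    ctcvr_auc = roc_auc_score(click * conversion, p_ctr * p_cvr)
    return ctr_auc, cvr_auc, ctcvr_auc
```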

Baseline Results

We train baseline models on each of the four country datasets (NL, ES, FR, US), with early stopping based on the CVR metric. We run each experiment three times and report the results below. For details of the baseline implementations, please refer to this multi-task learning library.
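
For reference, the early-stopping scheme described above can be sketched as follows. This assumes a PyTorch-style model; `train_one_epoch`, `evaluate_cvr_auc`, and the `patience` value are hypothetical placeholders, not part of this repo:

```python
def train_with_early_stopping(model, train_one_epoch, evaluate_cvr_auc,
                              max_epochs=50, patience=3):
    """Train until validation CVR AUC stops improving, then restore the best state."""
    best_auc, bad_epochs, best_state = 0.0, 0, None
    for epoch in range(max_epochs):
        train_one_epoch(model)
        auc = evaluate_cvr_auc(model)  # validation CVR AUC
        if auc > best_auc:
            best_auc, bad_epochs = auc, 0
            # keep a copy of the best weights (assumes torch tensors)
            best_state = {k: v.clone() for k, v in model.state_dict().items()}
        else:
            bad_epochs += 1
            if bad_epochs >= patience:  # no improvement for `patience` epochs
                break
    if best_state is not None:
        model.load_state_dict(best_state)
    return best_auc
```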

| Model | NL CTR | NL CVR | NL CTCVR | ES CTR | ES CVR | ES CTCVR | FR CTR | FR CVR | FR CTCVR | US CTR | US CVR | US CTCVR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DNN | 0.7203 | 0.7815 | 0.8556 | 0.7252 | 0.8141 | 0.8832 | 0.7174 | 0.8071 | 0.8702 | 0.7058 | 0.8068 | 0.8637 |
| MMOE | 0.7195 | 0.7870 | 0.8574 | 0.7269 | 0.8268 | 0.8899 | 0.7226 | 0.8144 | 0.8748 | 0.7053 | 0.8069 | 0.8639 |
| PLE | 0.7268 | 0.7843 | 0.8571 | 0.7268 | 0.8206 | 0.8861 | 0.7252 | 0.8084 | 0.8679 | 0.7092 | 0.8175 | 0.8699 |
| ESMM | 0.7202 | 0.7827 | 0.8606 | 0.7263 | 0.8231 | 0.8891 | 0.7222 | 0.8078 | 0.8684 | 0.7035 | 0.8179 | 0.8712 |
| AITM | 0.7256 | 0.7874 | 0.8586 | 0.7270 | 0.8221 | 0.8829 | 0.7216 | 0.8127 | 0.8710 | 0.7019 | 0.8219 | 0.8655 |
| MMOE+MGDA | 0.7225 | 0.7846 | 0.8579 | 0.7264 | 0.8213 | 0.8862 | 0.7218 | 0.8101 | 0.8705 | 0.7051 | 0.8142 | 0.8668 |
| MMOE+NashMTL | 0.7229 | 0.7852 | 0.8583 | 0.7267 | 0.8228 | 0.8868 | 0.7227 | 0.8107 | 0.8705 | 0.7050 | 0.8157 | 0.8675 |
| PLE+MGDA | 0.7236 | 0.7848 | 0.8585 | 0.7266 | 0.8220 | 0.8862 | 0.7227 | 0.8099 | 0.8697 | 0.7049 | 0.8174 | 0.8682 |
| PLE+NashMTL | 0.7230 | 0.7849 | 0.8588 | 0.7266 | 0.8223 | 0.8863 | 0.7222 | 0.8102 | 0.8700 | 0.7041 | 0.8174 | 0.8678 |

*Participants are encouraged to share their model checkpoints so that we can verify the claimed results.