Source code and dataset for the EMNLP 2020 paper "MAVEN: A Massive General Domain Event Detection Dataset".
The dataset (ver. 1.0) can be obtained from Tsinghua Cloud or Google Drive. The data format is introduced in this document.
We also release the document topics for data analysis and model development. The docid2topic.json
file maps document IDs to their EventWiki topic labels.
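As a rough illustration of how the topic mapping might be used, the sketch below loads the file and counts documents per topic. It assumes docid2topic.json is a flat JSON object of the form {document id: topic label}; that layout is an assumption, not something confirmed here, and the helper names `load_docid2topic` and `count_topics` are hypothetical.

```python
import json
from collections import Counter


def load_docid2topic(path):
    """Load the topic mapping from disk.

    Assumption: the file is a flat JSON object {doc_id: topic_label}.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_topics(docid2topic):
    """Return a Counter of how many documents carry each topic label."""
    return Counter(docid2topic.values())
```

With the file in the working directory, `count_topics(load_docid2topic("docid2topic.json")).most_common(10)` would list the ten most frequent topics, provided the assumed layout holds.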
To obtain results on the test set, submit your predictions to our permanent CodaLab competition (the older version will be phased out soon). For the evaluation method, please refer to the evaluation script.
We release the source code for the baselines, including DMCNN, BiLSTM, BiLSTM+CRF, MOGANED and DMBERT.
If the data and code help you, please cite this paper.
@inproceedings{wang2020MAVEN,
title={{MAVEN}: A Massive General Domain Event Detection Dataset},
author={Wang, Xiaozhi and Wang, Ziqi and Han, Xu and Jiang, Wangyi and Han, Rong and Liu, Zhiyuan and Li, Juanzi and Li, Peng and Lin, Yankai and Zhou, Jie},
booktitle={Proceedings of EMNLP 2020},
year={2020}
}