Open coco-lab-2022 opened 2 years ago
datasets: AS, PKU, CITYU, MSR. The Second International Chinese Word Segmentation Bakeoff (2005). Thomas Emerson. paper: https://aclanthology.org/I05-3017.pdf
datasets: CTB6. The Penn Chinese Treebank: Phrase Structure Annotation of a Large Corpus (2005). Xue N, Xia F, Chiou F D, ... paper: https://www.coli.uni-saarland.de/~tania/CMGD/site/papers/the-penn-chinese-treebank-phrase-structure-annotation-of-a-large-corpus.pdf
datasets: CITYU, CKIP, CTB, MSRA, NCC, PKU, SXU. The Fourth International Chinese Language Processing Bakeoff: Chinese Word Segmentation, Named Entity Recognition and Chinese POS Tagging (2008). Guangjin Jin, Xiao Chen. paper: https://aclanthology.org/I08-4010.pdf
datasets: WTB. Dependency Parsing for Weibo: An Efficient Probabilistic Logic Programming Approach (2014). William Yang Wang, Lingpeng Kong, Kathryn Mazaitis, William W. Cohen. paper: https://aclanthology.org/D14-1122.pdf
datasets: ZX. Type-Supervised Domain Adaptation for Joint Segmentation and POS-Tagging (2014). Meishan Zhang, Yue Zhang, Wanxiang Che, Ting Liu. paper: https://aclanthology.org/E14-1062.pdf
datasets: UD. CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies (2017). Daniel Zeman, Martin Popel, ... paper: https://iris.unito.it/retrieve/handle/2318/1652589/371422/K17-3001.pdf
datasets: MCWS. More than Text: Multi-modal Chinese Word Segmentation. ACL (2021). Dong Zhang, Zheng Hu, Shoushan Li, Hanqian Wu, Qiaoming Zhu, Guodong Zhou. method: Proposes a new dataset for multi-modal Chinese word segmentation (MCWS). datasets: https://github.com/MANLP-suda/MCWS paper: https://aclanthology.org/2021.acl-short.70.pdf
Please summarize the major datasets used in the literature.