Identify the algorithms of the typical tasks and their performances on the major datasets.

coco-lab-2022 commented 2 years ago

Please create a table of recent algorithms and their performance on the major datasets.

gezi-creator commented 2 years ago

Algorithms for CWS and their performance on datasets (part 1)

Segment, Mask, and Predict: Augmenting Chinese Word Segmentation with Self-Supervision. EMNLP (2021) method: Propose a self-supervised method for CWS. datasets: MSRA, PKU, AS, CITYU, CTB, SXU, CNC, UDC, ZX code: https://github.com/miradel51/Self_Supervised_CWS paper: https://aclanthology.org/2021.emnlp-main.158.pdf
A Graph-based Model for Joint Chinese Word Segmentation and Dependency Parsing. TACL (2020) method: Propose a graph-based model for joint Chinese word segmentation and dependency parsing. datasets: CTB5, CTB7, CTB9 code: https://github.com/fastnlp/JointCwsParser paper: https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00301/43541/A-Graph-based-Model-for-Joint-Chinese-Word
Joint Chinese Word Segmentation and Part-of-speech Tagging via Two-way Attentions of Auto-analyzed Knowledge. ACL (2020) method: Propose a neural approach with a two-way attention mechanism that incorporates auto-analyzed knowledge for joint CWS and POS tagging, following a character-based sequence labeling paradigm. datasets: CTB5, CTB6, CTB7, CTB9, UD code: https://github.com/SVAIGBA/TwASP paper: https://aclanthology.org/2020.acl-main.735.pdf
Improving Chinese Word Segmentation with Wordhood Memory Networks. ACL (2020) method: Propose WMSeg, a neural framework for CWS using wordhood memory networks. datasets: MSR, PKU, AS, CITYU, CTB6 code: https://github.com/SVAIGBA/WMSeg paper: 2020.acl-main.734v2.pdf (aclanthology.org)
[BERT+LTL] A Joint Multiple Criteria Model in Transfer Learning for Cross-domain Chinese Word Segmentation. EMNLP (2020). Kaiyu Huang, Degen Huang, Zhuang Liu, and Fengran Mo

datasets: MSRA, PKU, CTB, SXU, CNC, UDC, ZX code: https://github.com/koukaiu/dlut-nihao paper: https://aclanthology.org/2020.emnlp-main.318.pdf

[SELFATT+SOFT] Attention Is All You Need for Chinese Word Segmentation. EMNLP (2020). Sufeng Duan, Hai Zhao

datasets: PKU, AS, CITYU, MSR code: https://github.com/akibcmi/SAMS paper: https://arxiv.org/pdf/1910.14537.pdf

[BERT] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL (2019). Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

code: https://github.com/google-research/bert paper: https://arxiv.org/pdf/1810.04805.pdf

[LSTM+CRF] State-of-the-art Chinese Word Segmentation with Bi-LSTMs. EMNLP (2018). Ji Ma, Kuzman Ganchev, David Weiss

datasets: PKU, AS, CITYU, MSR, CTB6, CTB7, UD code: NONE paper: https://arxiv.org/pdf/1808.06511.pdf

[LSTM+BEAM] Fast and Accurate Neural Word Segmentation for Chinese. ACL (2017). Deng Cai, Hai Zhao, Zhisong Zhang, Yuan Xin, Yongjian Wu, Feiyue Huang

datasets: PKU, AS, CITYU, MSR code: https://github.com/jcyk/greedyCWS paper: https://arxiv.org/pdf/1704.07047.pdf
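
Several of the entries above (for example the TwASP paper) follow a character-based sequence labeling paradigm, where each character receives a tag and word boundaries are read off the tags. As a rough sketch of that idea, assuming the commonly used BMES tag set (an assumption here; the listed papers may use other tag sets), the snippet below converts a segmented sentence into per-character tags and back. The function names `words_to_bmes` and `bmes_to_words` are hypothetical and do not come from any of the linked repositories.

```python
from typing import List


def words_to_bmes(words: List[str]) -> List[str]:
    """Map each character of a segmented sentence to a B/M/E/S tag."""
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # skip empty strings defensively
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # Begin, zero or more Middle tags, End
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def bmes_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from characters and their BMES tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends at this character
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that stops mid-word
        words.append(current)
    return words


segmented = ["我", "喜欢", "自然语言"]
tags = words_to_bmes(segmented)
print(tags)
print(bmes_to_words("".join(segmented), tags))
```

A tagger in this paradigm predicts the tag sequence from the raw characters; the second function shows how a predicted sequence is turned back into words for scoring.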

weiwang310 commented 2 years ago

For each task, can you also upload a table comparing the results?

By the way, can you mention whether each of these methods has released code? @gezi-creator

gezi-creator commented 2 years ago

Algorithms for CWS and their performance on datasets (part 2)

Federated Chinese Word Segmentation with Global Character Associations. Findings of ACL (2021). Yuanhe Tian, Guimin Chen, Han Qin, Yan Song method: Propose a federated learning approach for CWS that uses global character associations. datasets: CTB7 code: https://github.com/cuhksz-nlp/GCASeg (not available yet) paper: https://aclanthology.org/2021.findings-acl.376.pdf
