HillZhang1999 / SynGEC

Code & data for our EMNLP2022 paper "SynGEC: Syntax-Enhanced Grammatical Error Correction with a Tailored GEC-Oriented Parser"
https://arxiv.org/abs/2210.12484
MIT License

The Difference between Baseline and Zhang et al. (2022) #22

Closed: GMago-LeWay closed this issue 11 months ago

GMago-LeWay commented 1 year ago

Hi, I found one point confusing in the SynGEC paper's results on the Chinese datasets. What is the difference between the Baseline (using a PLM) and the Zhang et al. method? Both results seem to come from the same method using the BART model. Or is it the fairseq implementation that makes the difference?

HillZhang1999 commented 1 year ago

Please refer to Sec. 5, "Use of BART". The main improvement comes from the upgraded vocabulary.

boxiaowave commented 1 year ago

Where is the code for this vocabulary upgrade? I don't seem to see this part of the vocabulary.

GMago-LeWay commented 12 months ago

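In Section 5 of the paper, you mention adding 3,866 Chinese characters and punctuation marks from Chinese Gigaword and wiki. Is there code in the repo that corresponds to this step? I'm curious how it was done. Also, for the single-model results on the Chinese datasets in Table 6, does the PLM option mean "whether the original BART weights are used"? If so, are there results for the SynGEC method without changing the vocabulary?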

GMago-LeWay commented 11 months ago

@HillZhang1999 Could you reply when you have a moment? Thanks.

HillZhang1999 commented 11 months ago

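For results without changing the vocabulary, you can refer to the MuCGEC paper. The vocabulary-extension code probably can't be found anymore; sorry about that.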
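
Editor's note: since the original extension code is not available in this thread, here is a minimal sketch of one generic way to extend a character-level vocabulary. This is not the authors' code or their method. The function names (`find_missing_chars`, `extend_vocab`), the frequency threshold, the mean-of-existing-rows initialisation for new embeddings, and the toy data are all assumptions made for illustration. The idea is to count characters in a corpus, keep those absent from the existing vocabulary, and append them together with new embedding rows.

```python
from collections import Counter


def find_missing_chars(corpus_lines, vocab, min_count=1):
    """Return characters seen in the corpus but absent from vocab, most frequent first."""
    counts = Counter(ch for line in corpus_lines for ch in line if not ch.isspace())
    missing = [(ch, n) for ch, n in counts.items() if ch not in vocab and n >= min_count]
    # Sort by descending frequency, then by character, so the order is deterministic.
    missing.sort(key=lambda item: (-item[1], item[0]))
    return [ch for ch, _ in missing]


def extend_vocab(vocab, embeddings, new_tokens):
    """Append new_tokens to a copy of vocab and add one embedding row per new token.

    Assumes vocab ids are contiguous (0..len(vocab)-1). New rows are initialised
    to the mean of the existing rows; this is an illustrative choice only.
    """
    dim = len(embeddings[0])
    mean_row = [sum(row[i] for row in embeddings) / len(embeddings) for i in range(dim)]
    vocab = dict(vocab)
    embeddings = [list(row) for row in embeddings]
    for tok in new_tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
            embeddings.append(list(mean_row))
    return vocab, embeddings


# Toy data (hypothetical): a two-token vocabulary with 2-dimensional embeddings.
toy_vocab = {"我": 0, "是": 1}
toy_embeddings = [[0.0, 2.0], [2.0, 4.0]]
toy_corpus = ["我是学生。", "学生“好”"]

new_chars = find_missing_chars(toy_corpus, toy_vocab)
new_vocab, new_embeddings = extend_vocab(toy_vocab, toy_embeddings, new_chars)
```

With a real pretrained model, the same two steps would apply to the model's dictionary file and its embedding (and output projection) weights, followed by further training so the new rows become useful.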

GMago-LeWay commented 11 months ago

OK, thanks.