HillZhang1999 / SynGEC

Code & data for our EMNLP2022 paper "SynGEC: Syntax-Enhanced Grammatical Error Correction with a Tailored GEC-Oriented Parser"
https://arxiv.org/abs/2210.12484
MIT License
79 stars 14 forks

How can I get the syntactic vectors of a dataset after it is encoded by the syntax-guided encoder? #37

Open wrq9 opened 1 month ago

wrq9 commented 1 month ago

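Hello, if all I want is the vectors obtained by parsing the data in a dataset with GOPar and then encoding it with the syntax-guided encoder, how should I go about it?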

HillZhang1999 commented 1 month ago

You can hack the fairseq code a little. For example, you can grab the vectors here: https://github.com/HillZhang1999/SynGEC/blob/main/src/src_syngec/syngec_model/syntax_enhanced_transformer.py#L562

wrq9 commented 1 month ago

Thanks for the reply. I would also like to ask how src_outcoming_arc_mask, src_incoming_arc_mask, src_dpd_matrix, and src_probs_matrix in the code are obtained.
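As a rough illustration of what inputs of this kind could look like, the sketch below derives arc adjacency masks and a pairwise tree-distance matrix from a list of head indices. It is not taken from the SynGEC code: the helper names `arc_masks` and `tree_distance_matrix` are hypothetical, and reading "dpd" as a dependency distance and the two masks as head-to-dependent and dependent-to-head adjacency is an assumption. The repository's preprocessing scripts are the authority on the real format.

```python
from collections import deque


def arc_masks(arcs):
    """Build two 0/1 adjacency matrices from 1-based head indices (0 = root).

    head_to_dep[h][d] == 1 when token h is the head of token d;
    dep_to_head is its transpose. Both names are hypothetical.
    """
    n = len(arcs)
    head_to_dep = [[0] * n for _ in range(n)]
    dep_to_head = [[0] * n for _ in range(n)]
    for dep, head in enumerate(arcs):
        if head > 0:  # the root token has no head inside the sentence
            head_to_dep[head - 1][dep] = 1
            dep_to_head[dep][head - 1] = 1
    return head_to_dep, dep_to_head


def tree_distance_matrix(arcs):
    """Path length between every token pair in the undirected tree.

    Assumes the arcs form one connected tree; unreachable pairs stay -1.
    """
    n = len(arcs)
    neighbours = [[] for _ in range(n)]
    for dep, head in enumerate(arcs):
        if head > 0:
            neighbours[dep].append(head - 1)
            neighbours[head - 1].append(dep)
    dist = [[-1] * n for _ in range(n)]
    for start in range(n):
        # Breadth-first search from each token over the tree edges.
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for nxt in neighbours[cur]:
                if dist[start][nxt] < 0:
                    dist[start][nxt] = dist[start][cur] + 1
                    queue.append(nxt)
    return dist


if __name__ == "__main__":
    arcs = [2, 3, 0, 5, 3, 3]  # head indices for a six-token sentence
    head_to_dep, dep_to_head = arc_masks(arcs)
    for row in tree_distance_matrix(arcs):
        print(row)
```

A probability matrix would presumably come from the parser itself rather than from the predicted heads, so it is left out of this sketch.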

wrq9 commented 1 month ago

Hello, when I use emnlp2022_syngec_biaffine-dep-electra-zh-gopar, the parsed labels correctly include M (Missing errors), R (Redundant errors), and S (Substituted errors). When I use char instead, I cannot get the M, R, and S labels. What is the reason?

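import os

# Point Hugging Face downloads at a mirror; set this before importing supar.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
from supar import Parser

# Switch between the two parsers by changing the path.
# path = '../emnlp2022_syngec_biaffine-dep-electra-zh-char'
path = '../emnlp2022_syngec_biaffine-dep-electra-zh-gopar'
dep = Parser.load(path)

# Parse one sentence; prob=True also requests arc probabilities.
tree = dep.predict("今天是星期。", verbose=False, buckets=32, prob=True)
print(f'arcs: {tree.arcs[0]}')  # head index of each token (0 = root)
print(f'rels: {tree.rels[0]}')  # dependency label of each token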

Output with gopar:

arcs: [2, 3, 0, 5, 3, 3]
rels: ['app', 'top', 'root', 'app', 'attr', 'M']

Output with char:

arcs: [2, 3, 0, 5, 3, 3]
rels: ['app', 'top', 'root', 'app', 'attr', 'punct']
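
To make the two outputs easier to compare, here is a small sketch that turns the arcs and rels lists into (token, head, label) triples and lists the positions where the labels differ. It assumes each character of the input is one token, which matches the six arcs returned for the six-character sentence, and that arcs holds 1-based head indices with 0 for the root. The helper names `to_triples` and `label_diff` are made up for illustration.

```python
ROOT = "ROOT"


def to_triples(tokens, arcs, rels):
    """Pair each token with its head token and dependency label."""
    triples = []
    for i, (head, rel) in enumerate(zip(arcs, rels)):
        head_token = ROOT if head == 0 else tokens[head - 1]
        triples.append((tokens[i], head_token, rel))
    return triples


def label_diff(tokens, rels_a, rels_b):
    """Return (1-based position, token, label_a, label_b) where labels differ."""
    return [
        (i + 1, tok, a, b)
        for i, (tok, a, b) in enumerate(zip(tokens, rels_a, rels_b))
        if a != b
    ]


# Values copied from the two outputs above; the arcs are identical in both.
tokens = list("今天是星期。")  # assumption: one token per character
arcs = [2, 3, 0, 5, 3, 3]
gopar_rels = ['app', 'top', 'root', 'app', 'attr', 'M']
char_rels = ['app', 'top', 'root', 'app', 'attr', 'punct']

if __name__ == "__main__":
    for triple in to_triples(tokens, arcs, gopar_rels):
        print(triple)
    print(label_diff(tokens, gopar_rels, char_rels))
```

For this sentence the only difference is the label on the final token, which is M with gopar and punct with char.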