How can I get the syntactic vectors of a dataset after it is encoded by the syntax-guided encoder?

wrq9 commented 1 month ago

Hello. If all I want is the vectors produced by parsing the dataset with GOPar and then encoding it with the syntax-guided encoder, how should I go about it?

HillZhang1999 commented 1 month ago

You can hack the fairseq code, for example by grabbing the output here: https://github.com/HillZhang1999/SynGEC/blob/main/src/src_syngec/syngec_model/syntax_enhanced_transformer.py#L562
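As an alternative to editing the source in place, a PyTorch forward hook can capture a module's output without touching its code. The sketch below is not the SynGEC implementation: `ToyEncoder` and `save_output` are hypothetical names, and the `{"encoder_out": [T x B x C tensor]}` layout is an assumption modeled on recent fairseq encoders. Check what the real encoder returns before reusing it.

```python
import torch
from torch import nn


class ToyEncoder(nn.Module):
    """Hypothetical stand-in for the real encoder; returns a fairseq-like dict."""

    def __init__(self, vocab=100, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, src_tokens):
        x = self.proj(self.embed(src_tokens))        # B x T x C
        return {"encoder_out": [x.transpose(0, 1)]}  # T x B x C (assumed layout)


captured = []


def save_output(module, inputs, output):
    # Detach and move to CPU so stored tensors keep neither the graph nor GPU memory.
    captured.append(output["encoder_out"][0].detach().cpu())


encoder = ToyEncoder().eval()
handle = encoder.register_forward_hook(save_output)
with torch.no_grad():
    encoder(torch.randint(0, 100, (2, 7)))  # batch of 2 sentences, 7 tokens each
handle.remove()  # unregister the hook once the vectors are collected

print(captured[0].shape)  # torch.Size([7, 2, 16])
```

With the real model, the hook would be registered on the encoder submodule of the loaded checkpoint, and `captured` would fill up as the dataset is run through it.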

wrq9 commented 1 month ago

Thanks for the reply. One more question: how are src_outcoming_arc_mask, src_incoming_arc_mask, src_dpd_matrix, and src_probs_matrix in the code obtained?
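The thread leaves this question unanswered. Going only by the variable names, these look like incoming/outgoing arc masks, a dependency-distance matrix, and an arc-probability matrix derived from the parser output. That reading is an assumption, not something confirmed here, and the repository's preprocessing scripts remain the authority. Under that assumption, a minimal sketch of building arc masks and a tree-distance matrix from a list of head indices could look like this (`arc_masks` and `tree_distances` are hypothetical helpers; the probability matrix would come from the parser itself and is not reproduced):

```python
from collections import deque


def arc_masks(heads):
    """heads[i] is the 1-based head of token i+1 (0 = root).

    incoming[d][h] == 1 when token h is the head of token d;
    outgoing is the transpose of incoming.
    """
    n = len(heads)
    incoming = [[0] * n for _ in range(n)]
    for dep, head in enumerate(heads):
        if head > 0:  # the root token has no incoming arc
            incoming[dep][head - 1] = 1
    outgoing = [[incoming[j][i] for j in range(n)] for i in range(n)]
    return incoming, outgoing


def tree_distances(heads):
    """Shortest-path length between every token pair, treating arcs as undirected."""
    n = len(heads)
    adj = [[] for _ in range(n)]
    for dep, head in enumerate(heads):
        if head > 0:
            adj[dep].append(head - 1)
            adj[head - 1].append(dep)
    dist = [[-1] * n for _ in range(n)]
    for start in range(n):  # breadth-first search from each token
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[start][v] < 0:
                    dist[start][v] = dist[start][u] + 1
                    queue.append(v)
    return dist


heads = [2, 3, 0, 5, 3, 3]  # same shape as the `arcs` list a parser returns
incoming, outgoing = arc_masks(heads)
dist = tree_distances(heads)
print(dist[0][3])  # 4: token 1 -> 2 -> 3 -> 5 -> 4
```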

wrq9 commented 1 month ago

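Hello. When I parse with emnlp2022_syngec_biaffine-dep-electra-zh-gopar, the output labels correctly include M (Missing errors), R (Redundant errors), and S (Substituted errors). With the char model, however, I never get the M, R, or S labels. What is the reason?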

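import os

# Point Hugging Face downloads at the hf-mirror.com endpoint; set it before importing supar.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

from supar import Parser

# Toggle between the two parsers by switching which path is commented out.
# path = '../emnlp2022_syngec_biaffine-dep-electra-zh-char'
path = '../emnlp2022_syngec_biaffine-dep-electra-zh-gopar'
dep = Parser.load(path)

# prob=True also returns the arc probabilities alongside the predicted tree.
tree = dep.predict("今天是星期。", verbose=False, buckets=32, prob=True)
print(f'arcs: {tree.arcs[0]}')  # head index of each token (0 = root)
print(f'rels: {tree.rels[0]}')  # dependency label of each token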

Output with gopar:

arcs: [2, 3, 0, 5, 3, 3]
rels: ['app', 'top', 'root', 'app', 'attr', 'M']

Output with char:

arcs: [2, 3, 0, 5, 3, 3]
rels: ['app', 'top', 'root', 'app', 'attr', 'punct']
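To make the two outputs easier to compare, the sketch below decodes the head indices into (dependent, head, label) triples and lists the positions where the two label sequences disagree. It assumes character-level tokens (the six characters of the sentence match the six arcs) and the usual convention that heads are 1-based with 0 marking the root; `to_triples` and `label_diff` are hypothetical helper names.

```python
def to_triples(tokens, arcs, rels):
    """Return (dependent, head, relation) triples; the root's head is 'ROOT'."""
    triples = []
    for i, (head, rel) in enumerate(zip(arcs, rels)):
        head_tok = "ROOT" if head == 0 else tokens[head - 1]  # heads are 1-based
        triples.append((tokens[i], head_tok, rel))
    return triples


def label_diff(rels_a, rels_b):
    """0-based positions where two label sequences disagree."""
    return [(i, a, b) for i, (a, b) in enumerate(zip(rels_a, rels_b)) if a != b]


tokens = list("今天是星期。")  # assumption: one token per character
arcs = [2, 3, 0, 5, 3, 3]
gopar_rels = ['app', 'top', 'root', 'app', 'attr', 'M']
char_rels = ['app', 'top', 'root', 'app', 'attr', 'punct']

for triple in to_triples(tokens, arcs, gopar_rels):
    print(triple)

print(label_diff(gopar_rels, char_rels))  # [(5, 'M', 'punct')]
```

For this sentence the two parsers predict identical arcs and differ only in the label of the final token.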

HillZhang1999 / SynGEC, issue #37