A question about ReLM's applicable domains

yongzhuo commented 4 months ago

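I have a question about ReLM's applicable domains. I tried the ReLM model, and on the ECSpell and LEMON datasets it reproduces the results reported in the paper. However, when I run wang271k+sighan2015 myself, I get the following metrics: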

Sentence Level detection: acc:0.8227, precision:0.7391, recall:0.7645, f1:0.7516
Sentence Level correction: acc:0.8164, precision:0.7268, recall:0.7518, f1:0.7391

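These results are not very good. I also noticed that the paper does not test on the SIGHAN dataset. Is ReLM better suited to zero-shot settings and small, domain-specific datasets, and less suited to cases where training data is plentiful?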
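For readers comparing numbers like the ones above, here is a minimal sketch of one common convention for sentence-level CSC metrics. It is an assumption about how such scores are typically computed, not the evaluation script from the ReLM repository; the function name `sentence_level_metrics` is hypothetical, and source, target, and prediction are assumed to have equal length (substitution-only errors).

```python
def _prf(tp, predicted_positive, gold_positive):
    """Precision, recall and F1 from sentence counts (0.0 when undefined)."""
    p = tp / predicted_positive if predicted_positive else 0.0
    r = tp / gold_positive if gold_positive else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1


def sentence_level_metrics(sources, targets, predictions):
    """Sentence-level detection and correction scores (assumed convention).

    A sentence counts as a detection hit when the set of positions the model
    changed equals the set of gold error positions, and as a correction hit
    when the whole prediction equals the target.
    """
    total = len(sources)
    det_correct = cor_correct = 0   # numerators for accuracy, over all sentences
    det_tp = cor_tp = 0             # hits among sentences that contain errors
    predicted_positive = 0          # sentences the model modified
    gold_positive = 0               # sentences that really contain errors

    for src, tgt, pred in zip(sources, targets, predictions):
        gold_edits = {i for i, (s, t) in enumerate(zip(src, tgt)) if s != t}
        pred_edits = {i for i, (s, p) in enumerate(zip(src, pred)) if s != p}
        det_ok = gold_edits == pred_edits
        cor_ok = pred == tgt

        det_correct += det_ok
        cor_correct += cor_ok
        if pred_edits:
            predicted_positive += 1
        if gold_edits:
            gold_positive += 1
            det_tp += det_ok
            cor_tp += cor_ok

    det_p, det_r, det_f1 = _prf(det_tp, predicted_positive, gold_positive)
    cor_p, cor_r, cor_f1 = _prf(cor_tp, predicted_positive, gold_positive)
    return {
        "detection": {"acc": det_correct / total if total else 0.0,
                      "precision": det_p, "recall": det_r, "f1": det_f1},
        "correction": {"acc": cor_correct / total if total else 0.0,
                       "precision": cor_p, "recall": cor_r, "f1": cor_f1},
    }
```

Other scripts differ in details (for example, whether an unmodified correct sentence counts toward precision), so scores from different codebases are not always directly comparable.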

Claude-Liu commented 4 months ago

The training data for LEMON has 34 million examples, which should be far more than the 271k in wang271k+sighan2015.

----- Original message ----- From: "Macropodus" To: "Claude-Liu/ReLM" Cc: "Subscribed" Sent: Wednesday, June 26, 2024, 9:05:36 AM Subject: [Claude-Liu/ReLM] A question about ReLM's applicable domains (Issue #5)


yongzhuo commented 4 months ago

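OK. From what I can see, the paper only fine-tunes on the ECSpell dataset and evaluates SIGHAN and LEMON zero-shot. Have you tried fine-tuning on the SIGHAN 13/14/15 datasets? I want to confirm whether the results I got are normal.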

Claude-Liu commented 3 months ago

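We have fine-tuned on SIGHAN, and our results are consistent with yours. We consider the ECSpell and LEMON results more convincing, because the SIGHAN datasets have problems such as overlapping error pairs between the training and test sets, and a language style that still differs considerably from Simplified Chinese even after Traditional-to-Simplified conversion. These points are raised in both "Rethinking Masked Language Modeling for Chinese Spelling Correction" and "CSCD-IME: Correcting Spelling Errors Generated by Pinyin IME".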
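The error-pair overlap mentioned above can be checked roughly as follows. This is only an illustrative sketch, not the procedure used in either cited paper: the helper names `error_pairs` and `error_pair_overlap` are hypothetical, and each example is assumed to be an equal-length (source, target) pair of strings.

```python
def error_pairs(examples):
    """Collect (wrong_char, correct_char) pairs from (source, target) sentences."""
    found = set()
    for src, tgt in examples:
        for s, t in zip(src, tgt):
            if s != t:
                found.add((s, t))
    return found


def error_pair_overlap(train_examples, test_examples):
    """Fraction of distinct test error pairs that also occur in the training set."""
    train_pairs = error_pairs(train_examples)
    test_pairs = error_pairs(test_examples)
    if not test_pairs:
        return 0.0
    return len(test_pairs & train_pairs) / len(test_pairs)
```

A ratio close to 1.0 would suggest that a fine-tuned model can score well on the test set largely by memorizing character substitutions seen in training.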

----- Original message ----- From: "Macropodus" To: "Claude-Liu/ReLM" Cc: "Claude-Liu", "Comment" Sent: Thursday, June 27, 2024, 10:08:24 AM Subject: Re: [Claude-Liu/ReLM] A question about ReLM's applicable domains (Issue #5)


yongzhuo commented 3 months ago

Got it, thanks for clearing that up.

Claude-Liu / ReLM

A question about ReLM's applicable domains #5