RowitZou / topic-dialog-summ

AAAI-2021 paper: Topic-Oriented Spoken Dialogue Summarization for Customer Service with Saliency-Aware Topic Modeling.

Differences before and after RL fine-tuning #25

Closed. lulia0228 closed this issue 2 years ago.

lulia0228 commented 2 years ago

Model predictions before and after RL fine-tuning:

| Setting | ROUGE-F (1/2/L) | ROUGE-R (1/2/L) | ROUGE-P (1/2/L) |
|---|---|---|---|
| Before RL | 40.50 / 27.09 / 33.80 | 41.76 / 27.80 / 38.03 | 45.25 / 31.38 / 40.89 |
| After RL | 40.03 / 26.05 / 33.08 | 41.82 / 27.30 / 37.91 | 45.08 / 30.44 / 40.57 |

I'd like to ask the author: I understand that this phenomenon can happen, so is it just a matter of hyperparameter tuning?

RowitZou commented 2 years ago

Because the RL stage directly uses ROUGE as the reward for optimization, in theory it should further improve the generated results in terms of ROUGE. However, RL is difficult to optimize, and the learning rate and sampling scheme have a large impact on the final results.
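For reference, below is a minimal sketch of how ROUGE can be used as the RL reward in a self-critical (REINFORCE-with-baseline) setup. This is not the repository's actual implementation; the helpers `model.sample`, `model.greedy`, and `rouge_l` are hypothetical names used only for illustration.

```python
import torch

def self_critical_loss(model, batch, rouge_l):
    """Sketch of a policy-gradient loss with ROUGE as the reward (assumed helpers)."""
    # Sample a summary from the current policy and keep its token log-probabilities.
    sampled_ids, sampled_logprobs = model.sample(batch)          # stochastic decoding
    # Greedy decoding serves as the baseline, which reduces gradient variance.
    with torch.no_grad():
        greedy_ids, _ = model.greedy(batch)

    rewards, baselines = [], []
    for sampled, greedy, ref in zip(sampled_ids, greedy_ids, batch["references"]):
        rewards.append(rouge_l(sampled, ref))    # reward = ROUGE of the sampled summary
        baselines.append(rouge_l(greedy, ref))   # baseline = ROUGE of the greedy summary

    advantage = torch.tensor(rewards) - torch.tensor(baselines)
    # REINFORCE with a self-critical baseline: increase the log-probability of samples
    # that beat the greedy baseline, decrease it for samples that fall below it.
    seq_logprob = sampled_logprobs.sum(dim=-1)
    loss = -(advantage.to(seq_logprob.device) * seq_logprob).mean()
    return loss
```

Because the loss depends on stochastically sampled sequences, its gradient estimate has high variance, which is why the learning rate and the sampling scheme matter so much for whether RL training actually improves ROUGE.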