PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0
12.05k stars 2.93k forks

[Question]: mbart模型如何做蒸馏 #4350

Closed · Amy234543 closed this issue 1 year ago

Amy234543 commented 1 year ago

Please describe your question

[two screenshots of the distillation training code were attached; images not preserved]

Following the BERT example, I only compute the mean squared error between the student and teacher logits, but after a full epoch of training the loss does not decrease. What could be the reason? Please help me check what is wrong with my code. The student model was obtained by reducing the number of layers in the teacher's model_config and then fine-tuning it.
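For reference, the loss described here (logit-matching MSE distillation) can be sketched in a few lines. This is a minimal numpy illustration with toy shapes and hypothetical names, not the code from the attached screenshots:

```python
import numpy as np

def mse_distill_loss(student_logits, teacher_logits):
    """Mean squared error between student and teacher logits,
    averaged over all elements (batch, sequence, vocab)."""
    diff = student_logits - teacher_logits
    return float(np.mean(diff ** 2))

# Toy shapes: batch=2, seq_len=3, vocab=5
rng = np.random.default_rng(0)
t = rng.normal(size=(2, 3, 5))   # teacher logits
s = t + 0.1                      # student close to teacher
print(mse_distill_loss(s, t))    # ~0.01 (each element differs by 0.1)
```

In a real training loop this scalar would be computed from the two models' forward passes each step, with only the student's parameters updated.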

LiuChiachi commented 1 year ago

First, one known issue I noticed: the teacher model should not compute gradients or update its parameters. You may need to handle the teacher model as done here, i.e. wrap its forward pass in with paddle.no_grad(): https://github.com/PaddlePaddle/PaddleNLP/blob/d218a25a4cefdf56cef72ecaf3886dd625668273/model_zoo/tinybert/task_distill.py#L371-L372
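The underlying point is that the teacher's logits should enter the loss as constants: the gradient of the logit-MSE flows only into the student. A small numpy sketch (hypothetical, independent of Paddle's autograd) checks this analytically against a finite difference:

```python
import numpy as np

# MSE loss L = mean((s - t)^2); dL/ds = 2 * (s - t) / N,
# while t (the teacher logits) is a constant and receives no gradient.
rng = np.random.default_rng(1)
s = rng.normal(size=(4, 5))          # student logits
t = rng.normal(size=(4, 5))          # teacher logits, frozen
grad_s = 2.0 * (s - t) / s.size      # analytic gradient w.r.t. the student

def loss(x):
    return np.mean((x - t) ** 2)

# Finite-difference check on one element confirms the formula:
eps = 1e-6
s_pert = s.copy()
s_pert[0, 0] += eps
fd = (loss(s_pert) - loss(s)) / eps
print(abs(fd - grad_s[0, 0]) < 1e-4)  # True
```

Wrapping the teacher in `paddle.no_grad()` enforces exactly this: no gradient is built for the teacher, which also saves memory and compute.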

Amy234543 commented 1 year ago

Hi, I made the change you suggested and the loss is decreasing now, but at the start of each epoch it rises before falling again, which seems wrong. Do you know what might be causing this? [screenshot of the loss curve, not preserved] @LiuChiachi

LiuChiachi commented 1 year ago

With alpha=0 the dataset's hard labels are not used at all; could you try tuning the alpha value? Also, you don't seem to have an evaluation phase; besides the loss, it would be worth tracking evaluation metrics as well.
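As a sketch of what the alpha knob controls: the exact formula in the user's script is not shown, but a common convention is to use alpha to weight the hard-label cross-entropy against the soft logit-MSE distillation term, so alpha=0 means the labels are ignored entirely. A minimal numpy version under that assumption:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, hard_labels, alpha):
    """alpha weights the hard-label cross-entropy against the
    logit-MSE distillation term; alpha=0 ignores the labels."""
    soft = np.mean((student_logits - teacher_logits) ** 2)
    probs = softmax(student_logits)
    n = len(hard_labels)
    hard = -np.mean(np.log(probs[np.arange(n), hard_labels]))
    return alpha * hard + (1.0 - alpha) * soft

rng = np.random.default_rng(2)
s = rng.normal(size=(4, 6))       # student logits
t = rng.normal(size=(4, 6))       # teacher logits
y = np.array([0, 3, 1, 5])        # hard labels
print(distill_loss(s, t, y, alpha=0.0))  # pure distillation term
print(distill_loss(s, t, y, alpha=0.5))  # half hard CE, half soft MSE
```

With alpha=0.5 the student is pulled both toward the teacher's logits and toward the ground-truth labels, which is usually more stable when the teacher itself is imperfect.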

Amy234543 commented 1 year ago

I set alpha to 0.5. I'd like to ask: before distillation training, does the student model need to be trained first? I simply reduced the number of layers of the teacher model without any training, so before distillation the student cannot make meaningful predictions. Should I train the student model well first and then run distillation? @LiuChiachi
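The thread went stale before this was answered. A common practice in layer-reduction distillation setups (DistilBERT-style; not confirmed by this thread) is to initialize the student by copying a strided subset of the teacher's layers rather than starting from random or untrained weights, so the student already produces reasonable outputs before distillation begins. A minimal sketch of the layer-selection idea, with hypothetical state-dict keys:

```python
def select_teacher_layers(teacher_state, num_teacher_layers, num_student_layers):
    """Map every k-th teacher encoder layer onto the student.
    Keys like 'encoder.layers.3.weight' are hypothetical examples."""
    stride = num_teacher_layers // num_student_layers
    picked = [i * stride for i in range(num_student_layers)]
    student_state = {}
    for key, value in teacher_state.items():
        parts = key.split(".")
        if parts[:2] == ["encoder", "layers"]:
            layer = int(parts[2])
            if layer in picked:
                # Renumber the kept teacher layer to its student slot.
                parts[2] = str(picked.index(layer))
                student_state[".".join(parts)] = value
        else:
            student_state[key] = value  # embeddings, head, etc. copied as-is
    return student_state

# Toy 12-layer teacher mapped onto a 6-layer student:
teacher = {f"encoder.layers.{i}.weight": i for i in range(12)}
teacher["embed.weight"] = "emb"
student = select_teacher_layers(teacher, 12, 6)
print(sorted(student))
```

After initializing this way, the student is usually still fine-tuned (or distilled directly) on the task; starting distillation from a completely untrained student tends to make the early loss behave erratically.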

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 14 days since being marked as stale.