PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0
12.05k stars 2.93k forks

[Question]: mbart模型如何做蒸馏 #4350

Closed · Amy234543 closed this issue 1 year ago

Amy234543 commented 1 year ago

Please describe your question

[two screenshots of the distillation training code were attached; images not preserved]

Following the BERT example, I only compute the mean squared error between the student and teacher logits, but after a full epoch of training the loss does not decrease. What could be the reason? Please help me check what is wrong with my code. The student model was obtained by reducing the number of layers in the teacher's model_config and then fine-tuning it.
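For reference, the loss described here (logit-matching MSE distillation) can be sketched in a few lines. This is a minimal numpy illustration with toy shapes and hypothetical names, not the code from the attached screenshots:

```python
import numpy as np

def mse_distill_loss(student_logits, teacher_logits):
    """Mean squared error between student and teacher logits,
    averaged over all elements (batch, sequence, vocab)."""
    diff = student_logits - teacher_logits
    return float(np.mean(diff ** 2))

# Toy shapes: batch=2, seq_len=3, vocab=5
rng = np.random.default_rng(0)
t = rng.normal(size=(2, 3, 5))   # teacher logits
s = t + 0.1                      # student close to teacher
print(mse_distill_loss(s, t))    # ~0.01 (each element differs by 0.1)
```

In a real training loop this scalar would be computed from the two models' forward passes each step, with only the student's parameters updated.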

LiuChiachi commented 1 year ago

First, one known issue I noticed: the teacher model should not compute gradients or update its parameters. You may need to handle the teacher model as done here, i.e. wrap its forward pass in with paddle.no_grad(): https://github.com/PaddlePaddle/PaddleNLP/blob/d218a25a4cefdf56cef72ecaf3886dd625668273/model_zoo/tinybert/task_distill.py#L371-L372
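The underlying point is that the teacher's logits should enter the loss as constants: the gradient of the logit-MSE flows only into the student. A small numpy sketch (hypothetical, independent of Paddle's autograd) checks this analytically against a finite difference:

```python
import numpy as np

# MSE loss L = mean((s - t)^2); dL/ds = 2 * (s - t) / N,
# while t (the teacher logits) is a constant and receives no gradient.
rng = np.random.default_rng(1)
s = rng.normal(size=(4, 5))          # student logits
t = rng.normal(size=(4, 5))          # teacher logits, frozen
grad_s = 2.0 * (s - t) / s.size      # analytic gradient w.r.t. the student

def loss(x):
    return np.mean((x - t) ** 2)

# Finite-difference check on one element confirms the formula:
eps = 1e-6
s_pert = s.copy()
s_pert[0, 0] += eps
fd = (loss(s_pert) - loss(s)) / eps
print(abs(fd - grad_s[0, 0]) < 1e-4)  # True
```

Wrapping the teacher in `paddle.no_grad()` enforces exactly this: no gradient is built for the teacher, which also saves memory and compute.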

Amy234543 commented 1 year ago

Hi, I made the change you suggested and the loss is decreasing now, but at the start of each epoch it rises before falling again, which seems wrong. Do you know what might be causing this? [screenshot of the loss curve, not preserved] @LiuChiachi

LiuChiachi commented 1 year ago

With alpha=0 the dataset's hard labels are not used at all; could you try tuning the alpha value? Also, you don't seem to have an evaluation phase; besides the loss, it would be worth tracking evaluation metrics as well.
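As a sketch of what the alpha knob controls: the exact formula in the user's script is not shown, but a common convention is to use alpha to weight the hard-label cross-entropy against the soft logit-MSE distillation term, so alpha=0 means the labels are ignored entirely. A minimal numpy version under that assumption:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, hard_labels, alpha):
    """alpha weights the hard-label cross-entropy against the
    logit-MSE distillation term; alpha=0 ignores the labels."""
    soft = np.mean((student_logits - teacher_logits) ** 2)
    probs = softmax(student_logits)
    n = len(hard_labels)
    hard = -np.mean(np.log(probs[np.arange(n), hard_labels]))
    return alpha * hard + (1.0 - alpha) * soft

rng = np.random.default_rng(2)
s = rng.normal(size=(4, 6))       # student logits
t = rng.normal(size=(4, 6))       # teacher logits
y = np.array([0, 3, 1, 5])        # hard labels
print(distill_loss(s, t, y, alpha=0.0))  # pure distillation term
print(distill_loss(s, t, y, alpha=0.5))  # half hard CE, half soft MSE
```

With alpha=0.5 the student is pulled both toward the teacher's logits and toward the ground-truth labels, which is usually more stable when the teacher itself is imperfect.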

Amy234543 commented 1 year ago

I set alpha to 0.5. I'd like to ask: before distillation training, does the student model need to be trained first? I simply reduced the number of layers of the teacher model without any training, so before distillation the student cannot make meaningful predictions. Should I train the student model well first and then run distillation? @LiuChiachi
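The thread went stale before this was answered. A common practice in layer-reduction distillation setups (DistilBERT-style; not confirmed by this thread) is to initialize the student by copying a strided subset of the teacher's layers rather than starting from random or untrained weights, so the student already produces reasonable outputs before distillation begins. A minimal sketch of the layer-selection idea, with hypothetical state-dict keys:

```python
def select_teacher_layers(teacher_state, num_teacher_layers, num_student_layers):
    """Map every k-th teacher encoder layer onto the student.
    Keys like 'encoder.layers.3.weight' are hypothetical examples."""
    stride = num_teacher_layers // num_student_layers
    picked = [i * stride for i in range(num_student_layers)]
    student_state = {}
    for key, value in teacher_state.items():
        parts = key.split(".")
        if parts[:2] == ["encoder", "layers"]:
            layer = int(parts[2])
            if layer in picked:
                # Renumber the kept teacher layer to its student slot.
                parts[2] = str(picked.index(layer))
                student_state[".".join(parts)] = value
        else:
            student_state[key] = value  # embeddings, head, etc. copied as-is
    return student_state

# Toy 12-layer teacher mapped onto a 6-layer student:
teacher = {f"encoder.layers.{i}.weight": i for i in range(12)}
teacher["embed.weight"] = "emb"
student = select_teacher_layers(teacher, 12, 6)
print(sorted(student))
```

After initializing this way, the student is usually still fine-tuned (or distilled directly) on the task; starting distillation from a completely untrained student tends to make the early loss behave erratically.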

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 14 days since being marked as stale.