Closed sugeeth14 closed 2 years ago
Hi @Raghava14, were you able to obtain a KD model for En-De? If yes, how?
Hi, I'm trying to use KD for en-de translation. The docs say to decode the training set to produce a distillation dataset. Could you give me a hint on how to obtain this distillation dataset after I train my model?
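Not an authoritative answer, but the usual recipe for sequence-level KD with fairseq is to run `fairseq-generate` over the training split (e.g. with `--gen-subset train` and a beam size such as 5) and keep the beam-search hypotheses as the new target side. A minimal sketch of the post-processing step, assuming the standard `H-<id>` hypothesis lines that `fairseq-generate` prints (the sample sentences below are made up for illustration):

```python
# Sketch: turn fairseq-generate output into a distillation target file.
# Assumes the standard output format, where each hypothesis line looks like
#   H-<sentence_id>\t<score>\t<tokens>
# IDs come out in generation order, so we re-sort to match the source file.

def extract_hypotheses(generate_output_lines):
    """Collect H- lines and return hypotheses ordered by sentence id."""
    hyps = {}
    for line in generate_output_lines:
        if line.startswith("H-"):
            tag, _score, tokens = line.rstrip("\n").split("\t")
            hyps[int(tag[2:])] = tokens  # strip the "H-" prefix
    return [hyps[i] for i in sorted(hyps)]

# Example on two fake output lines (ids deliberately out of order):
sample = [
    "H-1\t-0.42\tein kleines Beispiel",
    "H-0\t-0.31\tdas ist ein Test",
]
print(extract_hypotheses(sample))
# -> ['das ist ein Test', 'ein kleines Beispiel']
```

The resulting hypotheses would replace the original German side of the training data, and the smaller student model is then trained on the (source, teacher-hypothesis) pairs exactly as in normal training.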
Hi @Ir1d, this is exactly the problem I've encountered as well. There's no hint on how to decode the training set to produce a distillation dataset.
This issue has been automatically marked as stale. If this issue is still affecting you, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize it yet. If you have any new additional information, please include it with your comment!
Closing this issue after a prolonged period of inactivity. If this issue is still present in the latest release, please create a new issue with up-to-date information. Thank you!
Hi, I trained a transformer model for English-to-German translation using the instructions presented here. Now I want to train a smaller model using the knowledge distillation approach mentioned in this paper. Is this supported in Fairseq? If not, how do I get the logits or soft targets from my model so that I can train a smaller one, and is that possible in my case? If anyone has tried KD on a transformer, any ideas or suggestions are welcome. Thank you.
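For the soft-target (word-level) variant asked about here, the usual recipe is a KL-divergence term between temperature-softened teacher and student output distributions, as in Hinton et al.'s distillation setup. A hedged numpy sketch of that loss — this is not a fairseq API, and the temperature value and array shapes are illustrative assumptions:

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def word_kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    averaged over positions. The T**2 factor keeps the gradient scale
    comparable to the hard-label cross-entropy term."""
    t = softmax(teacher_logits / temperature)
    s = softmax(student_logits / temperature)
    kl = (t * (np.log(t + 1e-12) - np.log(s + 1e-12))).sum(axis=-1)
    return float(kl.mean() * temperature ** 2)

# Identical logits give zero loss; mismatched logits give a positive loss.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 10))  # 4 target positions, vocab of 10
student = rng.normal(size=(4, 10))
print(word_kd_loss(teacher, teacher))   # -> 0.0
print(word_kd_loss(student, teacher) > 0)  # -> True
```

In practice the sequence-level approach (decoding the training set with the teacher and retraining on its outputs) is simpler to wire into fairseq, since it needs no change to the training loss at all.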