-
Hello @nreimers! In section 4.3 of your paper: "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation", you trained a student XLM-R model on JW300. Is it possible to share /…
-
Hi, thanks for your great work. Does the code contain the feature distillation part?
-
Hi,
Thank you for releasing the distilled MiniLM models from pre-trained Transformer models. I wonder if you have any plans to release sample code for the MiniLM distillation implementation in eit…
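While waiting for official sample code, here is a minimal PyTorch sketch of the two MiniLM objectives as I understand them from the paper (last-layer self-attention distribution transfer and value-relation transfer); the function names and the normalization over heads and positions are my own assumptions, not the released implementation:

```python
import torch
import torch.nn.functional as F

def attention_distribution_loss(teacher_attn, student_attn, eps=1e-9):
    """KL divergence between last-layer self-attention distributions.

    teacher_attn, student_attn: [batch, heads, seq, seq] attention
    probabilities; MiniLM assumes the head counts match.
    """
    kl = F.kl_div(student_attn.clamp_min(eps).log(), teacher_attn,
                  reduction="batchmean")
    # "batchmean" only divides by the batch size, so also average
    # over heads and query positions
    return kl / (teacher_attn.size(1) * teacher_attn.size(2))

def value_relation_loss(teacher_v, student_v, eps=1e-9):
    """KL divergence between value relations softmax(V V^T / sqrt(d_head)).

    teacher_v, student_v: [batch, heads, seq, head_dim] value vectors
    from the last self-attention layer.
    """
    def relation(v):
        scores = torch.matmul(v, v.transpose(-1, -2)) / (v.size(-1) ** 0.5)
        return F.softmax(scores, dim=-1)

    t_rel, s_rel = relation(teacher_v), relation(student_v)
    kl = F.kl_div(s_rel.clamp_min(eps).log(), t_rel, reduction="batchmean")
    return kl / (teacher_v.size(1) * teacher_v.size(2))
```

In the paper the training objective is the sum of these two terms; the real code would also need hooks to pull the attention probabilities and value vectors out of both models.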
-
Strategy 2 says it is borrowed from "Learning Efficient Object Detection Models with Knowledge Distillation". My question is: does this project's knowledge distillation leave out the hint-learning part and only use that paper's strategy of a soft-target loss on the output layer?
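For context, if the project really uses only the output-layer soft targets (no hint learning), the loss would look roughly like this Hinton-style sketch; this is my own illustration, not the repository's actual code:

```python
import torch.nn.functional as F

def soft_target_loss(student_logits, teacher_logits, T=4.0):
    """Output-layer soft-target loss: KL divergence between
    temperature-softened teacher and student distributions,
    scaled by T^2 to keep gradient magnitudes comparable."""
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
```

Hint learning would add an extra regression loss between intermediate feature maps (with an adaptation layer when their dimensions differ), which is exactly the part I am asking whether this project includes.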
-
Thanks for the brilliant work! I am reading this legendary paper and have a question I want to discuss here.
The paper starts by introducing a new method to distill knowledge from a trained …
-
- Reviewer Yydy
Strengths
> I like the intuition of using confidence gaps (obtained through logits only) to approximate the original private model, but there shall be more details about the inver…
-
# Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer #
- Authors: Sergey Zagoruyko, Nikos Komodakis
- Origin: [http://www.gitxiv.c…
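A minimal PyTorch sketch of the activation-based attention-transfer loss described in the paper (my paraphrase, not the authors' released code):

```python
import torch.nn.functional as F

def attention_map(feat):
    """Activation-based attention map: mean of squared activations over
    channels, flattened and L2-normalized per sample.  feat: [B, C, H, W]."""
    a = feat.pow(2).mean(dim=1).flatten(1)   # [B, H*W]
    return F.normalize(a, p=2, dim=1)

def attention_transfer_loss(teacher_feat, student_feat):
    """Squared L2 distance between normalized attention maps of one
    teacher/student layer pair (spatial sizes are assumed to match)."""
    return (attention_map(teacher_feat) - attention_map(student_feat)).pow(2).mean()
```

In the paper this term is summed over several teacher/student layer pairs and added, with a weighting factor, to the regular cross-entropy (and optionally a standard KD) loss.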
-
Hi @akshaychawla, can you give me access to the GCS data? Thanks so much.
-
I recently read your paper "Generalization Matters: Loss Minima Flattening via Parameter Hybridization for Efficient Online Knowledge Distillation" and found it very inspiring, so I tried to reproduce it. Because the ImageNet dataset is quite large, I used the Tiny-ImageNet dataset for the reproduction…
-
Hello! When training the Dnet, the number of channels in the teacher feature map and the student feature map may differ; how should this be handled?
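One common workaround (I am not sure it is what this repository does) is a learnable 1x1 convolution that projects the student feature map to the teacher's channel count before the feature loss is computed; a minimal sketch, with hypothetical names:

```python
import torch.nn as nn
import torch.nn.functional as F

class ChannelAdapter(nn.Module):
    """Projects the student feature map to the teacher's channel count
    (and spatial size) before a feature-distillation loss."""

    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        self.proj = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        s = self.proj(student_feat)
        # Also align spatial resolution in case the strides differ
        if s.shape[-2:] != teacher_feat.shape[-2:]:
            s = F.interpolate(s, size=teacher_feat.shape[-2:],
                              mode="bilinear", align_corners=False)
        return F.mse_loss(s, teacher_feat)
```

The adapter's parameters are trained jointly with the student, and the teacher feature map should be detached so no gradient flows back into the teacher.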