-
Thank you for this work.
I would like to run DiffPrune with the bert-base model, but the model does not seem to work. The logs are below:
[SST-2]
![image](https://user-images.githubusercon…
-
Is the currently released BERT the English-language version? When will the Chinese version be released?
luzm3 updated
4 years ago
-
1. Is there an official plan to release an updated Chinese-language model soon?
2. Has anyone trained a distilled Chinese model themselves? How well does the distilled model perform on downstream tasks?
-
Hi, regarding Eq. (6) in the paper: each layer's loss is multiplied by its layer-specific hyperparameter λ_m before the losses are summed. My understanding is that λ_m is an initialized value between 0 and 1, with all the λ values summing to 1.
1. Is this actually how it was implemented in the experiments? I did not see any λ-like variable in task_distill.py; it looks like the per-layer losses are simply summed.
2. If λ_m was indeed used in the implementation, how was it initialized?
3. With different initialization seeds, does conver…
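For reference, the weighting described in Eq. (6) can be sketched as follows. This is only an illustration of the question being asked, not the repo's actual code; the uniform λ initialization and the loss values are assumptions.

```python
def weighted_layer_loss(layer_losses, lambdas):
    """Combine per-layer distillation losses with layer-specific weights λ_m."""
    assert len(layer_losses) == len(lambdas)
    return sum(l * w for l, w in zip(layer_losses, lambdas))

# Assumed uniform initialization: all λ_m equal, summing to 1.
M = 4
lambdas = [1.0 / M] * M
layer_losses = [0.8, 0.6, 0.4, 0.2]  # illustrative per-layer losses
total = weighted_layer_loss(layer_losses, lambdas)  # ≈ 0.5
```

With all λ_m equal this reduces to a scaled plain sum, which may be why no explicit λ variable appears in task_distill.py.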
-
After BERT is fine-tuned, it serves as the teacher that the student model is fitted to. Where does the student model come from?
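For context, in standard knowledge distillation the student is simply a smaller model you construct yourself (freshly initialized, or initialized from a subset of the teacher's layers) and train against the teacher's soft outputs. A minimal sketch of that loss, under general KD assumptions rather than this repo's exact code:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(teacher_logits, student_logits, T=2.0):
    """Cross-entropy between the teacher's and student's soft distributions."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# The student's logits come from the smaller network being trained;
# minimizing this loss pulls its distribution toward the teacher's.
loss = kd_loss([3.0, 1.0, 0.2], [2.5, 1.2, 0.3])
```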
-
# 🌟 New model addition
## Model description
TinyBERT is a smaller version of the base BERT model; it uses transformer distillation (a type of knowledge distillation) to transfer the wealth of know…
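Transformer distillation, as generally described, matches intermediate representations as well as output logits. A minimal hidden-state-matching sketch (the toy dimensions and the learnable projection are assumptions, not TinyBERT's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dimensions: teacher hidden size 8, student hidden size 4.
seq_len, d_t, d_s = 3, 8, 4
teacher_h = rng.standard_normal((seq_len, d_t))
student_h = rng.standard_normal((seq_len, d_s))
W = rng.standard_normal((d_s, d_s * 2))  # learnable projection: student -> teacher space

def hidden_mse(student_h, teacher_h, W):
    """MSE between projected student hidden states and teacher hidden states."""
    return float(np.mean((student_h @ W - teacher_h) ** 2))

loss = hidden_mse(student_h, teacher_h, W)
```

Because the student's hidden size is smaller than the teacher's, a projection matrix is needed so the two representations can be compared elementwise.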
-
Hello, I have the AI Serving server up and running:
- pulled autodeployai/ai-serving:0.9.0-cuda image,
- started server: `docker run --rm -it -v $(pwd):/opt/ai-serving -p 9090:9090 -p 9091:9091 IMAGE_ID`…
-
Hi,
first of all thanks for your interesting work!
I'm trying to replicate the distillation concepts presented in the paper on a vanilla Transformer architecture, so my question is not strictly re…
-
I have set up nboost and the CUDA toolkit on a GPU machine, but nboost still runs on the CPU. Please let me know what extra configuration I need so that nboost will use the GPU.