-
I'm trying to reproduce the TernaryBERT RTE and CoLA performance (Table 6) using this source code.
However, I've noticed that this TernaryBERT repo does not offer any task-specific fine-tuned model files for…
-
-
**Describe the bug**
I am trying to convert a TinyBERT model to ONNX. With the latest tf2onnx, it was able to decompose the einsum operators with the equations `aecd,abcd->acbe` and `acbe,aecd->abcd` t…
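
For reference, here is a minimal NumPy sketch of what those two einsum equations compute when written as plain transpose and matmul operations. It is not tf2onnx's actual rewrite. The function names are hypothetical, and reading the labels as `a` = batch, `b`/`e` = sequence positions, `c` = heads, `d` = head size is an assumption based on the usual attention layout.

```python
import numpy as np


def attention_scores(key, query):
    """Same result as np.einsum('aecd,abcd->acbe', key, query)."""
    q = query.transpose(0, 2, 1, 3)  # (a, b, c, d) -> (a, c, b, d)
    k = key.transpose(0, 2, 3, 1)    # (a, e, c, d) -> (a, c, d, e)
    return q @ k                     # contracts d, giving (a, c, b, e)


def attention_context(probs, value):
    """Same result as np.einsum('acbe,aecd->abcd', probs, value)."""
    v = value.transpose(0, 2, 1, 3)           # (a, e, c, d) -> (a, c, e, d)
    return (probs @ v).transpose(0, 2, 1, 3)  # (a, c, b, d) -> (a, b, c, d)


# Small random tensors, only to compare against np.einsum.
rng = np.random.default_rng(0)
a, b, c, d, e = 2, 5, 3, 4, 6
key = rng.standard_normal((a, e, c, d))
query = rng.standard_normal((a, b, c, d))
value = rng.standard_normal((a, e, c, d))

scores = attention_scores(key, query)
context = attention_context(scores, value)
```

Comparing `scores` and `context` against `np.einsum` with the same equations is a quick way to check that a decomposed graph still matches the original operator.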
-
No such file or directory: 'data/wikitables_v2/entity_embedding_tinybert_312.pkl'
-
https://github.com/PaddlePaddle/PaddleNLP/blob/develop/examples/model_compression/tinybert/imgs/tinybert.png is missing; please restore it.
-
Hi, thanks for this great source code. It really helps me a lot!
While studying TernaryBERT with the paper and this source code, I have a question about the KD training loss.
In Algorithm 1 of the paper, …
-
## Environment info
- `transformers` version: 3.0.2
- Platform: Darwin-20.3.0-x86_64-i386-64bit
- Python version: 3.6.10
- PyTorch version (GPU?): 1.7.1 (False)
- Tensorflow version (GPU?): 2…
-
Could you share the Chinese TinyBERT model obtained after general distillation? Thank you very much!
-
Are you considering releasing a Chinese General_TinyBERT model?
-
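Suppose the hidden sizes of the teacher and the student are d and d', respectively.
When d is not equal to d', the student model's fit_dense layer maps d' to the same dimension as d, so that the hidden_state loss can be computed between the student and the teacher.
But when d and d' are equal, the hidden_state loss could be computed directly without the fit_dense mapping, right? However, the code uses
`if is_studen…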
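
To make the question concrete, here is a minimal NumPy sketch of a hidden_state loss with an optional projection. It is not the repository's code. `hidden_state_loss`, `weight`, and `bias` are hypothetical names, with `weight` standing in for a fit_dense-style linear map from d' to d, and the tensor sizes are arbitrary.

```python
import numpy as np


def hidden_state_loss(student_hidden, teacher_hidden, weight=None, bias=None):
    """Mean squared error between student and teacher hidden states.

    If `weight` (shape (d', d)) is given, the student states are first
    projected to the teacher's hidden size, like a fit_dense-style layer.
    """
    if weight is not None:
        student_hidden = student_hidden @ weight
        if bias is not None:
            student_hidden = student_hidden + bias
    if student_hidden.shape != teacher_hidden.shape:
        raise ValueError("hidden sizes differ; a projection is required")
    return float(np.mean((student_hidden - teacher_hidden) ** 2))


rng = np.random.default_rng(0)
batch, seq_len, d_student, d_teacher = 2, 4, 3, 5  # arbitrary small sizes

student = rng.standard_normal((batch, seq_len, d_student))
teacher = rng.standard_normal((batch, seq_len, d_teacher))
weight = rng.standard_normal((d_student, d_teacher))

# d' != d: the projection is needed before the loss can be computed.
projected_loss = hidden_state_loss(student, teacher, weight)

# d' == d: the loss can be computed directly, with no projection.
same_size_teacher = rng.standard_normal((batch, seq_len, d_student))
direct_loss = hidden_state_loss(student, same_size_teacher)
```

When d equals d', both paths are well defined in this sketch. The direct path compares the raw hidden states, while the projected path adds a d×d linear map before the comparison, so the two are only identical if that map is the identity.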