-
Hi,
Thanks for providing this training code and the pretrained model. How do you load the model in PyTorch? In your test.py you only run tests on TinyBERT, RoBERTa, etc., but don't load EfficientBer…
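For reference, a minimal sketch of loading a BERT-style checkpoint in plain PyTorch via `transformers` (this is a generic pattern, not the repo's own loader; the checkpoint directory is a placeholder):

```python
# Minimal sketch (not the repo's own loader): load a BERT-style checkpoint
# directory containing config.json and pytorch_model.bin / model.safetensors.
# "path/to/checkpoint" is a placeholder, not an actual path from the repo.
import torch
from transformers import AutoConfig, AutoModel, AutoTokenizer

checkpoint_dir = "path/to/checkpoint"  # hypothetical local directory

config = AutoConfig.from_pretrained(checkpoint_dir)
tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
model = AutoModel.from_pretrained(checkpoint_dir, config=config)
model.eval()

with torch.no_grad():
    inputs = tokenizer("a quick smoke test", return_tensors="pt")
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```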
-
With #1873 and #1874 we implemented loss functions and data augmentation for knowledge distillation based on the TinyBERT paper. This issue is about training distilled models with that technique …
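For context, a minimal sketch of a TinyBERT-style distillation objective (soft logits plus intermediate-layer matching). This is illustrative only, not the code added in the referenced PRs; the layer mapping and temperature are assumptions, and it assumes student and teacher hidden sizes already match:

```python
# Sketch of a TinyBERT-style distillation loss: KL on softened logits plus MSE
# on mapped hidden states. Not the implementation from #1873 / #1874.
# layer_map[i] = index of the teacher layer that student layer i imitates.
# TinyBERT additionally uses a learned linear projection when hidden sizes differ.
import torch
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out, layer_map, temperature=1.0):
    # Soft-label loss on the prediction layer.
    s_logits, t_logits = student_out["logits"], teacher_out["logits"]
    soft_loss = F.kl_div(
        F.log_softmax(s_logits / temperature, dim=-1),
        F.softmax(t_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # Hidden-state matching between mapped student/teacher layers.
    hidden_loss = sum(
        F.mse_loss(student_out["hidden_states"][i], teacher_out["hidden_states"][j])
        for i, j in enumerate(layer_map)
    )
    return soft_loss + hidden_loss
```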
-
Excuse me, after distilling with the augmented training set, does the inference speed increase by roughly 8x as the paper reports? My test found that for a single line it is basically the same …
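One way to check this is a simple latency comparison between the teacher and the distilled student. A rough sketch (not the repo's benchmark; the models and inputs are whatever checkpoints are being compared):

```python
# Rough latency check: average forward-pass time for a fixed batch.
# Pass the same tokenized inputs to both teacher and student and compare.
import time
import torch

@torch.no_grad()
def avg_latency_ms(model, inputs, warmup=5, iters=20):
    model.eval()
    for _ in range(warmup):            # warm up caches / CUDA kernels
        model(**inputs)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        model(**inputs)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters * 1000
```

Speedups close to the paper's numbers usually require comparing at the same batch size, sequence length, and hardware; single-sentence CPU latency can look very different from batched GPU throughput.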
-
**Is your feature request related to a problem? Please describe.**
A basic version of model distillation was implemented with #1758. However, there is still room for improvement. The TinyBERT paper (…
-
Hi, is data augmentation used during the distillation process? Does data augmentation improve accuracy on classification tasks?
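For illustration, a minimal sketch of the masked-word-replacement flavour of augmentation TinyBERT uses. This is not the repo's data_augmentation.py; the checkpoint name, replacement probability, and top_k are assumptions, and multi-piece words (which TinyBERT swaps via GloVe neighbours) are left untouched here:

```python
# Sketch of MLM-based word replacement augmentation, in the spirit of TinyBERT.
import random
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def augment(sentence, p_replace=0.3, top_k=5):
    words = sentence.split()
    out = []
    for i, word in enumerate(words):
        if random.random() < p_replace:
            # Mask the word and let the MLM propose replacements.
            masked = " ".join(words[:i] + [fill_mask.tokenizer.mask_token] + words[i + 1:])
            candidates = [c["token_str"].strip() for c in fill_mask(masked, top_k=top_k)]
            out.append(random.choice(candidates))
        else:
            out.append(word)
    return " ".join(out)

print(augment("the movie was surprisingly good"))
```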
-
Hi, I am trying to test dynamic batching with PyTorch and I get the following error:
```
Traceback (most recent call last):
  File "trace_msmarco.py", line 52, in
    results = model_neuron(*infer…
```
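The error text is cut off, so this is only a guess at the usual cause: traced/compiled models often only accept the batch size they were traced with. A common workaround (a sketch, not a confirmed fix for this traceback) is to pad each incoming batch to that fixed size and slice the outputs back:

```python
# Sketch of padding a batch up to the trace-time batch size and slicing results.
# `model_neuron` and `fixed_bs` stand in for the traced model and its trace-time
# batch size; this is an assumption about the setup, not a confirmed fix.
import torch

def run_padded(model_neuron, input_ids, attention_mask, fixed_bs):
    n = input_ids.shape[0]
    if n < fixed_bs:
        pad = fixed_bs - n
        input_ids = torch.cat([input_ids, input_ids[:1].repeat(pad, 1)], dim=0)
        attention_mask = torch.cat([attention_mask, attention_mask[:1].repeat(pad, 1)], dim=0)
    outputs = model_neuron(input_ids, attention_mask)
    return outputs[0][:n] if isinstance(outputs, tuple) else outputs[:n]
```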
-
Hi Huawei team,
Sorry to disturb you; could you answer the following question?
Why does the TinyBERT training pipeline "general_distill.py" not use DDP to initialize the student model, inste…
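For reference, a minimal sketch of initializing a student model with DistributedDataParallel. This is generic PyTorch usage (one process per GPU, launched via torchrun), not the repo's actual pipeline:

```python
# Generic DistributedDataParallel setup for a student model; not general_distill.py.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_student(student_model):
    dist.init_process_group(backend="nccl")        # env:// rendezvous set up by torchrun
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    student_model = student_model.cuda(local_rank)
    return DDP(student_model, device_ids=[local_rank])
```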
-
How can I pass a custom model URL (say, one optimized for inference with ONNX / TensorRT)?
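The excerpt doesn't show which loader is meant, so as a generic illustration only, a minimal sketch of pointing onnxruntime at a model file fetched from a custom location (the path and input layout are placeholders):

```python
# Generic sketch: run an ONNX-exported model from a custom location with
# onnxruntime. The path and input shape/dtype are placeholders; models with
# several required inputs need all of them in the feed dictionary.
import numpy as np
import onnxruntime as ort

model_path = "/models/custom-model.onnx"   # hypothetical local path (e.g. downloaded from a custom URL)
session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])

input_name = session.get_inputs()[0].name
dummy = np.zeros((1, 128), dtype=np.int64)  # shape/dtype depend on the exported model
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)
```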
-
Hello,
https://github.com/huawei-noah/Pretrained-Language-Model/blob/54ca698e4f907f32a108de371a42b76f92e7686d/TinyBERT/data_augmentation.py#L147-L154
In line 154, the tokenized text is sliced to…
-
According to the code, only the teacher model is wrapped for multi-GPU computation:
https://github.com/huawei-noah/Pretrained-Language-Model/blob/a8a705e9c8c952e078b45d1091d3f0ed161483d8/TinyBERT/general_distil…
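For context, a minimal sketch of the multi-GPU wrapping pattern in question, i.e. wrapping only the teacher versus also wrapping the student with torch.nn.DataParallel. This is illustrative, not the repo's exact code:

```python
# Illustrative sketch of the DataParallel wrapping being discussed, not the
# exact code from general_distill.py. Wrapping only the teacher means the
# student's forward/backward still runs on a single device.
import torch

def wrap_models(teacher_model, student_model, n_gpu):
    if n_gpu > 1:
        teacher_model = torch.nn.DataParallel(teacher_model)
        # If the student is meant to be parallelized too, it needs the same treatment:
        student_model = torch.nn.DataParallel(student_model)
    return teacher_model, student_model
```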