-
Hello!
I run into some errors after modifying the training model.
For example, with the XLNet model: AttributeError: 'XLNetModel' object has no attribute 'output_hidden_states'
And with the TinyBERT model: Val…
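
A likely cause of the first error: in recent versions of Hugging Face Transformers, `output_hidden_states` is a config or forward argument, not an attribute on the model object. A minimal sketch of the stock `transformers` usage, assuming `xlnet-base-cased` (not this repo's training code):

```python
from transformers import XLNetModel, XLNetTokenizer

# output_hidden_states is set on the config (or passed to forward()),
# not read as an attribute of the model object itself.
tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
model = XLNetModel.from_pretrained("xlnet-base-cased", output_hidden_states=True)

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model(**inputs)

# One tensor per layer, plus the embedding output.
print(len(outputs.hidden_states), outputs.hidden_states[-1].shape)
```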
-
Thanks for the awesome work on [AutoTinyBERT](https://aclanthology.org/2021.acl-long.400.pdf)!
We would like to use your final model checkpoints. However, the links provided in the [AutoTinyBERT …
-
Nice work! I have two questions: 1) Why are only GLUE dev set results reported? 2) Some strong baselines, such as NAS-BERT and BERT-EMD, are not compared.
-
**Is your feature request related to a problem? Please describe.**
With the new flexible Pipelines introduced in https://github.com/deepset-ai/haystack/pull/596, we can build way more flexible and c…
-
## 🚀 Feature Request
We should be able to retrieve the attention weights from any layer of the Transformer, not only the last one.
### Motivation
Currently the Transformer Decoder can return the wei…
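
As a workaround until this is supported, the idea can be expressed in plain PyTorch by driving `nn.MultiheadAttention` directly, so `need_weights=True` is kept at every layer. A minimal sketch with a hypothetical `AttnStack` wrapper (this repo's decoder internals may differ):

```python
import torch
import torch.nn as nn

class AttnStack(nn.Module):
    """Run each attention layer ourselves so every layer's weights survive."""

    def __init__(self, d_model=64, nhead=4, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.MultiheadAttention(d_model, nhead, batch_first=True)
            for _ in range(num_layers)
        )

    def forward(self, x):
        all_weights = []
        for attn in self.layers:
            # Self-attention; weights have shape (batch, tgt_len, src_len).
            x, weights = attn(x, x, x, need_weights=True)
            all_weights.append(weights)
        return x, all_weights

x = torch.randn(2, 10, 64)
out, weights = AttnStack()(x)
print(len(weights), weights[0].shape)  # 3 layers; torch.Size([2, 10, 10])
```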
-
Distilling RoBERTa using the approach described in the TinyBERT paper. The results of #2019 suggest that it makes more sense to proceed with RoBERTa as the base model. The Pile dataset can be used for t…
-
As a next step toward distilling better language models, we want to explore the difference between distilling from a base model and from a large model.
For this, we would need to decide on a dataset:
- E…
-
**Additional context**
Seeing as updating the embeddings in dense models is computationally expensive and time-consuming, I was thinking about the feasibility of this approach. If I index new documents I don't need …
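
For what it's worth, Haystack 1.x document stores expose a flag for exactly this pattern; import paths and parameter names have shifted across releases, so treat this as a sketch rather than version-exact code:

```python
from haystack.document_stores import FAISSDocumentStore
from haystack.nodes import EmbeddingRetriever

document_store = FAISSDocumentStore(embedding_dim=384)
retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",
)

# Index only the new documents, then embed just the ones without an
# embedding yet, leaving the existing vectors untouched.
document_store.write_documents([{"content": "New document text."}])
document_store.update_embeddings(retriever, update_existing_embeddings=False)
```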
-
Hi,
thanks for providing this training code and the pretrained model. But how do you load the model in PyTorch? In your test.py you only run tests on TinyBERT, RoBERTa, etc., but don't load EfficientBer…
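
In the meantime, the generic PyTorch pattern for loading a checkpoint looks like the sketch below; the class name, module path, and checkpoint filename here are hypothetical, since the repo's actual names are not given:

```python
import torch

# Hypothetical import: the actual module and class depend on this repo.
from model import EfficientBert

model = EfficientBert()  # construct with the same config used for training
state_dict = torch.load("efficientbert_checkpoint.pt", map_location="cpu")

# Checkpoints sometimes wrap the weights, e.g. {"model": state_dict, ...};
# unwrap before loading if that is the case here.
if "model" in state_dict:
    state_dict = state_dict["model"]

model.load_state_dict(state_dict)
model.eval()
```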