-
A current disadvantage of using large language models for NER is that they cannot match the performance of a fine-tuned BERT. Is there any way to address this, for example through prompting? If the lar…
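One possible direction is few-shot prompting. Below is a minimal sketch, assuming an OpenAI-compatible chat client; the model name, label set, and few-shot example are placeholders, not a recommendation from this thread:

```python
# Few-shot prompting sketch for NER with an LLM.
# Assumes an OpenAI-compatible client; model name, label set,
# and the in-context example are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

FEW_SHOT_PROMPT = """Extract named entities as JSON with keys PER, ORG, LOC.

Text: Tim Cook visited Berlin to meet Siemens executives.
Entities: {"PER": ["Tim Cook"], "ORG": ["Siemens"], "LOC": ["Berlin"]}

Text: <TEXT>
Entities:"""

def llm_ner(text: str) -> str:
    # .replace() instead of .format() so the JSON braces need no escaping
    prompt = FEW_SHOT_PROMPT.replace("<TEXT>", text)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    return response.choices[0].message.content

print(llm_ner("Angela Merkel spoke in Paris."))
```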
-
I have seen the new BERT-related changes and I'm trying to use this model. Will the documentation be updated with the BERT parameters and an example of pre-training or fine-tuning?
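Until documentation lands, a minimal fine-tuning sketch using the Hugging Face `transformers` Trainer may help; the dataset (`glue/sst2`) and hyperparameters here are illustrative assumptions, not this project's official recipe:

```python
# Minimal sketch: fine-tune BERT for sequence classification.
# Dataset and hyperparameters are illustrative, not official values.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

dataset = load_dataset("glue", "sst2")
encoded = dataset.map(lambda ex: tokenizer(ex["sentence"], truncation=True),
                      batched=True)

args = TrainingArguments(
    output_dir="bert-finetuned",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```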
-
https://arxiv.org/abs/2006.04884
# Background
- The behavior of fine-tuning BERT, RoBERTa, and ALBERT is not fully understood.
- Varying only the random seed leads to a large standard deviation of the fine-tuning acc…
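A simple way to quantify this instability is to repeat fine-tuning while varying only the seed and report the spread. A minimal sketch, assuming a hypothetical `fine_tune(seed)` routine that runs one full training and returns dev-set accuracy:

```python
# Sketch: measure fine-tuning instability by varying only the random seed.
# fine_tune() is a hypothetical placeholder for one full training run.
import random
import statistics

import numpy as np
import torch

def set_seed(seed: int) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

def fine_tune(seed: int) -> float:
    """Placeholder: run one fine-tuning run and return dev-set accuracy."""
    raise NotImplementedError("plug in your training loop here")

accuracies = []
for seed in range(10):  # same data and hyperparameters, different seeds
    set_seed(seed)
    accuracies.append(fine_tune(seed))

print(f"mean={statistics.mean(accuracies):.4f}  "
      f"stdev={statistics.stdev(accuracies):.4f}")
```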
-
Thank you very much for your work; it is very clear.
I'd like to ask a question: have you compared the performance of the following two settings?
1. Load BERT weights, fine-tune on domain-specific data, and perform aspect classification
2. Load BERT weights, fine-tune on domain-specific data with an added contrastive learning loss (see the sketch below), and perform aspect classification
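For concreteness, setting 2 could look like the sketch below: a SupCon-style contrastive term on the [CLS] embeddings added to the usual cross-entropy. The temperature and the 0.1 weight are illustrative assumptions, not values from any paper discussed here:

```python
# Sketch of setting 2: cross-entropy plus a supervised contrastive term
# on the [CLS] embeddings. Temperature and weighting are illustrative.
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.07):
    """SupCon-style loss: pull same-label [CLS] embeddings together."""
    features = F.normalize(features, dim=1)
    sim = features @ features.T / temperature            # pairwise similarities
    mask = labels.unsqueeze(0) == labels.unsqueeze(1)    # positives share a label
    mask.fill_diagonal_(False)                           # a sample is not its own positive
    # log-softmax over all other samples, excluding self-similarity
    not_self = ~torch.eye(len(labels), dtype=torch.bool, device=features.device)
    exp_sim = torch.exp(sim) * not_self
    log_prob = sim - torch.log(exp_sim.sum(dim=1, keepdim=True))
    mean_log_prob_pos = (log_prob * mask).sum(1) / mask.sum(1).clamp(min=1)
    return -mean_log_prob_pos.mean()

# Total loss during fine-tuning (the 0.1 weight is a tunable assumption):
# loss = F.cross_entropy(logits, labels) \
#        + 0.1 * supervised_contrastive_loss(cls_emb, labels)
```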
-
Given the discussion about which layer to keep as a token's representation in downstream analysis (Jawahar et al., 2019; Ethayarajh, 2019), for example when using a pre-trained BERT model, I was …
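For reference, here is a minimal sketch of inspecting each layer's token representations with Hugging Face `transformers`, so the common choices (last layer, second-to-last, mean of the last four) can be compared; the model name and input sentence are placeholders:

```python
# Sketch: expose every layer's token representations so you can choose
# which one (or which combination) to keep for downstream analysis.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased",
                                  output_hidden_states=True)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states: tuple of 13 tensors (embeddings + 12 layers),
# each of shape (batch, seq_len, hidden_size)
hidden_states = outputs.hidden_states
last_layer = hidden_states[-1]
second_to_last = hidden_states[-2]                            # a common choice
mean_last_four = torch.stack(hidden_states[-4:]).mean(dim=0)  # another common choice
```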
-
WIP project roadmap for LoRAX. We'll continue to update this over time.
# v0.10
- [ ] Speculative decoding adapters
- [ ] AQLM
# v0.11
- [ ] Prefix caching
- [ ] BERT support
- [ ] Embe…
-
Currently we are using the pre-trained [Universal Sentence Encoder (large)](https://tfhub.dev/google/universal-sentence-encoder-large/5) from TensorFlow Hub.
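For reference, a minimal sketch of embedding sentences with this module via `tensorflow_hub`; the example sentences are placeholders:

```python
# Sketch: embed sentences with the Universal Sentence Encoder (large)
# from TensorFlow Hub, as linked above.
import tensorflow_hub as hub

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder-large/5")
embeddings = embed(["The quick brown fox jumps over the lazy dog.",
                    "I am a sentence for which I would like an embedding."])
print(embeddings.shape)  # (2, 512): one 512-dim vector per sentence
```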
**Open area for investigation:**
The …
-
Data Augmentation for BERT Fine-Tuning in Open-Domain Question Answering
Wei Yang, Yuqing Xie, Luchen Tan, Kun Xiong, Ming Li, Jimmy Lin
https://arxiv.org/abs/1904.06652
-
Recently you updated "BERT pretrained on mixed large Chinese corpus (bert-large 24-layers)" in the README. What hyperparameters (learning rate, batch size, max epochs) did you use when fine-tuning on CLUE?
-
Hi, authors of Bert-FP, the SOTA in Response Selection tasks. I'm excited to see that the post-training strategy works so well with the sub-context-response pairs. Recently, I have been trying to reimplement this work…