-
Hi,
I found that in MultiHeadedAttention, thop only counts the FLOPs of the linear layers and misses the attention operation itself.
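To make the gap concrete, here is a rough sketch of the cost that goes uncounted. It only covers the two matrix multiplications inside scaled dot-product attention (Q·Kᵀ and the attention weights times V) and ignores softmax, scaling, and masking. The function name and the example shapes are my own assumptions for illustration; this does not use thop's API.

```python
def attention_matmul_macs(batch: int, heads: int, seq_len: int, head_dim: int) -> int:
    """Multiply-accumulates for the two matmuls inside scaled dot-product attention.

    Hypothetical helper for illustration only; not part of thop.
    """
    # scores = Q @ K^T: (seq_len x head_dim) @ (head_dim x seq_len), per head
    qk = batch * heads * seq_len * seq_len * head_dim
    # context = softmax(scores) @ V: (seq_len x seq_len) @ (seq_len x head_dim), per head
    av = batch * heads * seq_len * seq_len * head_dim
    return qk + av


# Assumed example shapes: 12 heads of size 64, one sequence of 128 tokens.
macs = attention_matmul_macs(batch=1, heads=12, seq_len=128, head_dim=64)
print(macs)
```

If multiplications and additions are counted as separate operations, double the result. Note that this term grows quadratically with sequence length, while the linear layers grow linearly.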
-
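Hello, I am reproducing your experiment without any modifications. During backbone training the accuracy improves gradually, but in the distillation stage the validation and test accuracy at every epoch is identical to that of the backbone's last epoch. Did I do something wrong?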
-
How can I use the per-class confusion matrix and the other metrics mentioned in this link: https://github.com/kaushaltrivedi/fast-bert/issues/17 ?
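For reference, this is a minimal sketch of the kind of per-class output I mean, assuming the true and predicted labels are already available as plain Python lists. It is library-independent: `per_class_metrics` is a hypothetical helper of my own, not a fast-bert function, and the labels are made-up sample data.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """One-vs-rest confusion counts plus precision and recall for each class.

    Hypothetical helper for illustration; not part of fast-bert.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    report = {}
    for c in labels:
        tp = pairs[(c, c)]
        fp = sum(pairs[(t, c)] for t in labels if t != c)  # predicted c, actually other
        fn = sum(pairs[(c, p)] for p in labels if p != c)  # actually c, predicted other
        tn = len(y_true) - tp - fp - fn
        report[c] = {
            "tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        }
    return report


# Made-up labels, purely to show the shape of the result.
y_true = ["pos", "neg", "pos", "neg", "pos"]
y_pred = ["pos", "pos", "pos", "neg", "neg"]
report = per_class_metrics(y_true, y_pred)
for label, stats in report.items():
    print(label, stats)
```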
-
I tried following this tutorial: https://medium.com/@kaushaltrivedi/train-and-deploy-mighty-transformer-nlp-models-using-fastbert-and-aws-sagemaker-cc4303c51cf3
The training fails; please see belo…
-
When I train a fastbert model and save it using save_and_reload(), the model's output is not consistent with its output before saving.
Code to reproduce:
```
from fast_bert import BertClas…
```
-
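Hello, I ran into two problems while reproducing the paper's results and would like to ask about them.
1. When I train the sub-classifiers, the results are worse than when training directly on the true labels;
2. At final inference, I get an 11x speedup on CPU but only 2x on GPU.
Below are the details of my reproduction; not all of them are relevant to the questions above:
- I used a Chinese binary classification dataset, with 400k examples for training and 30k for testing; all results below were measured on the test set;
- The teacher classifier and s…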
-
The existing tokenizer implementation supports only GPT models. The [Microsoft.ML.Tokenizers](https://www.nuget.org/packages/Microsoft.ML.Tokenizers/0.21.0-preview.22621.2) package provides a …
-
Hello!
Could you please share some details about the T5 paraphraser baseline? Namely, which model was used -- the original or the one fine-tuned on the subset of ParaNMT? And what parameters for ge…
-
Hi, I want to know how your compiler deals with control flow. For example, FastBERT needs to exit dynamically, and for some pipeline relation extraction models the number of token pairs is chang…