-
## Environment info
- `transformers` version: 4.18.0.dev0
- Platform: Windows-10-10.0.22000-SP0
- Python version: 3.7.11
- PyTorch version (GPU?): 1.11.0 (True)
- Tensorflow version (GPU?): …
-
The k bias is always zero in the code. Is there a reason for this? It differs from the usual implementation.
https://github.com/microsoft/unilm/blob/421cffe163e474189c410ee06dad48dbfdcbc135/beit/m…
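
One property that may be relevant here: in softmax attention, a bias added to every key shifts all logits of a given query by the same constant, and softmax ignores such a shift. The sketch below checks this numerically in plain Python. It is not taken from the linked repository; the helper names (`softmax`, `dot`, `attention_weights`) and the sample vectors are made up for illustration, and whether this is the authors' actual motivation is an assumption.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_weights(query, keys, key_bias=None):
    # Hypothetical helper: attention weights of one query over all keys,
    # with an optional bias vector added to every key.
    if key_bias is None:
        key_bias = [0.0] * len(query)
    shifted = [[k + b for k, b in zip(key, key_bias)] for key in keys]
    return softmax([dot(query, key) for key in shifted])


# Illustrative values only.
query = [0.5, -1.0, 2.0]
keys = [[1.0, 0.0, -1.0], [0.3, 0.7, 0.2], [-2.0, 1.5, 0.4]]
key_bias = [0.9, -0.4, 1.3]

without_bias = attention_weights(query, keys)
with_bias = attention_weights(query, keys, key_bias)

# Every logit moves by dot(query, key_bias), so the weights are unchanged.
max_diff = max(abs(a - b) for a, b in zip(without_bias, with_bias))
print(max_diff)
```

The same argument does not apply to the query or value bias, which do change the output.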
-
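The original paper used a 128-core TPUv3, which is far beyond the compute I have access to. What compute did you use? I would like to assess whether the hardware I have is enough for pre-training.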
-
## Environment info
- `transformers` version: 4.12.5
- Platform: macOS-10.16-x86_64-i386-64bit
- Python version: 3.8.5
- PyTorch version (GPU?): 1.10.0 (False)
- Tensorflow version (GPU?): no…
-
Thanks for your great work!
I am confused: I don't think the authors mention EMA in their paper, but I found it in your implementation. Could you explain why you use it? And is it OK for me not …
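
For readers unfamiliar with the term, and assuming EMA here means an exponential moving average of the model weights, a minimal sketch of the update rule follows. The function name `ema_update`, the `decay` value, and the toy parameters are hypothetical and not taken from the implementation being discussed.

```python
def ema_update(ema_params, model_params, decay=0.999):
    # Blend each averaged parameter toward the current model parameter.
    # A decay close to 1.0 makes the average change slowly.
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, model_params)]


# Toy example: one parameter, averaged copy starts at 0.0, model stays at 1.0.
ema = [0.0]
model = [1.0]
ema_after_one = ema_update(ema, model, decay=0.9)
ema_after_two = ema_update(ema_after_one, model, decay=0.9)
print(ema_after_one, ema_after_two)
```

The averaged copy is typically used only for evaluation, so training without it leaves the optimization itself unchanged.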
-
## Environment info
- `transformers` version: 4.15.0
- Platform: Linux-4.15.0-20-generic-x86_64-with-debian-buster-sid
- Python version: 3.7.11
- PyTorch version (GPU?): 1.10.0+cu113 (True)
-…
-
Hi,
Thanks for providing the code. I found that you also use a cosine annealing curve to decay the weight decay. The paper does not seem to mention this. Could you please tell me why you u…
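
To make the question concrete, here is a minimal sketch of a cosine-annealed schedule for a scalar such as weight decay. The function name `cosine_schedule` and the start and end values are assumptions for illustration; the values used in the repository are not shown in this excerpt.

```python
import math


def cosine_schedule(start_value, end_value, total_steps):
    # Return one value per step, moving from start_value to end_value
    # along half a cosine period.
    if total_steps < 1:
        raise ValueError("total_steps must be at least 1")
    if total_steps == 1:
        return [start_value]
    return [
        end_value
        + 0.5 * (start_value - end_value)
        * (1.0 + math.cos(math.pi * step / (total_steps - 1)))
        for step in range(total_steps)
    ]


# Hypothetical values: weight decay annealed from 0.05 to 0.0 over 5 steps.
weight_decay_schedule = cosine_schedule(0.05, 0.0, 5)
print(weight_decay_schedule)
```

With a PyTorch optimizer, one way to apply such a schedule is to write the current value into `param_group["weight_decay"]` for each entry of `optimizer.param_groups` before every step.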
-
Based on [SO post](https://stackoverflow.com/q/70697470/17840900).
Goal: Amend [Bert-GLUE_OnnxRuntime_quantization.ipynb][1] to work with **Albert** and **Distilbert** models
Kernel: `conda_pyto…
-
I know this might not be the proper place to discuss it, but I'm still curious about the intuition behind the `tokenizer`.
-
## Environment info
- `transformers` version:
- Platform: MacOS Intel
- Python version: 3.7.10
- PyTorch version (GPU?): '1.8.1' (CPU)
- Tensorflow version (GPU?): '1.15.0' (CPU)
- Using GPU…