-
[paper](https://arxiv.org/pdf/2104.09864.pdf), [code](https://github.com/huggingface/transformers/blob/v4.28.1/src/transformers/models/roformer/modeling_roformer.py#L318-L343)
## TL;DR
- **I r…
-
## 🐛 Bug
I am trying to run the Linformer model with DistributedDataParallel from [this repo](https://github.com/tatp22/linformer-pytorch).
## To Reproduce
Run this script:
```python
import os
im…
```
-
Do you think any of those implementations would be compatible with Transformer-XL? Thanks!
-
Hello,
in my master's thesis, I was aiming to use BERT for topic modeling / document clustering. As a dataset, I'm using a large corpus of over 100k news articles from a German newspaper (short headline…
-
https://arxiv.org/abs/2009.06732
-
## 🐛 Bug
Regarding #29889: I ran into the exact same problem even though I used Python 3.6. I could train for some epochs, and then this error suddenly appeared.
The error occurred when I used DDP to perform…
-
### **Initial action plans**
Copying these items from the wav2vec2 repo for safekeeping.
* An immediate quantization step could be to convert the fine-tuned model using TFLite APIs. [Post-trainin…
-
Found NaN when training faster_rcnn_r101_fpn_1x_coco.py
Package Version Editable project location
---------------------------- -------------------- ------------…
-
- https://arxiv.org/abs/2108.09084
- 2021
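Transformer is a powerful model for text understanding.
However, it is inefficient because its complexity is quadratic in the length of the input sequence.
Many methods have been proposed to accelerate Transformer, but they are still inefficient on long sequences or not sufficiently effective.
In this paper…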
e4exp updated
2 years ago