-
Hi,
thank you for releasing the code for nonautoregressive_transformer.
But why can't I find the positional attention in the decoder, as described in the paper [Non-Autoregressive Neural Machine Tr…
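For context, my understanding of the positional attention module from that paper is roughly the sketch below: the positional encodings act as both query and key, and the decoder states as the value. This is only an illustration with made-up names (`pos_attn`, `d_model`, `n_heads`), not code from this repo:

```python
import torch.nn as nn

d_model, n_heads = 512, 8
# One multi-head attention block per decoder layer, used only for positional attention
pos_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

def positional_attention(decoder_states, pos_encoding):
    # decoder_states: (batch, tgt_len, d_model) -- hidden states of the decoder layer
    # pos_encoding:   (batch, tgt_len, d_model) -- sinusoidal positional embeddings
    # Query and key are the positional encodings; value is the decoder states.
    out, _ = pos_attn(query=pos_encoding, key=pos_encoding, value=decoder_states)
    return out
```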
-
DeepSpeed implements a transformer kernel that invokes the CUDA kernel only once for the Q, K, and V values, as opposed to three times (one invocation each for Q, K, and V), resulting in a 3% to …
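The high-level idea, shown here as a minimal PyTorch sketch of fused QKV rather than DeepSpeed's actual CUDA kernel, is to compute Q, K, and V from a single combined weight matrix so the projection happens in one launch instead of three:

```python
import torch
import torch.nn as nn

class FusedQKV(nn.Module):
    """Toy illustration of the fused-QKV idea: one weight matrix, one matmul."""
    def __init__(self, d_model):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)   # single fused projection

    def forward(self, x):                            # x: (batch, seq, d_model)
        q, k, v = self.qkv(x).chunk(3, dim=-1)       # split the fused output
        return q, k, v
```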
-
Sorry for the noob question: what is the best (in terms of quality) pretrained English TTS available today? Is it the following combination, or is there something better?
1. Tacotron2 | char_train_n…
-
Hi Giannis!
Thanks for the great paper! I am interested in your asymmetric LSH, as I think having separate query/key spaces (as opposed to the shared QK space in Reformer) will bring performance improvem…
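To make sure I understand the asymmetric part, here is a rough sketch of the general asymmetric-LSH idea (the classic MIPS-to-NNS augmentation in the style of Shrivastava & Li), not necessarily the exact transform from your paper: queries and keys receive different augmentations so that a standard symmetric LSH on the augmented vectors still respects the inner product:

```python
import torch

def augment_keys(k, scale):
    # k: (n, d); pad each key so all augmented keys share the norm `scale`
    extra = (scale ** 2 - (k ** 2).sum(-1, keepdim=True)).clamp(min=0).sqrt()
    return torch.cat([k, extra], dim=-1)

def augment_queries(q):
    # queries get a zero in the extra coordinate, so q_aug . k_aug == q . k
    return torch.cat([q, torch.zeros_like(q[..., :1])], dim=-1)

q, k = torch.randn(128, 64), torch.randn(128, 64)
scale = k.norm(dim=-1).max()
q_aug, k_aug = augment_queries(q), augment_keys(k, scale)
# any standard (symmetric) LSH scheme can now be applied to q_aug / k_aug
```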
-
## Environment info
- `transformers` version: 4.3.3
- The other parameters are irrelevant
### Who can help
@patrickvonplaten @sgugger
## Information
I apologize for not using the prov…
-
**System information**
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 7
- TensorFlow installed from (source or binary): binary
- TensorFlow version (or github SHA if from source): 2…
-
I want to implement some changes to the self-attention used in the Transformer for MT, namely locality-sensitive hashing (https://arxiv.org/pdf/2001.04451.pdf).
Right now, self-attention …
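One way to prototype this, as a rough sketch of Reformer-style angular LSH applied as a mask over ordinary attention (my own naming, and without the chunking/sorting that makes it sub-quadratic), could look like:

```python
import torch
import torch.nn.functional as F

def lsh_hash(x, n_buckets, seed=0):
    # Angular LSH as in the Reformer paper: random rotations, then argmax over
    # the concatenation [xR, -xR] assigns each position to one of n_buckets.
    g = torch.Generator().manual_seed(seed)
    r = torch.randn(x.shape[-1], n_buckets // 2, generator=g).to(x.device)
    xr = x @ r
    return torch.cat([xr, -xr], dim=-1).argmax(dim=-1)       # (batch, seq)

def bucketed_attention(q, k, v, n_buckets=16):
    # Plain O(n^2) attention, masked so each query only attends to keys in the
    # same LSH bucket (the diagonal stays unmasked so every row has a target).
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    qb, kb = lsh_hash(q, n_buckets), lsh_hash(k, n_buckets)
    mask = qb.unsqueeze(-1) != kb.unsqueeze(-2)               # (batch, seq, seq)
    eye = torch.eye(q.shape[-2], dtype=torch.bool, device=q.device)
    scores = scores.masked_fill(mask & ~eye, float('-inf'))
    return F.softmax(scores, dim=-1) @ v
```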
-
Hi, thanks for making such a repo.
I have one question here:
Why do you mark "HOT-Net: Non-Autoregressive Transformer for 3D Hand-Object Pose Estimation" as an MM20 paper? I could not find the citatio…
-
As the 2010s draw to a close, it's worth taking a look back at the monumental progress that has been made in Deep Learning in this decade.
Tags: deep_learning
via Pocket https://ift.tt/…
-
Hi, Jungo. Thanks for your nice code!
I want to use your DisCo model to train an autoregressive model as described in your paper (Sec. 5.1: AT with Contextless KVs). I saw there is an argument called at-…