-
I tried different values (`--attn-layers [1,2,3]`) for the attention mechanism, but the results are either the same or worse. Has anyone found a way to improve FID/IS scores using attention?
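In case it helps to compare notes, below is a minimal sketch of the kind of self-attention block that SAGAN-style image generators insert at a chosen feature-map resolution. This is only my assumption about what an attention layer means here, not what `--attn-layers` actually inserts in this repo:

```
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    """SAGAN-style self-attention over a feature map (illustrative sketch only)."""

    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # residual gate, starts at 0

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)  # [B, HW, C//8]
        k = self.key(x).flatten(2)                    # [B, C//8, HW]
        v = self.value(x).flatten(2)                  # [B, C, HW]
        attn = torch.softmax(q @ k, dim=-1)           # [B, HW, HW], softmax over keys
        out = (v @ attn.transpose(1, 2)).reshape(b, c, h, w)
        return x + self.gamma * out                   # residual connection
```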
-
[arxiv](https://arxiv.org/abs/1705.08091)
As suggested in [keithito/tacotron/issues/#72](https://github.com/keithito/tacotron/issues/72),
when using multi-speaker audio data to train Tacotron,
a problem arises where the Enc/Dec alignment does not work well…
-
Hi @LinB203, just wanted to bring [VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis](https://arxiv.org/pdf/2403.13501.pdf) to your attention, where the temporal attention mechanism…
-
```
ass_mask = torch.ones(q_size2 * q_size1, 1, 1, q_size0).cuda()  # [31*128, 1, 1, 11]
x, self.attn_asset = attention(ass_query, ass_key, ass_value, mask=None,
…
```
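The `attention` helper isn't shown in the snippet, but it looks like the usual annotated-Transformer-style function. Below is a minimal sketch under that assumption, mainly to show how a broadcastable mask like `ass_mask` (`[31*128, 1, 1, 11]`) would be applied to the attention scores:

```
import math
import torch

def attention(query, key, value, mask=None, dropout=None):
    """Scaled dot-product attention (sketch of a typical helper, not this repo's code).

    query/key/value: [batch, heads, q_len, d_k] / [batch, heads, k_len, d_k];
    mask broadcasts against the [batch, heads, q_len, k_len] score tensor,
    e.g. a [31*128, 1, 1, 11] mask like ass_mask above.
    """
    d_k = query.size(-1)
    scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, -1e9)  # masked positions get ~zero weight
    p_attn = torch.softmax(scores, dim=-1)
    if dropout is not None:
        p_attn = dropout(p_attn)
    return torch.matmul(p_attn, value), p_attn
```

If that reading is right, note that the snippet above builds `ass_mask` but passes `mask=None`, so the mask would never actually be applied.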
-
## Description
For implementing a pointer mechanism in sequence-to-sequence models it is very practical to re-use attention cells. For example, see the Attention-Based Copy Mechanism described in Jia,…
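For concreteness, here is a rough sketch (the names are mine, not this library's API) of how the weights produced by a shared attention cell can be reused as a copy distribution, pointer-generator style:

```
import torch
import torch.nn as nn

class PointerCopyHead(nn.Module):
    """Sketch: reuse attention weights over the source as a copy distribution."""

    def __init__(self, hidden_size, vocab_size):
        super().__init__()
        self.vocab_proj = nn.Linear(hidden_size, vocab_size)  # generation distribution
        self.copy_gate = nn.Linear(hidden_size, 1)             # p_gen: generate vs. copy

    def forward(self, decoder_state, attn_weights, src_token_ids):
        # decoder_state: [batch, hidden]
        # attn_weights:  [batch, src_len], produced by the shared attention cell
        # src_token_ids: [batch, src_len], vocabulary ids of the source tokens
        p_gen = torch.sigmoid(self.copy_gate(decoder_state))             # [batch, 1]
        p_vocab = torch.softmax(self.vocab_proj(decoder_state), dim=-1)  # [batch, vocab]
        # scatter the attention mass onto the vocabulary ids of the source tokens
        p_copy = torch.zeros_like(p_vocab).scatter_add(1, src_token_ids, attn_weights)
        return p_gen * p_vocab + (1.0 - p_gen) * p_copy
```

The `attn_weights` here are exactly what the existing attention cell already computes when forming the context vector, which is why being able to reuse the cell directly would be so convenient.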
-
Hi, the AttentionDTA you developed is very useful for the interpretability of the DTA prediction task!
But I didn't find the corresponding Attention module in the published project
(e.g. **_how to calcul…
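In case it helps while waiting for the authors, a drug-protein cross-attention map is often computed roughly as in the sketch below. This is only my guess at the general shape of such a module, not code from the AttentionDTA repository:

```
import torch
import torch.nn as nn

class DrugProteinAttention(nn.Module):
    """Sketch of a bilinear attention map between drug and protein features."""

    def __init__(self, drug_dim, prot_dim, attn_dim=64):
        super().__init__()
        self.drug_proj = nn.Linear(drug_dim, attn_dim)
        self.prot_proj = nn.Linear(prot_dim, attn_dim)

    def forward(self, drug_feats, prot_feats):
        # drug_feats: [batch, drug_len, drug_dim], prot_feats: [batch, prot_len, prot_dim]
        d = self.drug_proj(drug_feats)                 # [batch, drug_len, attn_dim]
        p = self.prot_proj(prot_feats)                 # [batch, prot_len, attn_dim]
        scores = torch.bmm(d, p.transpose(1, 2))       # [batch, drug_len, prot_len]
        attn = torch.softmax(scores, dim=-1)           # each drug position attends over residues
        residue_weights = attn.mean(dim=1)             # [batch, prot_len], useful for heat maps
        context = torch.bmm(attn, prot_feats)          # [batch, drug_len, prot_dim]
        return context, attn, residue_weights
```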
-
I'm finding that training a 1-expert dMoE (brown) has worse training loss than an otherwise-equivalent dense model (green). Is there some reason why this difference is expected, or can I expect them to…
-
This is a little bit of a plug, so I'll keep it short! I'm trying to nail down _**exactly** what's going on here_.
https://riprompt.com
https://riprompt.com/riprompt.txt
https://chatgpt.com/g/g-9…
-
This was initially brought to my attention by https://neurostars.org/t/cloning-hbn-cpac-data/30587/4 .
I cloned this dataset locally to investigate and found that it consumes 1.3G of `.git/objects` and…
-
We know that flash attention supports `cu_seqlens`, which removes padding for variable-length inputs in a batch and stores only the real (non-padding) tokens. This can be useful for optimizing the computational eff…
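For reference, here is a minimal sketch of how `cu_seqlens` is typically built from per-sequence lengths and passed to the varlen kernel (based on my reading of the flash-attn Python API; please check the exact signature against the installed version):

```
import torch
from flash_attn import flash_attn_varlen_func

# three sequences of different lengths packed into one unpadded "batch"
seqlens = torch.tensor([5, 9, 3], dtype=torch.int32, device="cuda")
total_tokens = int(seqlens.sum())          # 17 tokens, no padding stored
nheads, headdim = 8, 64

# cumulative sequence lengths [0, 5, 14, 17]; sequence i owns tokens cu[i]:cu[i+1]
cu_seqlens = torch.zeros(len(seqlens) + 1, dtype=torch.int32, device="cuda")
cu_seqlens[1:] = torch.cumsum(seqlens, dim=0)

# q/k/v are packed over all real tokens: [total_tokens, nheads, headdim]
q = torch.randn(total_tokens, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn(total_tokens, nheads, headdim, dtype=torch.float16, device="cuda")
v = torch.randn(total_tokens, nheads, headdim, dtype=torch.float16, device="cuda")

out = flash_attn_varlen_func(
    q, k, v,
    cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
    max_seqlen_q=int(seqlens.max()), max_seqlen_k=int(seqlens.max()),
    causal=True,
)  # out: [total_tokens, nheads, headdim], still unpadded
```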