-
RWKV (100% RNN) language model, which is the only RNN (as of now) that can match transformers in quality and scaling, while being faster and more memory-efficient.
Info: https://github.com/BlinkDL/ChatRWKV
…
-
In [dl4m.bib](https://github.com/ybayle/awesome-deep-learning-music/blob/master/dl4m.bib):
- [ ] 2 missing PDFs: [Bharucha1988](https://github.com/ybayle/awesome-deep-learning-music/blob/master/dl4m.b…
-
In the white paper, they mention conditioning on a particular speaker as an input that is conditioned globally, and the TTS component as an up-sampled (deconvolution) input that is conditioned locally. For the latter, t…
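For context, the two conditioning modes can be sketched roughly as below. All names and shapes here are illustrative assumptions, not the paper's actual code: a global condition (e.g. a speaker embedding) is projected and added as the same bias at every timestep, while local features are first upsampled to audio rate with a transposed convolution (the "deconvolution") and then added per timestep.

```python
import numpy as np

def global_condition(x, h, W_g):
    """Global conditioning sketch: x is (C, T) audio features, h is a (G,)
    per-utterance vector (e.g. a speaker embedding), W_g is (C, G).
    The projected vector is broadcast as the same bias at every timestep."""
    return x + (W_g @ h)[:, None]

def local_condition(x, y, W_up):
    """Local conditioning sketch: y is (L, T_feat) slower-rate features and
    W_up is a (C, L, r) transposed-convolution kernel whose stride r equals
    its width, so the upsampled signal has length T = r * T_feat."""
    C, L, r = W_up.shape
    T_feat = y.shape[1]
    # Each feature frame t contributes a (C, r) patch at output positions [t*r, (t+1)*r)
    up = np.einsum('clr,lt->ctr', W_up, y).reshape(C, T_feat * r)
    return x + up
```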
-
We ask that you:
- Post the questions and comments you have about the suggested readings.
- Upvote (“thumbs up”) at least 5 questions from other people. Upvote questions and recommendations you li…
-
Qwen model, FP32-INT4 precision inference, input token size 2500.
Two issues were found:
1) Output tokens repeat.
2) "Native API failed" is reported when running the same command in the 2nd round.
Platform…
-
Thank you for your work, which has inspired me greatly.
In your paper, you mention that Mamba is a single-head model (Equation 12). This seems to differ from my understanding. You also state that …
-
### System Info
- `transformers` version: 4.40.2
- Platform: Linux-6.1.0-20-amd64-x86_64-with-glibc2.36
- Python version: 3.11.2
- Huggingface_hub version: 0.21.4
- Safetensors version: 0.4.2
- …
-
Hi! Great work on a very interesting topic!
The [original fast autoregressive transformer paper](https://arxiv.org/pdf/2006.16236) includes the following formula for the output in the recurrent for…
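For reference, the recurrent form of causal linear attention from that paper (with the elu+1 feature map) can be sketched as below; the variable names are mine, and the small epsilon in the normalizer is an assumption added for numerical safety:

```python
import numpy as np

def elu_plus_one(x):
    # phi(x) = ELU(x) + 1, a strictly positive feature map
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention_recurrent(Q, K, V):
    """RNN-style evaluation of causal linear attention.
    Q, K: (T, d_k); V: (T, d_v). Returns the (T, d_v) outputs."""
    T, d_k = Q.shape
    d_v = V.shape[1]
    S = np.zeros((d_k, d_v))  # running sum of phi(k_j) v_j^T
    Z = np.zeros(d_k)         # running sum of phi(k_j) (normalizer state)
    out = np.empty((T, d_v))
    for i in range(T):
        phi_k = elu_plus_one(K[i])
        phi_q = elu_plus_one(Q[i])
        S += np.outer(phi_k, V[i])
        Z += phi_k
        out[i] = (phi_q @ S) / (phi_q @ Z + 1e-6)
    return out
```

Per step this is O(d_k * d_v) time and memory, independent of sequence length, which is the point of the recurrent formulation.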
-
## 🐛 Bug Description
When running the `fine_tuning_tutorial_jax.ipynb` notebook on a CPU in Google Colab, I encountered the following error:
```
--------------------------------------------------…
-
Currently, when saving a model, only the weights are preserved; however, the state should be preserved as well.
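A minimal sketch of what preserving the state alongside the weights could look like, assuming both are plain dicts of arrays (the function names and key prefixes are hypothetical; the actual fix would extend the project's own save routine):

```python
import numpy as np

W_PREFIX, S_PREFIX = "weight__", "state__"  # hypothetical namespacing inside one archive

def save_checkpoint(path, weights, state):
    """Persist parameters AND recurrent state in one .npz archive, so that
    inference can resume mid-sequence after loading."""
    np.savez(path,
             **{W_PREFIX + k: v for k, v in weights.items()},
             **{S_PREFIX + k: v for k, v in state.items()})

def load_checkpoint(path):
    data = np.load(path)
    weights = {k[len(W_PREFIX):]: data[k] for k in data.files if k.startswith(W_PREFIX)}
    state = {k[len(S_PREFIX):]: data[k] for k in data.files if k.startswith(S_PREFIX)}
    return weights, state
```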