-
RWKV (100% RNN) language model, which is the only RNN (as of now) that can match transformers in quality and scaling, while being faster and more memory-efficient.
Info: https://github.com/BlinkDL/ChatRWKV
…
-
In [dl4m.bib](https://github.com/ybayle/awesome-deep-learning-music/blob/master/dl4m.bib):
- [ ] 2 missing PDFs: [Bharucha1988](https://github.com/ybayle/awesome-deep-learning-music/blob/master/dl4m.b…
-
In the white paper, they mention conditioning on a particular speaker as an input that is conditioned globally, and the TTS component as an up-sampled (deconvolution) input that is conditioned locally. For the latter, t…
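For context, the two conditioning modes can be sketched roughly as below. All names and shapes here are illustrative assumptions, not the paper's actual code: a global condition (e.g. a speaker embedding) is projected and added as the same bias at every timestep, while local features are first upsampled to audio rate with a transposed convolution (the "deconvolution") and then added per timestep.

```python
import numpy as np

def global_condition(x, h, W_g):
    """Global conditioning sketch: x is (C, T) audio features, h is a (G,)
    per-utterance vector (e.g. a speaker embedding), W_g is (C, G).
    The projected vector is broadcast as the same bias at every timestep."""
    return x + (W_g @ h)[:, None]

def local_condition(x, y, W_up):
    """Local conditioning sketch: y is (L, T_feat) slower-rate features and
    W_up is a (C, L, r) transposed-convolution kernel whose stride r equals
    its width, so the upsampled signal has length T = r * T_feat."""
    C, L, r = W_up.shape
    T_feat = y.shape[1]
    # Each feature frame t contributes a (C, r) patch at output positions [t*r, (t+1)*r)
    up = np.einsum('clr,lt->ctr', W_up, y).reshape(C, T_feat * r)
    return x + up
```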
-
We ask that you:
- Post the questions and comments you have about the suggested readings.
- Upvote (“thumbs up”) at least 5 questions from other people. Upvote questions and recommendations you li…
-
Qwen model, FP32-INT4 precision inference, input token size 2500.
Two issues were found:
1) Output tokens repeat.
2) "Native API failed" is reported when running the same command in the 2nd round.
Platform…
-
Thank you for your work, which has inspired me greatly.
In your paper, you mention that Mamba is a single-head model (Equation 12). This seems to differ from my understanding. You also state that …
-
### System Info
- `transformers` version: 4.40.2
- Platform: Linux-6.1.0-20-amd64-x86_64-with-glibc2.36
- Python version: 3.11.2
- Huggingface_hub version: 0.21.4
- Safetensors version: 0.4.2
- …
-
Hi! Great work on a very interesting topic!
The [original fast autoregressive transformer paper](https://arxiv.org/pdf/2006.16236) includes the following formula for the output in the recurrent for…
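For reference, the recurrent form of causal linear attention from that paper (with the elu+1 feature map) can be sketched as below; the variable names are mine, and the small epsilon in the normalizer is an assumption added for numerical safety:

```python
import numpy as np

def elu_plus_one(x):
    # phi(x) = ELU(x) + 1, a strictly positive feature map
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention_recurrent(Q, K, V):
    """RNN-style evaluation of causal linear attention.
    Q, K: (T, d_k); V: (T, d_v). Returns the (T, d_v) outputs."""
    T, d_k = Q.shape
    d_v = V.shape[1]
    S = np.zeros((d_k, d_v))  # running sum of phi(k_j) v_j^T
    Z = np.zeros(d_k)         # running sum of phi(k_j) (normalizer state)
    out = np.empty((T, d_v))
    for i in range(T):
        phi_k = elu_plus_one(K[i])
        phi_q = elu_plus_one(Q[i])
        S += np.outer(phi_k, V[i])
        Z += phi_k
        out[i] = (phi_q @ S) / (phi_q @ Z + 1e-6)
    return out
```

Per step this is O(d_k * d_v) time and memory, independent of sequence length, which is the point of the recurrent formulation.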
-
## 🐛 Bug Description
When running the `fine_tuning_tutorial_jax.ipynb` notebook on a CPU in Google Colab, I encountered the following error:
```
--------------------------------------------------…
-
Currently, when saving a model, only the weights are preserved; however, the state should be preserved as well.
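A minimal sketch of what preserving the state alongside the weights could look like, assuming both are plain dicts of arrays (the function names and key prefixes are hypothetical; the actual fix would extend the project's own save routine):

```python
import numpy as np

W_PREFIX, S_PREFIX = "weight__", "state__"  # hypothetical namespacing inside one archive

def save_checkpoint(path, weights, state):
    """Persist parameters AND recurrent state in one .npz archive, so that
    inference can resume mid-sequence after loading."""
    np.savez(path,
             **{W_PREFIX + k: v for k, v in weights.items()},
             **{S_PREFIX + k: v for k, v in state.items()})

def load_checkpoint(path):
    data = np.load(path)
    weights = {k[len(W_PREFIX):]: data[k] for k in data.files if k.startswith(W_PREFIX)}
    state = {k[len(S_PREFIX):]: data[k] for k in data.files if k.startswith(S_PREFIX)}
    return weights, state
```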