-
### 🚀 The feature, motivation and pitch
Thanks for fixing the soft-capping issue of the Gemma 2 models in the last release! I noticed there's still a [comment](https://github.com/vllm-project/vllm/bl…
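For anyone landing here, soft-capping in this context just squashes the logits through a scaled tanh; a minimal sketch with the published Gemma 2 cap values (plain PyTorch for illustration, not vLLM's actual kernel path):

```python
import torch

def soft_cap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    # cap * tanh(logits / cap): keeps logits in (-cap, cap) while staying smooth.
    return cap * torch.tanh(logits / cap)

# Gemma 2's published defaults: 50.0 for attention logits, 30.0 for the LM head.
attn_scores = soft_cap(torch.randn(2, 8, 16, 16), cap=50.0)
lm_logits = soft_cap(torch.randn(2, 16, 256_000), cap=30.0)
```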
-
In our paper we only showed results on causal language models, which use causally masked (decoder) self-attention.
If you'd like to use ALiBi for seq2seq tasks such as translation, speech or T5, o…
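For concreteness, the decoder-side bias referred to above is just a per-head linear penalty on how far back each position attends; a minimal sketch (assuming a power-of-two number of heads so the simple geometric slope formula applies; not the paper's reference code):

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # Head h gets slope 2**(-8*(h+1)/n_heads); score (i, j) is penalised by slope * (i - j).
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    pos = torch.arange(seq_len)
    dist = (pos[:, None] - pos[None, :]).clamp(min=0)                    # distance back to position j
    bias = -slopes[:, None, None] * dist                                 # [n_heads, seq_len, seq_len]
    bias = bias.masked_fill(pos[None, :] > pos[:, None], float("-inf"))  # causal mask
    return bias  # add to the attention logits before softmax
```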
-
Hi there, thank you for your wonderful work!
I have a few questions I would like to ask:
1. Is it possible to handle the appearance of a new object in a video? Specifically, can we detect a new ob…
-
The feature section lacks interactivity and visual appeal, making it less engaging for visitors and potentially hindering the website's ability to capture attention.
Description of the solution I'd like
…
-
Nice paper on making LLMs fully attention-based. However, I noticed that the largest model discussed in the paper is a 1.5B model.
I wonder if the pattention layer is difficult to tensor parallelize…
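To make the question concrete, here is a toy sketch of how I picture sharding the layer's parameter tokens across tensor-parallel ranks, the way one would split an FFN's hidden dimension (plain softmax for illustration; this is not the repo's code, and the paper's normalization differs). The normalizer is the part that couples the shards and would need an all-reduce:

```python
import torch

torch.manual_seed(0)
d_model, n_param_tokens, world_size = 64, 256, 2

x = torch.randn(8, d_model)                    # input tokens act as queries
k_p = torch.randn(n_param_tokens, d_model)     # learnable "key" parameter tokens
v_p = torch.randn(n_param_tokens, d_model)     # learnable "value" parameter tokens

# Reference: unsharded layer, plain softmax for illustration.
ref = torch.softmax(x @ k_p.t() / d_model**0.5, dim=-1) @ v_p

# Sharded version: each rank holds a slice of the parameter tokens.
num, den = 0.0, 0.0
for k_s, v_s in zip(k_p.chunk(world_size), v_p.chunk(world_size)):
    e = (x @ k_s.t() / d_model**0.5).exp()
    num = num + e @ v_s                        # local unnormalised output
    den = den + e.sum(-1, keepdim=True)        # local normaliser
# The two sums above are the all-reduces a real TP implementation would need;
# the normaliser is the extra coupling compared to a plain row-parallel FFN.
out = num / den
torch.testing.assert_close(out, ref, rtol=1e-4, atol=1e-5)
```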
-
Hey,
How can I get token-level contributions for the search query? This seems like one of the strong benefits of ColBERT for highlighting relevant matches, but for some reason, I can't find any implemen…
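In case it helps to sketch what I mean: with late interaction the document score is a sum of per-query-token MaxSim values, so each query token's contribution (and the doc token it matched) falls out directly. A toy sketch, not anything from this repo:

```python
import torch

def token_contributions(q_emb: torch.Tensor, d_emb: torch.Tensor):
    """Per-query-token MaxSim contributions.
    q_emb: [num_query_tokens, dim], d_emb: [num_doc_tokens, dim], both L2-normalised."""
    sim = q_emb @ d_emb.t()            # [num_query_tokens, num_doc_tokens] cosine similarities
    contrib, matched = sim.max(dim=1)  # MaxSim per query token + index of the doc token it matched
    score = contrib.sum()              # the usual ColBERT document score
    return score, contrib, matched
```

Pairing `contrib` with the query's token strings shows which query tokens drive the score, and `matched` gives the doc token positions to highlight.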
-
Hey, thanks for the great work. I could be wrong, but I feel like there is a disconnect between what is mentioned in the Based paper and what is used in the Figure 2 config for MQAR eval. In the paper…
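For anyone else reading along, here is a toy sketch of the MQAR task as I understand it (a hypothetical generator, not the zoology code the config points at), just so it's clear what the eval measures:

```python
import torch

def make_mqar_batch(batch=4, seq_len=64, num_pairs=8, vocab=64, seed=0):
    """Toy multi-query associative recall data. Keys live in [0, vocab//2),
    values in [vocab//2, vocab); the prefix lists key/value pairs and the
    suffix re-queries each key, where the label is its paired value."""
    g = torch.Generator().manual_seed(seed)
    half = vocab // 2
    x = torch.randint(half, vocab, (batch, seq_len), generator=g)  # filler drawn from the value range
    y = torch.full((batch, seq_len), -100)                         # -100 = ignored by cross-entropy
    for b in range(batch):
        keys = torch.randperm(half, generator=g)[:num_pairs]
        vals = torch.randint(half, vocab, (num_pairs,), generator=g)
        x[b, 0:2 * num_pairs:2] = keys                             # key, value, key, value, ...
        x[b, 1:2 * num_pairs:2] = vals
        pos = 2 * num_pairs + torch.randperm(seq_len - 2 * num_pairs, generator=g)[:num_pairs]
        x[b, pos] = keys                                           # re-query each key once
        y[b, pos] = vals                                           # model must recall the paired value
    return x, y
```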
-
I think having flash attention in `equinox` should be treated as a critical issue, considering it is already built into torch natively.
While XLA is supposed to (in theory) do some of the fusion, and possibly …
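For what it's worth, until something lands in `equinox`, this is roughly what can be called from inside an `eqx.Module` today; a minimal sketch assuming a recent JAX that exposes `jax.nn.dot_product_attention`:

```python
import jax
import jax.numpy as jnp

def fused_attention(q, k, v):
    # Shapes are (batch, seq, heads, head_dim). implementation="cudnn" requests the
    # fused flash-attention kernel on GPU; None/"xla" lets XLA decide (works everywhere).
    return jax.nn.dot_product_attention(q, k, v, is_causal=True, implementation="cudnn")

key = jax.random.PRNGKey(0)
q = k = v = jax.random.normal(key, (2, 128, 4, 64), dtype=jnp.bfloat16)
out = jax.jit(fused_attention)(q, k, v)
```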
-
If a document is open in a plank and a stack, attention jumps from the stack back to the standalone plank.
https://github.com/user-attachments/assets/5f81904d-c0ea-46a3-89db-dd715143fd53
-
I am using [this PyTorch-provided script](https://github.com/pytorch/pytorch/blob/main/benchmarks/transformer/score_mod.py) to benchmark flex attention against eager and got the attached results ([defaul…
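One thing worth ruling out when reading those numbers: flex attention only hits its fused kernel under `torch.compile`; run eagerly it falls back to an unfused reference path. A minimal sanity-check sketch with toy shapes (not the benchmark script's config):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

B, H, S, D = 4, 8, 1024, 64  # toy shapes, just placeholders
q, k, v = (torch.randn(B, H, S, D, device="cuda", dtype=torch.float16) for _ in range(3))

def noop_score_mod(score, b, h, q_idx, kv_idx):
    # Identity score_mod, i.e. plain attention expressed through flex attention.
    return score

# Compilation is what produces the fused kernel; eager flex attention runs an
# unfused reference path, which can explain large eager-vs-compiled gaps.
compiled_flex = torch.compile(flex_attention)
out = compiled_flex(q, k, v, score_mod=noop_score_mod)

# Baseline for comparison: the fused SDPA kernel.
ref = torch.nn.functional.scaled_dot_product_attention(q, k, v)
torch.testing.assert_close(out, ref, atol=2e-2, rtol=2e-2)
```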