-
My device is an RTX 4090; I assumed its architecture is consistent with the H100's Hopper architecture. But on the homepage it says "Requirements: H100 / H800 GPU, CUDA >= 12.3."
I would like to know if flash attentio…
-
### Model description
"Attention Is All You Need" is a landmark 2017 research paper authored by eight scientists working at Google. It expanded on the 2014 attention mechanisms proposed by Bah…
-
I would like to build a generative AI that is more advanced than the Claude Sonnet or OpenAI o1 models. I would like to use advanced mechanisms from OpenAI, Anthropic, and other sources to build the most adv…
-
## Overview
The focus of this code review will be the AuditedBalanceInput page and the BudgePlInput page.
Please pay attention to:
* JavaScript issues
* React components
#…
-
-
Many modern architectures use either GQA or MQA rather than MHA, but `dot_product_attention` allows only MHA because it enforces that `query`, `key`, and `value` have the same number of heads:
https://gi…
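For context, a minimal NumPy sketch of the workaround this constraint forces (the function name is hypothetical, not the library's API): GQA can be emulated on top of an MHA-only kernel by repeating each KV head until the head counts match.

```python
import numpy as np

def gqa_via_mha(q, k, v):
    """Emulate grouped-query attention with an MHA-only kernel.

    Hypothetical helper, not `dot_product_attention` itself:
    q has shape (n_q_heads, seq, d); k and v have shape
    (n_kv_heads, seq, d) with n_q_heads % n_kv_heads == 0.
    """
    n_q, _, d = q.shape
    rep = n_q // k.shape[0]  # query heads per KV head
    # Repeat each KV head so every query head sees a matching KV head,
    # producing the equal-head-count shape an MHA-only kernel requires.
    k = np.repeat(k, rep, axis=0)
    v = np.repeat(v, rep, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # (n_q_heads, seq, d)
```

With `n_kv_heads == 1` this reduces to MQA. Frameworks with native GQA support do the equivalent of this repeat internally, typically without materializing the copies.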
-
### 🐛 Describe the bug
When using `torch.compile` on `torch.nn.functional.scaled_dot_product_attention` with length 1, a RuntimeError occurs during the backward pass:
```python
import torch
from…
-
Hey @sunovivid, great work, and congrats on the paper's acceptance at ECCV!
I would like to reproduce the results, and I have the following questions related to the hyperparameters:
1. **How man…
-
### Type of issue
Other (describe below)
### Description
_This issue has been moved from [a ticket on Developer Community](https://developercommunity.visualstudio.com/t/Feedback-on-the-Common-web…
-
### Motivation
As vLLM supports more and more models and features, they require different attention, scheduler, executor, and input/output processor implementations. These modules are becoming increasingly com…