-
Hi, I would like to ask why the attention mask is not used in the prefill stage.
I want to output the attention scores matrix in the prefill stage. Is the code below correct?
```python
if spec: # s…
```
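For reference, here is a minimal sketch of materializing the prefill score matrix by hand, independent of any particular engine; the helper name and the (batch, heads, seq_len, head_dim) layout are assumptions, not the original code:

```python
import math
import torch

# A hypothetical helper, not the engine's actual code path: it builds the
# full score matrix for the prompt and applies an explicit causal mask,
# which most prefill kernels apply implicitly instead of taking a mask.
def prefill_attention_scores(q, k):
    # q, k: (batch, heads, seq_len, head_dim)
    seq_len = q.shape[-2]
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool,
                                   device=q.device))
    scores = scores.masked_fill(~causal, float("-inf"))
    return torch.softmax(scores, dim=-1)
```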
-
### 🐛 Describe the bug
When I use flex attention on a single RTX 4090, I get an error.
A minimal repro:
```python
import torch
from torch.nn.attention.flex_attention import flex_attention
flex_at…
```
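Since the repro above is cut off, here is a self-contained sketch of a direct `flex_attention` call on CUDA; the shapes, dtype, and the causal `score_mod` are assumptions, not the original code:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

# Hypothetical inputs: (batch, heads, seq_len, head_dim), half precision.
q = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)

def causal(score, b, h, q_idx, kv_idx):
    # Mask out keys that lie after the query position.
    return torch.where(q_idx >= kv_idx, score, -float("inf"))

out = flex_attention(q, k, v, score_mod=causal)
```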
-
### Description
I am calling `jax.nn.dot_product_attention` with the following line:
```python
dpsa_cudnn = jax.nn.dot_product_attention(query, key, value, implementation='cudnn')
```
However, this t…
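For context, a self-contained version of that call might look like the sketch below; the shapes and dtype are assumptions (the cuDNN backend expects half precision in the (batch, seq_len, num_heads, head_dim) layout):

```python
import jax
import jax.numpy as jnp

# Hypothetical inputs: (batch, seq_len, num_heads, head_dim) in float16.
rng = jax.random.PRNGKey(0)
q_rng, k_rng, v_rng = jax.random.split(rng, 3)
shape = (2, 1024, 8, 64)
query = jax.random.normal(q_rng, shape, dtype=jnp.float16)
key = jax.random.normal(k_rng, shape, dtype=jnp.float16)
value = jax.random.normal(v_rng, shape, dtype=jnp.float16)

dpsa_cudnn = jax.nn.dot_product_attention(query, key, value,
                                          implementation='cudnn')
```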
-
A .pth model trained with deit_tiny_patch16_224 was converted to an ONNX model, but converting that ONNX model with both pnnx and ncnn fails with errors.
pnnx error:
```
./pnnx model.onnx inputshape=[1,3,224,224]
pnnxparam = model.pnnx.param
pnnxbin = model.pnnx.bin
pnnxpy = model_pnnx.py
pnn…
```
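For reference, a minimal sketch of the export step that would produce `model.onnx`, assuming the checkpoint is a plain `state_dict` for timm's `deit_tiny_patch16_224`; the opset and tensor names are assumptions:

```python
import torch
import timm

# Assumption: model.pth holds a plain state_dict for deit_tiny_patch16_224.
model = timm.create_model("deit_tiny_patch16_224", pretrained=False)
model.load_state_dict(torch.load("model.pth", map_location="cpu"))
model.eval()

# Fixed 1x3x224x224 input to match the pnnx inputshape argument.
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "model.onnx", opset_version=13,
                  input_names=["input"], output_names=["output"])
```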
-
### 🚀 The feature, motivation and pitch
This would simplify code using `torch.nn.attention.sdpa_kernel` as the list of backends may evolve (e.g. Flex Attention might become a backend for SDPA?) and…
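For context, here is a sketch of the pattern this request would simplify; today the caller has to enumerate `SDPBackend` members explicitly, and the particular list below is just one plausible choice:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q = torch.randn(1, 8, 128, 64)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)

# The caller must hard-code the allowed backends; if new backends are
# added to PyTorch, this list silently goes stale.
with sdpa_kernel([SDPBackend.FLASH_ATTENTION,
                  SDPBackend.EFFICIENT_ATTENTION,
                  SDPBackend.MATH]):
    out = F.scaled_dot_product_attention(q, k, v)
```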
-
Hello, thank you for your work.
I am interested in the AttentiveFP implementation from the paper "Pushing the Boundaries of Molecular Representation for Drug Discovery with the Graph Attention Mech…
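If this refers to the PyTorch Geometric implementation (an assumption on my part), a minimal sketch of instantiating it looks like this; all sizes are placeholders:

```python
import torch
from torch_geometric.nn import AttentiveFP

# Placeholder channel sizes; AttentiveFP pools node embeddings into a
# per-molecule prediction via graph attention and GRU readout steps.
model = AttentiveFP(in_channels=39, hidden_channels=200, out_channels=1,
                    edge_dim=10, num_layers=2, num_timesteps=2, dropout=0.2)

x = torch.randn(4, 39)                                    # node features
edge_index = torch.tensor([[0, 1, 2, 3], [1, 0, 3, 2]])   # graph edges
edge_attr = torch.randn(4, 10)                            # edge features
batch = torch.zeros(4, dtype=torch.long)                  # one molecule
out = model(x, edge_index, edge_attr, batch)              # shape (1, 1)
```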
-
Hello! First of all, thank you very much for this research work. However, when I try to train with the 3DMatch dataset you provided, I get an error message:
```
(torch) yy@yy:~/se3-equi-graph-registration…
```
-
From [Algorithmic Simplicity](https://www.youtube.com/@algorithmicsimplicity):
- [x] [Why Does Diffusion Work Better than Auto-Regression? - YouTube](https://www.youtube.com/watch?v=zc5NTeJbk-k)
-…
-
PyTorch 2.0 introduced `torch.compile` for accelerating training and inference. I have tried it on top of flash attention, but unfortunately `torch` seems unable to compile flash attention:
`…
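Since the snippet above is truncated, here is a sketch of the failing pattern, assuming the `flash-attn` package's `flash_attn_func`; the shapes are hypothetical:

```python
import torch
from flash_attn import flash_attn_func

# Hypothetical inputs: (batch, seq_len, num_heads, head_dim), half precision.
q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)

# Wrapping the custom CUDA kernel in torch.compile is where compilation
# reportedly fails or falls back.
compiled = torch.compile(flash_attn_func)
out = compiled(q, k, v, causal=True)
```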
-
**Describe the bug**
Doesn't seem to work on arm64.
```
sullemanhossam@hossams-MacBook-Air ai_alignment_graph % npm i
npx quartz build --serve
up to date…
```