-
Hello,
We have encountered an issue with the Authlib library's dependency management, specifically related to the cryptography package. The current setup.py includes an "unpinned" version specifica…
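The report is cut off, but the usual shape of the fix is a bounded specifier in `install_requires`. A minimal sketch (package name and version range are placeholders, not Authlib's actual pins):
```
# Hypothetical setup.py excerpt: bound the cryptography dependency
from setuptools import setup

setup(
    name="example-package",
    install_requires=[
        "cryptography>=3.4,<44",  # illustrative range instead of an unpinned dependency
    ],
)
```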
-
### 🐛 Describe the bug
Passing head_dim > 128 on the Ampere architecture fails with errors from the cuDNN frontend.
All necessary imports are at the beginning:
```
import torch
b, h = 8, 2
s_q, s_kv = 128, 128
…
```
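The repro above is cut off. A minimal sketch of a complete version, assuming the truncated lines build query/key/value tensors and call `scaled_dot_product_attention` with the cuDNN backend forced (the head_dim of 256 and the dtype are placeholders):
```
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

b, h = 8, 2
s_q, s_kv = 128, 128
d = 256  # hypothetical head_dim > 128 that triggers the cuDNN frontend error

q = torch.randn(b, h, s_q, d, device="cuda", dtype=torch.float16)
k = torch.randn(b, h, s_kv, d, device="cuda", dtype=torch.float16)
v = torch.randn(b, h, s_kv, d, device="cuda", dtype=torch.float16)

# restrict SDPA to the cuDNN backend so the failure reproduces
with sdpa_kernel(SDPBackend.CUDNN_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)
```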
-
How do I create an attention heat map from an Attention model, like the image below?
PS: I can access the graph through tfdbg, but I don't know what I should be searching for.
![phtqi](https://user-…
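The image link is truncated, but in general the heat map is just an image plot of the attention-weight matrix once you have fetched it from the graph. A minimal matplotlib sketch, with random placeholder weights standing in for the real fetch:
```
import numpy as np
import matplotlib.pyplot as plt

# placeholder: stands in for the (target_len, source_len) attention
# weights you would fetch from the model / tfdbg session
attn = np.random.rand(12, 20)

fig, ax = plt.subplots()
im = ax.imshow(attn, cmap="viridis", aspect="auto")
ax.set_xlabel("source position")
ax.set_ylabel("target position")
fig.colorbar(im, ax=ax, label="attention weight")
plt.show()
```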
-
### What happened?
```
10:32 akara@nneka /Users/akara/currentporoject/llama.cpp
% sw_vers
ProductName:    macOS
ProductVersion: 14.6
BuildVersion:   23G80
10:43 akara@nneka /Users/akara/current…
```
-
## 🐛 Bug
The issue looks related to **lifted constants** during `torch.export`. I found a commit, https://github.com/pytorch/xla/commit/d8d7e58b78664aff2713e5f25adb3d61c42d44e7, that might be related, but…
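For context on what gets lifted, a minimal sketch (module and shapes are illustrative, not from the linked commit): a plain tensor attribute on a module is lifted into the exported program's signature as a constant input.
```
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # plain tensor attribute: torch.export lifts it as a constant input
        self.const = torch.tensor([1.0, 2.0])

    def forward(self, x):
        return x + self.const

ep = torch.export.export(M(), (torch.randn(2),))
print(ep.graph_signature)  # the lifted constant shows up in the signature
```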
-
### 🐛 Describe the bug
Hello, I'm using torch.compile on FlexAttention with a sliding window. When I run my model in 16-mixed or 16-bit float precision, the attention function segfaults during the initial b…
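The report is cut off; a minimal sketch of what the setup presumably looks like, assuming torch ≥ 2.5 where `flex_attention` and `create_block_mask` are available (window size, shapes, and dtype are placeholders):
```
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

WINDOW = 256  # hypothetical sliding-window size

def sliding_window(b, h, q_idx, kv_idx):
    # causal sliding-window mask: attend to at most WINDOW previous tokens
    return (q_idx >= kv_idx) & (q_idx - kv_idx <= WINDOW)

B, H, S, D = 2, 8, 1024, 64
q = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

block_mask = create_block_mask(sliding_window, B=None, H=None, Q_LEN=S, KV_LEN=S)
compiled_flex = torch.compile(flex_attention)
out = compiled_flex(q, k, v, block_mask=block_mask)
```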
-
Hello, I read your paper and I think it is very good `GNN interpretability` work. I think it might inspire me, so I would like to study the details of your code implementation; unfortunately, I didn't f…
-
## Arxiv/Blog/Paper Link
https://arxiv.org/abs/2405.07395
## Detailed Description
## Context
The factorized attention is quadratic in the axial lengths rather than in the full size, which shoul…
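For reference, a rough sketch of the factorization under discussion (layout and the use of `scaled_dot_product_attention` are assumptions): attention runs along each axis of an H × W grid in turn, so the cost scales as H·W·(H + W) instead of (H·W)².
```
import torch
import torch.nn.functional as F

def axial_attention(x):
    # x: (B, H, W, C) feature grid; self-attention along rows, then columns
    B, H, W, C = x.shape
    rows = x.reshape(B * H, W, C)                  # each row is a length-W sequence
    rows = F.scaled_dot_product_attention(rows, rows, rows)
    x = rows.reshape(B, H, W, C)
    cols = x.transpose(1, 2).reshape(B * W, H, C)  # each column is a length-H sequence
    cols = F.scaled_dot_product_attention(cols, cols, cols)
    return cols.reshape(B, W, H, C).transpose(1, 2)
```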
-
Platforms: linux
This test was disabled because it is failing in CI. See [recent examples](https://hud.pytorch.org/flakytest?name=test_scaled_dot_product_attention_cuda_dynamic_shapes_cuda_wrapper&su…
-
## Unable to freeze tensor of type Int64/Float64 into constant layer, try to compile model with truncate_long_and_double enabled
When I try to test the Transformer attention layer with TensorRT, I g…
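The report is cut off. A minimal sketch of the compile call the error message suggests, with a hypothetical `TinyAttn` module standing in for the real attention layer (the Int64 buffer mimics the constant TensorRT refuses to freeze):
```
import torch
import torch_tensorrt

class TinyAttn(torch.nn.Module):  # hypothetical stand-in for the attention layer
    def __init__(self):
        super().__init__()
        # an Int64 constant like the one that cannot be frozen
        self.register_buffer("idx", torch.arange(16, dtype=torch.int64))

    def forward(self, x):
        return x + self.idx.to(x.dtype)

trt_model = torch_tensorrt.compile(
    TinyAttn().eval().cuda(),
    inputs=[torch_tensorrt.Input((1, 16), dtype=torch.float32)],
    truncate_long_and_double=True,  # demote Int64/Float64 constants to 32-bit
)
```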