-
**Describe the bug**
When attempting to shard a `gemma_2b_en` model across two (consumer-grade) GPUs, I get:
```
ValueError: One of device_put args was given the sharding of NamedSharding(mesh=…
```
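For context, here is a minimal sketch of how a two-GPU `NamedSharding` is set up in JAX; the mesh axis name `model` and the array shape are illustrative assumptions, not taken from the report above:

```python
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devices = jax.devices()[:2]            # assumes at least two visible GPUs
mesh = Mesh(devices, axis_names=("model",))

# Shard dimension 0 across the "model" axis. The ValueError above is
# typically raised when a dimension's global size is not divisible by
# the number of devices on the mesh axis it is sharded over.
sharding = NamedSharding(mesh, P("model", None))

x = jnp.ones((4096, 2048))             # dim 0 divisible by 2, so this is fine
x_sharded = jax.device_put(x, sharding)
print(x_sharded.sharding)
```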
-
Hi! Thank you for your great work. Looking at the code, I see that deformable attention is used only in the decoder's cross-attention module.
Why is deformable attention not used anywhere e…
-
When I'm trying to use Videocrafter 2, I get this error:
```
F:\Pinokio\api\videocrafter2.git\app\env\lib\site-packages\torch\nn\functional.py:5560: UserWarning: 1Torch was not compiled with flash att…
```
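That warning is emitted by `scaled_dot_product_attention` when the flash-attention kernel is unavailable in the installed build; the call silently falls back to another backend. A minimal sketch, assuming a recent PyTorch with a CUDA device, of how to inspect and constrain the SDPA backend:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

q = k = v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)

# Is the flash kernel even allowed in this build?
print(torch.backends.cuda.flash_sdp_enabled())

# Restrict SDPA to a backend that the build does support;
# EFFICIENT_ATTENTION is a common fallback when flash is missing.
with sdpa_kernel([SDPBackend.EFFICIENT_ATTENTION]):
    out = F.scaled_dot_product_attention(q, k, v)
```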
-
Hello, I saw a paragraph in the paper that simply states that the attention operation is omitted in the encoder module, so the encoder consists only of FFN layers.
Here, we omit the attention mech…
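A minimal sketch of what such an attention-free, FFN-only encoder layer could look like; the dimensions, activation, and layer count here are assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class FFNEncoderLayer(nn.Module):
    def __init__(self, dim: int = 256, hidden: int = 1024):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection around the feed-forward block only;
        # there is no self-attention sub-layer.
        return x + self.ffn(self.norm(x))

encoder = nn.Sequential(*[FFNEncoderLayer() for _ in range(6)])
tokens = torch.randn(2, 100, 256)      # (batch, tokens, dim)
print(encoder(tokens).shape)
```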
-
I was doing some editing of a jbeam, then started a new project with the f4 menu, and this remained. It's not a big issue; doing basically anything seems to clear it, but I thought I'd bring it to yo…
-
Hello, when I was building attention heatmaps, I found that the attention scores across different patches did not vary much. Have you encountered this problem before?
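One way to quantify that observation is per-query attention entropy: near-uniform scores push it toward the maximum of log(N). A hedged diagnostic sketch, with assumed tensor shapes:

```python
import torch

# (heads, queries, patches); random weights stand in for a real model's.
attn = torch.softmax(torch.randn(8, 197, 197), dim=-1)

entropy = -(attn * attn.clamp_min(1e-12).log()).sum(-1)   # per head, per query
max_entropy = torch.log(torch.tensor(attn.shape[-1], dtype=torch.float))
print(entropy.mean() / max_entropy)   # ratio near 1.0 => near-uniform attention
```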
-
### 🐛 Describe the bug
According to https://github.com/pytorch/pytorch/actions/workflows/slow.yml?query=is%3Asuccess
the last successful run of the workflow on the main branch was on Aug 20th for https://gi…
-
Hi, thank you for your wonderful work! I noticed that during training, the loss function is composed of the attention loss and the cam_up_similarity term. Since the cam_up_similarity was not discuss…
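As a hypothetical illustration only (the paper does not specify how the two terms are combined), a weighted sum of an attention loss and a cosine-similarity term might look like this; the function name, weights, and loss forms are all assumptions:

```python
import torch
import torch.nn.functional as F

def total_loss(attn_pred, attn_target, cam_up, target_feat,
               w_attn=1.0, w_sim=0.1):
    # Assumed attention loss: mean squared error between attention maps.
    attn_loss = F.mse_loss(attn_pred, attn_target)
    # Assumed similarity term: higher cosine similarity should lower
    # the loss, hence the negation.
    sim = F.cosine_similarity(cam_up.flatten(1), target_feat.flatten(1)).mean()
    return w_attn * attn_loss - w_sim * sim
```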
-
When scanning a QR code to get a badge (while signed into FAS), I got this success notification that the badge was awarded:
![1000002207](https://github.com/user-attachments/assets/f1d09e2b-4f33-4bba…
-
### 🚀 The feature, motivation and pitch
FlexAttention was proposed as a performant attention implementation leveraging `torch.compile` with easy APIs for adding support for complex attention varian…
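A minimal sketch of that API, assuming PyTorch 2.5+: a `score_mod` callback expresses attention variants in a few lines, and `torch.compile` can fuse them into a single kernel. The causal mask below is the canonical example; shapes are illustrative.

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

def causal(score, b, h, q_idx, kv_idx):
    # Send scores for future positions to -inf before the softmax.
    return torch.where(q_idx >= kv_idx, score, -float("inf"))

q = k = v = torch.randn(1, 4, 256, 64)  # (batch, heads, seq, head_dim)
out = flex_attention(q, k, v, score_mod=causal)
```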