-
Great work, thanks for sharing!
In the paper it is said that the shape of the adjacency matrix is (n+1)×(n+1), which should be 22×22 (21 for joints and 1 for food contact). However, in your implementa…
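A minimal sketch of the node count being discussed (all names and edges here are hypothetical, not taken from the paper): n = 21 joint nodes plus 1 extra food-contact node gives an (n+1) × (n+1) = 22 × 22 adjacency matrix.

```python
import numpy as np

n_joints = 21
n_nodes = n_joints + 1          # 21 joints + 1 food-contact node = 22

# symmetric adjacency matrix for the (hypothetical) skeleton graph
A = np.zeros((n_nodes, n_nodes))

# illustrative edge: connect the contact node (index 21) to a made-up
# wrist joint at index 6 (the real joint indices depend on the dataset)
wrist = 6
A[n_joints, wrist] = A[wrist, n_joints] = 1

assert A.shape == (22, 22)
```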
-
Hi,
I encountered the following error message when trying to enable flash attention with the command below. Can I ask whether flash attention is supported?
Command: `./main -m $model -n 128 --prompt …`
-
Add attention head visualization in evaluation pipeline
-
Is there any way to add Flash Attention 2 support for this model? If there is, I would love to get involved and help out!
I've tried implementing it by looking at [MusicGen's one ](https://git…
-
https://github.com/ParadoxZW/LLaVA-UHD-Better/blob/main/llava_uhd/adapt_llava.py#L136-L138
Here, since the first token is for CLS, shouldn't we change
```python
m[:w * h] = True
```
to
```python
m[:w * h+1] = …
```
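A small illustration of the suspected off-by-one (shapes and values here are made up for the demo, not taken from the linked code): the ViT sequence is [CLS] followed by w*h patch tokens, so a mask that should cover CLS plus every patch needs w*h + 1 entries, while slicing only w*h entries leaves the last patch token out.

```python
# hypothetical patch grid; the real w, h come from the image resizing logic
w, h = 3, 2
seq_len = 1 + w * h        # CLS token + w*h patch tokens

m_old = [False] * seq_len
m_old[:w * h] = [True] * (w * h)            # covers CLS + first w*h - 1 patches

m_new = [False] * seq_len
m_new[:w * h + 1] = [True] * (w * h + 1)    # covers CLS + all w*h patches

assert m_old[-1] is False   # original slice misses the last patch token
assert all(m_new)           # proposed slice covers the whole sequence
```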
-
### System Info
L4 GPU (AWS g6.12xlarge) with TensorRT-LLM 0.11.0, running with Triton backends
### Who can help?
_No response_
### Information
- [ ] The official example scripts
- [ ] My own modified …
-
Maybe it is too niche, but we would be interested in a fused circular windowed attention.
Currently, we pad q, k, v, run the fused kernel, and crop.
It would either help if k, v could have different di…
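A naive numpy sketch of the pad-then-crop workaround described above (my own illustration, not the fused kernel in question): circularly padding k and v by half the window lets a plain sliding-window attention reproduce wrap-around (circular) neighbourhoods, which is what the padding step emulates before cropping.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

seq, dim, window = 8, 4, 3
halo = window // 2
rng = np.random.default_rng(0)
q = rng.standard_normal((seq, dim))
k = rng.standard_normal((seq, dim))
v = rng.standard_normal((seq, dim))

# reference: circular windowed attention via modular indexing
out_direct = np.empty_like(q)
for i in range(seq):
    idx = [(i + d) % seq for d in range(-halo, halo + 1)]
    s = softmax(q[i] @ k[idx].T)
    out_direct[i] = s @ v[idx]

# workaround: circularly pad k, v, then run a plain sliding window
pad = lambda x: np.concatenate([x[-halo:], x, x[:halo]], axis=0)
k_pad, v_pad = pad(k), pad(v)
out_pad = np.empty_like(q)
for i in range(seq):
    s = softmax(q[i] @ k_pad[i:i + window].T)
    out_pad[i] = s @ v_pad[i:i + window]

assert np.allclose(out_direct, out_pad)
```

A real fused kernel would also pad q and crop the padded outputs, as the issue describes; the loop here just keeps the equivalence easy to check.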
-
Hey @Luke-Luo1,
Thank you for your great work! I noticed that the DS-attention used in the dance decoder only enhances the attention scores of the upper-body joints. I just wonder why not do the DS-att…
-
### Description of the bug
Not really a bug, but the scrolling in the layer view used to be a lot better.
I am not sure if this issue is particular to different mouse models, but it used to be that fo…
-
Hi, thank you for this interesting work :)
I was wondering how the "attention heatmap" in the paper was drawn.
If I have understood your method correctly, the learnable parameters are only added to …