-
There are a couple of different libraries I investigated; I am choosing:
- [x] [BertViz](https://github.com/jessevig/bertviz/tree/master): shows the attentions between `query` from search and a SERP
Oth…
-
### 🐛 Describe the bug
With a 2D spatial neighborhood pattern, flex attention is orders of magnitude slower than dense attention:
```
hlc=2
seq_length : 192
flex attention : 0.0015106382369995117 […
```
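For context, a 2D spatial neighborhood like this is typically expressed as a mask predicate in the shape flex attention's `mask_mod` expects. A minimal pure-Python sketch of that predicate, assuming a hypothetical grid width `W` and using `hlc` as the half-window radius from the numbers above:

```python
# Sketch of a 2D spatial-neighborhood mask predicate, in the shape
# flex attention's mask_mod expects: (b, h, q_idx, kv_idx) -> bool.
# W (grid width) is a toy assumption; HLC mirrors hlc=2 above.
W = 16        # tokens per row of the 2D grid (assumption)
HLC = 2       # half-window: attend within +/-2 cells along each axis

def neighborhood_mask(b, h, q_idx, kv_idx):
    """True iff the kv cell lies within a (2*HLC+1)^2 window of the q cell."""
    qy, qx = q_idx // W, q_idx % W
    ky, kx = kv_idx // W, kv_idx % W
    return abs(qy - ky) <= HLC and abs(qx - kx) <= HLC
```

With torch, a predicate like this would be turned into a block mask via `create_block_mask` and passed to `flex_attention`; the sparsity it induces is what the timings above compare against dense attention.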
-
While using the right hidden size for the rotation, the Llama 3 8B model performs better:
WIKITEXT2 PPL improves from 11.544 to 8.967.
But for the other models, while running the fake quant:
`pyt…
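For context, rotation-based quantization schemes of this kind rely on an orthogonal rotation `R` folded into the weights being exactly undone on the activations, which only works when `R` is square with the model's hidden size. A minimal numpy sketch with toy dimensions (not the actual model sizes):

```python
import numpy as np

# Toy demonstration that an orthogonal rotation cancels between
# weights and activations: W @ x == (W @ R) @ (R.T @ x), provided
# R matches the hidden size. All shapes here are toy assumptions.
rng = np.random.default_rng(0)
hidden = 8                               # toy hidden size (assumption)
W = rng.standard_normal((4, hidden))     # toy weight matrix
x = rng.standard_normal(hidden)          # toy activation vector

# Random orthogonal matrix via QR decomposition.
R, _ = np.linalg.qr(rng.standard_normal((hidden, hidden)))

y_ref = W @ x
y_rot = (W @ R) @ (R.T @ x)              # rotated weights, counter-rotated input
print(np.allclose(y_ref, y_rot))         # the rotation cancels exactly
```

A mismatched hidden size breaks this cancellation outright, which is one place to look before blaming the fake quant itself.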
-
### Describe the issue
I want to ask a general question. When analyzing attention scores, I notice that my attention scores are quite sparse and their values are also very low. I cannot obtain any valuable…
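One clarifying point for questions like this: attention weights are a softmax over the key axis, so with `N` keys each row sums to 1 and the average weight is `1/N`; individually tiny values are expected at long sequence lengths and are not by themselves a problem. Per-row entropy is often a more informative sparsity measure. A minimal numpy sketch with toy scores (not the asker's model):

```python
import numpy as np

# Toy pre-softmax scores; N is a hypothetical sequence length.
rng = np.random.default_rng(0)
N = 256
scores = rng.standard_normal((N, N))

# Numerically stable softmax over the key axis: rows sum to 1.
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)

mean_weight = attn.mean()                # == 1/N by construction
entropy = -(attn * np.log(attn + 1e-12)).sum(axis=-1)  # per-query entropy
print(mean_weight, entropy.mean())       # low mean is normal; low entropy = peaked rows
```

Uniform rows have entropy `log(N)`; rows that concentrate on a few keys have much lower entropy, which is the signal to inspect.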
-
The last thing in the old repo was to redirect issues here, so I'm not sure if this is a duplicate of something there, but I loved the extension in a1111 and hope to have it again with forge...
This is what I g…
-
### What happened?
I would like to begin by expressing my sincere gratitude to the authors for their dedication and effort in developing this work.
To provide context for the issue I am encounter…
-
### Feature request
I am working with a flash linear attention (FLA) model and aiming to process data in a frame-by-frame manner with [batch, frame, feature] input format. Additionally, I am looking …
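For context, frame-by-frame processing is exactly what the recurrent view of (causal, unnormalized) linear attention enables: instead of materializing the full attention matrix, you carry a state `S` and update it one frame at a time. A minimal numpy sketch showing the two forms agree; shapes are toy assumptions and this is not FLA's actual API:

```python
import numpy as np

# Toy frames and feature dimension (assumptions).
rng = np.random.default_rng(0)
T, D = 6, 4
q = rng.standard_normal((T, D))
k = rng.standard_normal((T, D))
v = rng.standard_normal((T, D))

# Parallel form: causal (lower-triangular) mask on q @ k.T.
mask = np.tril(np.ones((T, T)))
out_parallel = (mask * (q @ k.T)) @ v

# Recurrent form: one frame at a time; state S accumulates k_t v_t^T,
# so out_t = q_t @ S_t = sum_{s<=t} (q_t . k_s) v_s.
S = np.zeros((D, D))
out_recurrent = np.empty_like(v)
for t in range(T):
    S += np.outer(k[t], v[t])
    out_recurrent[t] = q[t] @ S

print(np.allclose(out_parallel, out_recurrent))  # identical outputs
```

The recurrent loop consumes input of shape [frame, feature] one step at a time with O(D^2) state, which is what makes streaming, frame-by-frame inference possible.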
-
For reproduction:
Input model:
https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sdxl-punet/punet.mlir
Input data:
wget https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sdx…
-
### Have I written custom code (as opposed to using a stock example script provided in MediaPipe)
Yes
### OS Platform and Distribution
Android 14
### Mobile device if the issue happens on …
-
Hello lyuwenyu,
First of all, thank you for your amazing work on RT-DETR! I’ve just started learning about object detection models, and I truly appreciate the innovations that make RT-DETR both fas…