-
Hi,
I've been thinking some more about attention masking and reading up on FlashAttention itself.
If I understand correctly, a custom mask needs to be supplied to the attention to rep…
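Independent of any particular kernel, the role a custom mask plays can be sketched in plain NumPy: positions where the mask is `False` are set to `-inf` before the softmax, so they receive zero attention weight. All names and shapes below are illustrative, not from the original code.

```python
import numpy as np

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention with a custom boolean mask.

    q, k, v: (seq_len, d) arrays; mask: (seq_len, seq_len) boolean,
    True = attend, False = block. Illustrative sketch only.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)             # (seq, seq) similarities
    scores = np.where(mask, scores, -np.inf)  # blocked positions -> -inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Example: a causal mask (each query attends only to itself and earlier keys).
rng = np.random.default_rng(0)
q = k = v = rng.standard_normal((4, 8))
causal = np.tril(np.ones((4, 4), dtype=bool))
out = masked_attention(q, k, v, causal)
```

With the causal mask, the first query can only attend to the first key, so the first output row is exactly `v[0]`; fused kernels compute the same result without materializing the full score matrix.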
-
### Preflight Checklist
- [X] I could not find a solution in the existing issues, docs, nor discussions
- [X] I have joined the [ZITADEL chat](https://zitadel.com/chat)
### Describe your problem
Ri…
-
The flash attention benchmark fails with the [changes](https://github.com/intel/intel-xpu-backend-for-triton/pull/1905) to use upstream PyTorch.
This appears to be a torch issue.
```
Traceback (most recent c…
-
Thanks for your contributions.
When I train the model with the settings --max_seq_length 30 --max_seq_a_length 30 --max_img_seq_length 18, the error I get is:
attention_scores= attention_sco…
-
Hi FlexAttention Team,
Thanks for your code.
I use FlexAttention to implement a fast, IO-aware streaming attention using this mask:
```python
def sliding_window_causal_with_stream(b, h, q_idx, kv…
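# (snippet truncated above) -- as a hedged, self-contained sketch, a windowed
# causal mask in the same (b, h, q_idx, kv_idx) -> bool convention that
# FlexAttention's mask_mod uses. WINDOW is a hypothetical parameter, and this
# plain-Python analogue uses `and` where a real mask_mod would use tensor ops.
WINDOW = 4

def sliding_window_causal(b, h, q_idx, kv_idx):
    # Causal: no attending to future keys; windowed: at most WINDOW - 1
    # positions back. b and h are unused for a head-agnostic mask.
    return (kv_idx <= q_idx) and (q_idx - kv_idx < WINDOW)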
-
https://github.com/ParadoxZW/LLaVA-UHD-Better/blob/main/llava_uhd/adapt_llava.py#L136-L138
Since the first token is the CLS token, does
```python
m[:w * h] = True
```
need to be changed to
```python
m[:w * h+1] = …
-
Hi,
Thank you for releasing your code. I would like to understand where the decoupled cross-attention, as stated in the paper, is used in the code. In the code, I only see concatenation. I wou…
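If the paper's "decoupled cross-attention" follows the common formulation (a separate attention op per modality, outputs summed), it is generally not the same as concatenating the key/value sequences, because softmax normalizes jointly over the concatenated keys. A minimal NumPy sketch under that assumption — all shapes and names are illustrative, not taken from the repository:

```python
import numpy as np

def attention(q, k, v):
    # Plain scaled dot-product attention (no mask), illustrative only.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))              # latent queries
k_txt = v_txt = rng.standard_normal((6, 8))  # text features
k_img = v_img = rng.standard_normal((3, 8))  # image features

# Decoupled: two independent attention ops whose outputs are summed.
decoupled = attention(q, k_txt, v_txt) + attention(q, k_img, v_img)

# Concatenated: one attention op over the joined key/value sequence.
concatenated = attention(q, np.concatenate([k_txt, k_img]),
                            np.concatenate([v_txt, v_img]))
```

The two results differ in general, which is why a concatenation in the code is worth asking about rather than assuming it implements the decoupled form.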
-
After applying the prov-gigapath slide encoder, how can I access the attention distribution for each tile on the whole slide, as attention-based MIL does?
Is there any instruction for this?
Thank you ve…
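For comparison, attention-based MIL exposes per-tile weights explicitly: a small scoring network maps each tile embedding to a scalar, and a softmax over all tiles gives the distribution. A hedged NumPy sketch of that baseline (weights are random placeholders; this is not prov-gigapath's internals):

```python
import numpy as np

def mil_attention_weights(tiles, V, w):
    """Per-tile attention in the attention-based MIL style:
    a_i = softmax_i(w^T tanh(V h_i)). Returns one weight per tile."""
    scores = np.tanh(tiles @ V.T) @ w        # (n_tiles,) raw scores
    e = np.exp(scores - scores.max())        # stable softmax over tiles
    return e / e.sum()

rng = np.random.default_rng(0)
tiles = rng.standard_normal((100, 32))  # 100 tile embeddings, dim 32
V = rng.standard_normal((16, 32))       # hypothetical projection, hidden 16
w = rng.standard_normal(16)             # hypothetical scoring vector

a = mil_attention_weights(tiles, V, w)  # the per-tile distribution
slide_repr = a @ tiles                  # attention-pooled slide embedding
```

For a transformer-style slide encoder, the analogous quantity would be attention weights from a pooling token over tiles, but whether and how those are exposed depends on the model's implementation.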
-
## 🐞Describing the bug
Hello. I'm trying to convert a PyTorch model to a stateful Core ML model.
I wrote this code, referring to the [WWDC 2024 session Mistral-7B model](https://github.com/huggingface/swift-t…
-
Hello,
Thank you so much for your great work and codebase!
I would appreciate your clarifications on a few items.
1) From within ```TextToVideoSDPipelineCall.py```, at this [line](https://g…