-
Hi,
I've been thinking some more about attention masking and reading up on FlashAttention itself.
If I understand correctly, a custom mask needs to be supplied to the attention to rep…
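Independent of any particular kernel, the role a custom mask plays can be sketched in plain NumPy: positions where the mask is `False` are set to `-inf` before the softmax, so they receive zero attention weight. All names and shapes below are illustrative, not from the original code.

```python
import numpy as np

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention with a custom boolean mask.

    q, k, v: (seq_len, d) arrays; mask: (seq_len, seq_len) boolean,
    True = attend, False = block. Illustrative sketch only.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)             # (seq, seq) similarities
    scores = np.where(mask, scores, -np.inf)  # blocked positions -> -inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Example: a causal mask (each query attends only to itself and earlier keys).
rng = np.random.default_rng(0)
q = k = v = rng.standard_normal((4, 8))
causal = np.tril(np.ones((4, 4), dtype=bool))
out = masked_attention(q, k, v, causal)
```

With the causal mask, the first query can only attend to the first key, so the first output row is exactly `v[0]`; fused kernels compute the same result without materializing the full score matrix.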
-
### Preflight Checklist
- [X] I could not find a solution in the existing issues, docs, nor discussions
- [X] I have joined the [ZITADEL chat](https://zitadel.com/chat)
### Describe your problem
Ri…
-
The flash attention benchmark fails with the [changes](https://github.com/intel/intel-xpu-backend-for-triton/pull/1905) to use upstream PyTorch.
This appears to be a torch issue.
```
Traceback (most recent c…
-
Thanks for your contributions.
When I train the model with the settings --max_seq_length 30 --max_seq_a_length 30 --max_img_seq_length 18, the error I get is:
attention_scores= attention_sco…
-
Hi FlexAttention Team,
Thanks for your code.
I use FlexAttention to implement a fast, IO-aware streaming attention using this mask:
```python
def sliding_window_causal_with_stream(b, h, q_idx, kv…
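# (snippet truncated above) -- as a hedged, self-contained sketch, a windowed
# causal mask in the same (b, h, q_idx, kv_idx) -> bool convention that
# FlexAttention's mask_mod uses. WINDOW is a hypothetical parameter, and this
# plain-Python analogue uses `and` where a real mask_mod would use tensor ops.
WINDOW = 4

def sliding_window_causal(b, h, q_idx, kv_idx):
    # Causal: no attending to future keys; windowed: at most WINDOW - 1
    # positions back. b and h are unused for a head-agnostic mask.
    return (kv_idx <= q_idx) and (q_idx - kv_idx < WINDOW)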
-
https://github.com/ParadoxZW/LLaVA-UHD-Better/blob/main/llava_uhd/adapt_llava.py#L136-L138
Since the first token is the CLS token, does
```python
m[:w * h] = True
```
need to be changed to
```python
m[:w * h+1] = …
-
Hi,
Thank you for releasing your code. I would like to understand where the decoupled cross-attention, as stated in the paper, is used in the code. In the code, I only see concatenation. I wou…
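If the paper's "decoupled cross-attention" follows the common formulation (a separate attention op per modality, outputs summed), it is generally not the same as concatenating the key/value sequences, because softmax normalizes jointly over the concatenated keys. A minimal NumPy sketch under that assumption — all shapes and names are illustrative, not taken from the repository:

```python
import numpy as np

def attention(q, k, v):
    # Plain scaled dot-product attention (no mask), illustrative only.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))              # latent queries
k_txt = v_txt = rng.standard_normal((6, 8))  # text features
k_img = v_img = rng.standard_normal((3, 8))  # image features

# Decoupled: two independent attention ops whose outputs are summed.
decoupled = attention(q, k_txt, v_txt) + attention(q, k_img, v_img)

# Concatenated: one attention op over the joined key/value sequence.
concatenated = attention(q, np.concatenate([k_txt, k_img]),
                            np.concatenate([v_txt, v_img]))
```

The two results differ in general, which is why a concatenation in the code is worth asking about rather than assuming it implements the decoupled form.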
-
After applying the prov-gigapath slide encoder, how can I access the attention distribution for each tile on the whole slide, as attention-based MIL does?
Is there any instruction for this?
Thank you ve…
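For comparison, attention-based MIL exposes per-tile weights explicitly: a small scoring network maps each tile embedding to a scalar, and a softmax over all tiles gives the distribution. A hedged NumPy sketch of that baseline (weights are random placeholders; this is not prov-gigapath's internals):

```python
import numpy as np

def mil_attention_weights(tiles, V, w):
    """Per-tile attention in the attention-based MIL style:
    a_i = softmax_i(w^T tanh(V h_i)). Returns one weight per tile."""
    scores = np.tanh(tiles @ V.T) @ w        # (n_tiles,) raw scores
    e = np.exp(scores - scores.max())        # stable softmax over tiles
    return e / e.sum()

rng = np.random.default_rng(0)
tiles = rng.standard_normal((100, 32))  # 100 tile embeddings, dim 32
V = rng.standard_normal((16, 32))       # hypothetical projection, hidden 16
w = rng.standard_normal(16)             # hypothetical scoring vector

a = mil_attention_weights(tiles, V, w)  # the per-tile distribution
slide_repr = a @ tiles                  # attention-pooled slide embedding
```

For a transformer-style slide encoder, the analogous quantity would be attention weights from a pooling token over tiles, but whether and how those are exposed depends on the model's implementation.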
-
## 🐞Describing the bug
Hello. I'm trying to convert a PyTorch model to a stateful Core ML model.
I wrote this code, referring to the [WWDC 2024 session Mistral-7B model](https://github.com/huggingface/swift-t…
-
Hello,
Thank you so much for your great work and codebase!
I would appreciate your clarifications on a few items.
1) From within ```TextToVideoSDPipelineCall.py```, at this [line](https://g…