-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussion…
-
Is there any plan to add attention masking support? PyTorch's version of flash attention v1 included the ability to provide an attention mask in its [implementation](https://pytorch.org/docs/stable/genera…
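For reference, a minimal sketch of the masking support referenced above: PyTorch's `scaled_dot_product_attention` accepts an `attn_mask` argument (the shapes and the causal mask below are illustrative, not tied to any particular model):

```python
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 16, 64)                             # (batch, heads, seq, head_dim)
k = torch.randn(1, 8, 16, 64)
v = torch.randn(1, 8, 16, 64)
attn_mask = torch.ones(16, 16, dtype=torch.bool).tril()   # True = allowed to attend
out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
print(out.shape)                                          # torch.Size([1, 8, 16, 64])
```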
-
### 🐛 Describe the bug
Encountered while testing for preview release/2.6 builds as part of https://github.com/pytorch/pytorch/issues/139175
```
python test_flex_attention.py -k "test_load_from_bias…
```
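For context, a hedged sketch of what a "load from bias" FlexAttention pattern typically looks like (this is not the failing test itself; the shapes, device, and `score_mod` below are assumptions):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

B, H, S, D = 2, 4, 128, 64
q = torch.randn(B, H, S, D, device="cuda")
k = torch.randn(B, H, S, D, device="cuda")
v = torch.randn(B, H, S, D, device="cuda")
bias = torch.randn(S, S, device="cuda")        # per-position additive bias

def score_mod(score, b, h, q_idx, kv_idx):
    # Add a bias looked up by (query position, key position).
    return score + bias[q_idx, kv_idx]

out = flex_attention(q, k, v, score_mod=score_mod)
```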
-
python main.py --function test --config configs/cub_stage2.yml --opt "{'test': {'load_token_path': 'ckpts/cub983/tokens/', 'load_unet_path': 'ckpts/cub983/unet/', 'save_log_path': 'ckpts/cub983/log.tx…
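As a hedged sketch of how such an `--opt` override might be applied (the `deep_update` helper is illustrative, not necessarily the repository's actual mechanism; the truncated `save_log_path` entry is omitted):

```python
import ast
import yaml

def deep_update(base: dict, override: dict) -> dict:
    # Recursively merge override values into the loaded config (illustrative helper).
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_update(base[key], value)
        else:
            base[key] = value
    return base

with open("configs/cub_stage2.yml") as f:
    config = yaml.safe_load(f)

opt = "{'test': {'load_token_path': 'ckpts/cub983/tokens/', 'load_unet_path': 'ckpts/cub983/unet/'}}"
config = deep_update(config, ast.literal_eval(opt))
```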
-
Description: When running inference on the distilbert-base-uncased model using the NPU on Snapdragon® X Elite (X1E78100 - Qualcomm®) through ONNX Runtime's QNNExecutionProvider, the model fails to inf…
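For reference, a minimal sketch of how a session with the QNNExecutionProvider is typically created (the model path, `backend_path` value, input names, and shapes are assumptions, not taken from this report):

```python
import numpy as np
import onnxruntime as ort

# QNN EP targeting the HTP (NPU) backend, with CPU as a fallback.
session = ort.InferenceSession(
    "distilbert-base-uncased.onnx",
    providers=[
        ("QNNExecutionProvider", {"backend_path": "QnnHtp.dll"}),
        "CPUExecutionProvider",
    ],
)

input_ids = np.ones((1, 128), dtype=np.int64)        # dummy tokenized input
attention_mask = np.ones((1, 128), dtype=np.int64)
outputs = session.run(None, {"input_ids": input_ids, "attention_mask": attention_mask})
print(outputs[0].shape)
```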
-
I installed flash-attention following this link: https://rocm.blogs.amd.com/artificial-intelligence/flash-attention/README.html
My GPU is gfx1100 (7900 XTX).
I installed it in the Docker container and the d…
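A minimal smoke test for such an install, assuming the ROCm build exposes the same `flash_attn_func` API as the upstream package (shapes are illustrative):

```python
import torch
from flash_attn import flash_attn_func

# q, k, v: (batch, seqlen, nheads, head_dim), fp16/bf16 on the GPU.
q = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)
out = flash_attn_func(q, k, v, causal=True)
print(out.shape)                                 # torch.Size([1, 128, 8, 64])
```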
-
Hello, when I was building attention heatmaps, I found that the attention scores across different patches did not vary much. Have you encountered this problem before?
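One way to quantify "did not vary much" is to plot the CLS-to-patch attention and check its spread; a minimal sketch, assuming ViT-style attention weights of shape (batch, heads, tokens, tokens) with a CLS token (the tensor below is a placeholder, not the model in question):

```python
import torch
import matplotlib.pyplot as plt

# Placeholder weights: (batch, heads, tokens, tokens), tokens = 1 CLS + 14*14 patches.
attn = torch.rand(1, 12, 197, 197).softmax(dim=-1)
cls_to_patches = attn[0].mean(dim=0)[0, 1:]      # average heads, CLS row, drop CLS column
print("spread:", (cls_to_patches.max() - cls_to_patches.min()).item())

plt.imshow(cls_to_patches.reshape(14, 14).numpy(), cmap="viridis")
plt.colorbar()
plt.show()
```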
-
The following edits were required to make llama3 8b fp16 work:
```
config["attn_head_count"] = 8 # 8 instead of 32
config["paged_kv_cache"] = {}
config["paged_kv_cache"]["block_seq_stride"] = conf…
-
Passing the --use-flash-attn flag is intended to enable flash attention; however, when the --use-mcore-models flag (to use the transformer engine) is also specified, flash attention will not be applie…
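A generic way to check whether flash-attention kernels can actually run for given shapes and dtypes is plain PyTorch SDPA with the backend pinned (this is not the Megatron-LM/Transformer Engine code path, just a sanity check):

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q = torch.randn(2, 8, 512, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# Restricting SDPA to the flash-attention backend raises "No available kernel"
# if flash attention cannot be used for these inputs.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)
```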
-
Hi, in the Causal Cross Attention, I see we register a causal mask as a lower-triangular matrix.
However, when we are trying to learn the latent parameters C of seq_len m such that m < …
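One common pattern, sketched under the assumption that the m latent queries attend over n keys and the registered (n, n) lower-triangular buffer is sliced to the block actually needed (names and shapes below are illustrative, not from the repository under discussion):

```python
import torch

n, m, d = 16, 4, 32
mask = torch.tril(torch.ones(n, n, dtype=torch.bool))   # registered (n, n) causal buffer

q = torch.randn(1, m, d)   # latent queries C (seq_len m)
k = torch.randn(1, n, d)
v = torch.randn(1, n, d)

scores = q @ k.transpose(-2, -1) / d ** 0.5             # (1, m, n)
scores = scores.masked_fill(~mask[:m, :n], float("-inf"))
out = scores.softmax(dim=-1) @ v                        # (1, m, d)
print(out.shape)
```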