-
https://github.com/xjdr-alt/entropix/blob/eaaddb27f344c8c28922c7bfd0e01006645729ae/entropix/torch_sampler.py#L56-L58
This calculation computes entropy over the attention scores for each position in…
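A minimal sketch of what a per-position attention-entropy computation of this kind can look like (the tensor shapes, names, and use of natural log are assumptions for illustration, not the exact entropix code):

```python
import torch
import torch.nn.functional as F

def attention_entropy(scores: torch.Tensor) -> torch.Tensor:
    """Entropy of the attention distribution at each query position.

    scores: raw attention logits, shape (batch, heads, q_len, kv_len).
    Returns entropy in nats, shape (batch, heads, q_len).
    """
    probs = F.softmax(scores, dim=-1)
    log_probs = F.log_softmax(scores, dim=-1)  # more stable than log(softmax(...))
    return -(probs * log_probs).sum(dim=-1)

# Uniform attention over 64 keys gives entropy log(64) ≈ 4.16 nats.
print(attention_entropy(torch.zeros(1, 8, 16, 64))[0, 0, 0])
```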
-
When I use meta-llama/Llama-3.2-1B, I get the following error.
Can it be fixed?
```
RuntimeError: Error(s) in loading state_dict for Transformer:
Missing key(s) in state_dict: "tok_embeddings.weight", "layers.0.attention.wq…
```
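The names in the error (`tok_embeddings.weight`, `layers.N.attention.wq.weight`) follow Meta's original checkpoint layout, while Hugging Face checkpoints use names like `model.embed_tokens.weight` and `model.layers.N.self_attn.q_proj.weight`, so this usually means a checkpoint in the wrong format is being loaded. Below is a hedged sketch of renaming a Hugging Face state dict toward the original naming; the table is illustrative and incomplete, and a real conversion also has to undo the rotary permutation applied to the q/k projections:

```python
import re

# Hypothetical rename table: Hugging Face Llama names -> original Meta-style names.
RENAMES = {
    r"^model\.embed_tokens\.weight$": "tok_embeddings.weight",
    r"^model\.layers\.(\d+)\.self_attn\.q_proj\.weight$": r"layers.\1.attention.wq.weight",
    r"^model\.layers\.(\d+)\.self_attn\.k_proj\.weight$": r"layers.\1.attention.wk.weight",
    r"^model\.layers\.(\d+)\.self_attn\.v_proj\.weight$": r"layers.\1.attention.wv.weight",
    r"^model\.layers\.(\d+)\.self_attn\.o_proj\.weight$": r"layers.\1.attention.wo.weight",
}

def rename_state_dict(hf_state_dict: dict) -> dict:
    out = {}
    for name, tensor in hf_state_dict.items():
        for pattern, repl in RENAMES.items():
            new_name, n = re.subn(pattern, repl, name)
            if n:
                out[new_name] = tensor
                break
        else:
            out[name] = tensor  # keep keys we have no rule for
    return out
```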
-
### 🐛 Describe the bug
Flex attention with dynamic shapes fails when comparing relational expressions. I found two places where this error occurs.
One in `flex_decoding.py`:
```
File "/usr/local/li…
-
'roberta.encoder.layer.22.crossattention.self.abs_bias.1', 'roberta.encoder.layer.8.attention.self.abs_bias.0', 'roberta.encoder.layer.9.attention.self.abs_bias.1', 'roberta.encoder.layer.5.attention.…
-
### 🐛 Describe the bug
I tried to implement `causal_lower_right` masking in flex attention. This requires the masking function to know the difference between the key and query lengths:
```python
…
```
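One way to express this is a closure that captures the key/query length difference as an offset; a hedged sketch (PyTorch 2.5+ on CUDA assumed; the names and offset handling are mine, not the original code):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

def causal_lower_right(q_len: int, kv_len: int):
    offset = kv_len - q_len  # how far queries are shifted relative to keys

    def mask_mod(b, h, q_idx, kv_idx):
        # Query i may attend to keys 0 .. i + offset (causal, aligned bottom-right).
        return kv_idx <= q_idx + offset

    return mask_mod

q_len, kv_len = 128, 256
q = torch.randn(1, 8, q_len, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, kv_len, 64, device="cuda", dtype=torch.float16)
v = torch.randn_like(k)

block_mask = create_block_mask(causal_lower_right(q_len, kv_len),
                               B=None, H=None, Q_LEN=q_len, KV_LEN=kv_len, device="cuda")
out = flex_attention(q, k, v, block_mask=block_mask)
```

The offset is captured at mask-creation time, which is exactly why the masking function needs to know both lengths up front.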
-
### 🚀 The feature, motivation and pitch
I am working on 4D attention mask input for the LLM generation process. Hugging Face provides an interface for the 4D attention mask. Does vLLM have any plan to support it? htt…
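For context, this is roughly how the Hugging Face side accepts a custom 4D mask; a hedged sketch assuming the convention in recent transformers releases that a 4D `attention_mask` is additive ("inverted"): 0 where attention is allowed, a large negative value where it is blocked. The model name is just the one from the thread above and may be gated:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-3.2-1B"  # assumption: any recent causal LM that accepts 4D masks
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float32)

ids = tok("two packed segments in one row", return_tensors="pt").input_ids
n = ids.shape[1]

# 4D additive mask of shape (batch, 1, query_len, key_len):
# 0 where attention is allowed, dtype-min where it is blocked (plain causal here).
allowed = torch.tril(torch.ones(n, n, dtype=torch.bool))
mask_4d = torch.full((1, 1, n, n), torch.finfo(model.dtype).min)
mask_4d = mask_4d.masked_fill(allowed, 0.0)

out = model(input_ids=ids, attention_mask=mask_4d)
```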
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussion…
-
### Problem Description
I often get these errors from [various applications](https://github.com/pytorch/pytorch/issues/134208); this one is from ComfyUI.
Is scaled_dot_product_attention part of fl…
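A common way to narrow down this kind of failure is to force `scaled_dot_product_attention` onto one backend at a time; a diagnostic sketch (the shapes are arbitrary, and which backend works depends on your build and GPU):

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# Try the fused backends first, then fall back to the math (reference) implementation.
for backend in (SDPBackend.FLASH_ATTENTION, SDPBackend.EFFICIENT_ATTENTION, SDPBackend.MATH):
    try:
        with sdpa_kernel(backend):
            out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        print(f"{backend} succeeded")
        break
    except RuntimeError as e:
        print(f"{backend} failed: {e}")
```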
-
# ❓ Questions and Help
I am new to xformers, and I want to speed up my Transformer models with it. But I found that `xformers` gives no speedup compared with `scaled_dot_product_attention` from PyTorch. Here …
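When comparing the two, note that the expected layouts differ: `memory_efficient_attention` takes `(batch, seq, heads, head_dim)` while SDPA takes `(batch, heads, seq, head_dim)`. A hedged benchmarking sketch (timings depend heavily on dtype, head dim, and GPU):

```python
import torch
import torch.nn.functional as F
import xformers.ops as xops

B, H, S, D = 4, 16, 2048, 64
q = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

def bench(fn, iters=50):
    for _ in range(5):  # warmup
        fn()
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per call

sdpa_ms = bench(lambda: F.scaled_dot_product_attention(q, k, v))

# xformers wants (B, S, H, D), so transpose before calling it.
qx, kx, vx = (t.transpose(1, 2).contiguous() for t in (q, k, v))
xf_ms = bench(lambda: xops.memory_efficient_attention(qx, kx, vx))

print(f"SDPA: {sdpa_ms:.3f} ms, xformers: {xf_ms:.3f} ms")
```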
-
The following edits were required to make llama3 8b fp16 work:
```
config["attn_head_count"] = 8 # 8 instead of 32
config["paged_kv_cache"] = {}
config["paged_kv_cache"]["block_seq_stride"] = conf…