-
In our paper we only showed results on causal language models, which use causally masked (decoder) self-attention.
If you'd like to use ALiBi for seq2seq tasks such as translation, speech or T5, o…
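To make the causal case concrete, here is a minimal sketch (plain PyTorch, not the authors' reference code) of the ALiBi bias for causally masked self-attention, assuming the geometric slope schedule from the paper and a head count that is a power of two:

```python
import torch

def alibi_slopes(n_heads: int) -> torch.Tensor:
    # Slopes 2^(-8/n), 2^(-16/n), ..., 2^(-8) for n heads (power-of-two case).
    return torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])

def causal_alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    slopes = alibi_slopes(n_heads)                    # (H,)
    pos = torch.arange(seq_len)
    dist = pos[None, :] - pos[:, None]                # j - i: <= 0 at/below the diagonal
    bias = slopes[:, None, None] * dist[None, :, :]   # (H, L, L): linear penalty per head
    # Mask out future positions, as in causally masked (decoder) self-attention.
    return bias.masked_fill(dist[None, :, :] > 0, float("-inf"))

# Added to the attention logits before softmax:
scores = torch.randn(2, 4, 16, 16)                    # (batch, heads, L, L)
scores = scores + causal_alibi_bias(n_heads=4, seq_len=16)
```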
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) and didn't find any similar reports.

###…
-
### 🚀 The feature, motivation and pitch
Thanks for fixing the soft-capping issue of the Gemma 2 models in the last release! I noticed there's still a [comment](https://github.com/vllm-project/vllm/bl…
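For context, Gemma 2's soft-capping is a tanh-based squashing of the logits. A rough sketch (the cap value below is illustrative, not read out of the vLLM source):

```python
import torch

def soft_cap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    # Squashes logits smoothly into (-cap, cap) instead of hard clipping them.
    return cap * torch.tanh(logits / cap)

attn_logits = torch.randn(8, 16, 16) * 100.0
capped = soft_cap(attn_logits, cap=50.0)
```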
-
I want this trainer class to be implemented with unsloth. How can I do that?
```python
class CustomTrainier(Trainer):
    def __init__(self, model, args, train_dataset, eval_dataset, tokenizer, **kwargs)…
```
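A minimal sketch of how this could be wired up, assuming unsloth's `FastLanguageModel` API and a standard `transformers.Trainer` subclass; the checkpoint name, LoRA settings, and datasets below are illustrative placeholders, not taken from the issue:

```python
from unsloth import FastLanguageModel
from transformers import Trainer, TrainingArguments

# Load model and tokenizer through unsloth (checkpoint name is an assumption).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

class CustomTrainier(Trainer):  # keeping the snippet's class name
    def __init__(self, model, args, train_dataset, eval_dataset, tokenizer, **kwargs):
        super().__init__(
            model=model,
            args=args,
            train_dataset=train_dataset,
            eval_dataset=eval_dataset,
            tokenizer=tokenizer,
            **kwargs,
        )
    # Custom behaviour (e.g. a compute_loss override) would go here.

# Usage, with datasets assumed to be prepared elsewhere:
# trainer = CustomTrainier(model, TrainingArguments(output_dir="out"),
#                          train_dataset, eval_dataset, tokenizer)
# trainer.train()
```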
-
I think having flash attention in `equinox` should be a critical issue, considering it is already built in natively in torch.
While XLA is supposed to (in theory) do some of the fusion, and possibly …
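For reference, the torch-side counterpart being alluded to is `scaled_dot_product_attention`, which dispatches to a fused FlashAttention-style kernel when one is available (CUDA with fp16/bf16) and otherwise falls back to the math implementation; a small sketch:

```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
q, k, v = (torch.randn(2, 8, 128, 64, device=device, dtype=dtype) for _ in range(3))
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)  # fused kernel when eligible
```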
-
Hey,
How can I get token-level contributions for the search query? This seems like one of the strong benefits of ColBERT for highlighting relevant matches, but for some reason, I can't find any implemen…
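A sketch of what such contributions look like in plain torch (this is ColBERT-style MaxSim scoring written out by hand, not the colbert library's API): each query token's contribution is its maximum similarity over the document tokens, which is what you would surface for highlighting.

```python
import torch
import torch.nn.functional as F

def per_token_contributions(q_emb, d_emb):
    # q_emb: (num_query_tokens, dim), d_emb: (num_doc_tokens, dim), both L2-normalized
    sim = q_emb @ d_emb.T                     # (num_query_tokens, num_doc_tokens)
    max_sim, best_doc_token = sim.max(dim=1)  # contribution + which doc token matched
    return max_sim, best_doc_token

q = F.normalize(torch.randn(5, 128), dim=-1)
d = F.normalize(torch.randn(30, 128), dim=-1)
contrib, matched = per_token_contributions(q, d)
print(contrib.sum())  # the usual ColBERT relevance score sums these contributions
```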
-
After printing the llama-3.2-3b model parameters:
```
Model Parameters and their Shapes:
model.embed_tokens.weight: torch.Size([128256, 3072])
model.layers.0.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.0.s…
```
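A minimal sketch of how such a listing can be produced (the checkpoint name is the public meta-llama repo; any causal LM checkpoint works the same way):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B")
print("Model Parameters and their Shapes:")
for name, param in model.named_parameters():
    print(f"{name}: {param.shape}")
```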
-
If a document is open in a plank and a stack, attention jumps from the stack back to the standalone plank.
https://github.com/user-attachments/assets/5f81904d-c0ea-46a3-89db-dd715143fd53
-
### Description
We do not have support for fp32 accumulate in sdpa family kernels. This becomes a problem when the number of chunks gets large and we see diverging PCC from ground truth. For models that …
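To illustrate why accumulate precision matters (plain PyTorch, not the kernels in question): summing many chunk contributions in bf16 drifts from the fp32 ground truth, and the drift grows with the number of chunks.

```python
import torch

torch.manual_seed(0)
chunks = [torch.randn(1024) for _ in range(256)]
ref = torch.stack(chunks).sum(dim=0)              # fp32 ground truth

acc_bf16 = torch.zeros(1024, dtype=torch.bfloat16)
acc_fp32 = torch.zeros(1024)
for c in chunks:
    acc_bf16 = acc_bf16 + c.to(torch.bfloat16)    # low-precision accumulate
    acc_fp32 = acc_fp32 + c                       # fp32 accumulate

print("bf16 accumulate max abs error:", (acc_bf16.float() - ref).abs().max().item())
print("fp32 accumulate max abs error:", (acc_fp32 - ref).abs().max().item())
```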
-
Hey, thanks for the great work. I could be wrong, but I feel like there is a disconnect between what is mentioned in the Based paper and what is used in the Figure 2 config for MQAR eval. In the paper…