-
# Summary
We recently landed support for grouped query attention via the `enable_gqa` flag on SDPA; however, this is only enabled on the flash attention backend. This leads to a weird situation where it c…
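For reference, a minimal sketch of what the `enable_gqa` call looks like (assuming PyTorch ≥ 2.5; the batch size, head counts, and sequence length are illustrative, not from the issue):

```python
import torch
import torch.nn.functional as F

# 8 query heads share 2 key/value heads: (B, H, S, D) layouts.
q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 2, 128, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 2, 128, 64, device="cuda", dtype=torch.float16)

# enable_gqa broadcasts the 2 KV heads across the 8 query heads;
# without it the mismatched head counts are rejected.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True, enable_gqa=True)
```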
-
For Llama3-70B with TP8, we have 8 q-heads and 1 k-head.
With 4000 shared prefix tokens and batch size 8, cascade decoding is much slower than the baseline (26 us vs 19 us). But if we set k-heads to 8…
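For context, the per-rank head counts above follow from sharding the Llama-3-70B attention configuration (64 query heads, 8 KV heads) across 8 tensor-parallel ranks; a quick sketch of the arithmetic:

```python
# Back-of-the-envelope per-rank head counts (config numbers from Llama-3-70B).
num_q_heads, num_kv_heads, tp = 64, 8, 8
q_heads_per_rank = num_q_heads // tp    # 8 query heads per GPU
kv_heads_per_rank = num_kv_heads // tp  # 1 KV head per GPU
print(q_heads_per_rank, kv_heads_per_rank)  # -> 8 1
```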
-
Thank you for your solid work. I would like to ask if the current version is suitable for GQA architecture models, such as LLaMA-2-70B and LLaMA-3.
-
### Describe the issue
Hi,
I'm coming from https://github.com/vllm-project/vllm/issues/6701.
I am wondering when IPEX 2.3.110 will be released.
-
This issue occurs in the Llama 2 FP16 and INT4 weight models, as well as in a trimmed model that returns after the first GQA node.
-
When I try to train on the GQA_200 dataset using the following command, I get the error `AttributeError: module 'pysgg.data.datasets' has no attribute 'GQADataset'`, and I can't find any file about GQADa…
-
### 📚 The doc issue
I don't think it's possible to get the structure of the dataset as depicted in the diagram below.
### Suggest a potential alternative/fix
I don't k…
-
Great job!
We found that Quest is implemented on a previous version of flashinfer, and some common features are not currently supported:
* bsz > 1
* GQA
* CUDA graph
Is there any plan to update t…
-
### 🐛 Describe the bug
Hi AMD Team,
On MI300X with PyTorch nightly, grouped query attention runs into numeric errors. I have confirmed that this script does not produce numeric errors on H100.
C…
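The script itself is truncated above; as a rough sketch of the kind of numeric check involved (not the original script — shapes, dtype, and the repeat-based reference are assumptions), one can compare SDPA's GQA path against a naive expanded-KV reference:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
q = torch.randn(2, 8, 256, 64, device="cuda", dtype=torch.bfloat16)
k = torch.randn(2, 2, 256, 64, device="cuda", dtype=torch.bfloat16)
v = torch.randn(2, 2, 256, 64, device="cuda", dtype=torch.bfloat16)

# GQA path under test.
out_gqa = F.scaled_dot_product_attention(q, k, v, enable_gqa=True)

# Naive reference: expand each KV head to match the 8 query heads, run plain SDPA.
k_rep = k.repeat_interleave(4, dim=1)
v_rep = v.repeat_interleave(4, dim=1)
out_ref = F.scaled_dot_product_attention(q, k_rep, v_rep)

# Should be near zero on a healthy backend.
print((out_gqa - out_ref).abs().max())
```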
-
Hi! I'm trying to replicate your implementation with Llama 2 13B and 7B, but curiously the runtimes didn't make sense (Llama 2 with GQA is slower than Llama 2 WITHOUT GQA). There is a little difference between my code …
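As a rough sketch of how one might time the two cases in isolation (assumed setup, not the author's code; shapes and head counts are illustrative), comparing SDPA with grouped KV heads against full multi-head attention at the same query shape:

```python
import torch
import torch.nn.functional as F

def time_sdpa(n_kv_heads, iters=50):
    # Fixed query shape; only the number of KV heads changes.
    q = torch.randn(4, 32, 1024, 128, device="cuda", dtype=torch.float16)
    k = torch.randn(4, n_kv_heads, 1024, 128, device="cuda", dtype=torch.float16)
    v = torch.randn(4, n_kv_heads, 1024, 128, device="cuda", dtype=torch.float16)
    gqa = n_kv_heads != 32
    for _ in range(10):  # warm-up
        F.scaled_dot_product_attention(q, k, v, enable_gqa=gqa)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        F.scaled_dot_product_attention(q, k, v, enable_gqa=gqa)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # ms per call

print("MHA (32 KV heads):", time_sdpa(32), "ms")
print("GQA (8 KV heads): ", time_sdpa(8), "ms")
```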