-
https://arxiv.org/abs/1812.01243
-
I have seen many users in the community run into compilation and build problems, and frequent failures, especially when using new versions of CUDA or PyTorch, or when wanting to update CUTLASS or Flash Attentio…
-
Namespace(confidence_threshold=0.2, config_file='configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k.py', input=['…
-
MLA (Multi-head Latent Attention) was proposed in [DeepSeek-V2](https://github.com/deepseek-ai/DeepSeek-V2/blob/main/deepseek-v2-tech-report.pdf) for efficient inference.
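The key idea is to compress keys and values into a small shared latent vector and cache only that latent during decoding. A minimal PyTorch sketch of just the low-rank KV compression (it omits the decoupled rotary key and the query-side compression of the full design; module and dimension names are placeholders):
```
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentKVAttention(nn.Module):
    """Sketch of low-rank KV compression in the spirit of MLA (hypothetical sizes).

    Instead of caching full per-head keys/values, only the small latent c_kv is
    cached and expanded to K/V on the fly, shrinking the KV cache by roughly
    d_model / d_latent.
    """

    def __init__(self, d_model=1024, n_heads=8, d_latent=128):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)  # compress hidden state -> latent
        self.k_up = nn.Linear(d_latent, d_model)     # expand latent -> per-head keys
        self.v_up = nn.Linear(d_latent, d_model)     # expand latent -> per-head values
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x, kv_cache=None):
        b, t, _ = x.shape
        c_kv = self.kv_down(x)                       # (b, t, d_latent): this is all that gets cached
        if kv_cache is not None:
            c_kv = torch.cat([kv_cache, c_kv], dim=1)
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(c_kv).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(c_kv).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        o = F.scaled_dot_product_attention(q, k, v)  # causal masking omitted for brevity
        o = o.transpose(1, 2).reshape(b, t, -1)
        return self.out_proj(o), c_kv                # return the latent as the new cache
```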
-
# 🐛 Bug
torch.jit.trace breaks with the following error:
`RuntimeError: unsupported output type: int, from operator: xformers::efficient_attention_forward_generic`
The output of the ops conta…
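For context, a minimal sketch of the kind of call that triggers this (the wrapper function and tensor sizes are my own, not taken from the report): tracing any function that reaches the xformers custom op fails because one of the op's outputs is a plain Python int.
```
import torch
import xformers.ops as xops

def attn(q, k, v):
    return xops.memory_efficient_attention(q, k, v)

q = k = v = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16)
# Reported failure mode: RuntimeError: unsupported output type: int,
# from operator: xformers::efficient_attention_forward_generic
traced = torch.jit.trace(attn, (q, k, v))
```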
-
### 🐛 Describe the bug
Looks like it's dispatching to efficient attention backward and failing one of the shape checks (
```
TORCH_CHECK(
    max_seqlen_k
```
-
- [pdf](https://arxiv.org/pdf/2209.07484.pdf)
-
Hello, I am using `torch.cuda.amp.autocast` with `bfloat16`.
I noticed that the xformers `RotaryEmbedding` produces `float32` outputs, which then require casting before passing to `memory_efficien…
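As a stopgap, the cast can be made explicit right after the rotary step; a minimal sketch, assuming the rotary module takes and returns a (query, key) pair (names and shapes here are illustrative, not the exact setup from this report):
```
import xformers.ops as xops

def rotary_then_attention(q, k, v, rotary):
    # Under autocast(bfloat16) the rotary module below returns float32 tensors.
    q_rot, k_rot = rotary(q, k)
    # memory_efficient_attention expects q/k/v to share a dtype, so cast the
    # rotated tensors back to the value dtype before the attention call.
    q_rot = q_rot.to(v.dtype)
    k_rot = k_rot.to(v.dtype)
    return xops.memory_efficient_attention(q_rot, k_rot, v)
```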
-
### 🚀 The feature, motivation and pitch
Enable support for the Flash Attention, Memory Efficient, and SDPA kernels on AMD GPUs.
At present, using these gives the warning below with the latest nightlies (torch==…
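For reference, one way to see which SDPA backends are actually usable on a given build is to force each one in turn and check whether the call succeeds or falls back; a rough sketch (backend availability on ROCm is exactly the open question here, so treat the output as diagnostic):
```
import torch
import torch.nn.functional as F

q = k = v = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)

for name, flags in [
    ("flash", dict(enable_flash=True, enable_mem_efficient=False, enable_math=False)),
    ("mem_efficient", dict(enable_flash=False, enable_mem_efficient=True, enable_math=False)),
    ("math", dict(enable_flash=False, enable_mem_efficient=False, enable_math=True)),
]:
    try:
        # Restrict SDPA dispatch to a single backend inside the context manager.
        with torch.backends.cuda.sdp_kernel(**flags):
            F.scaled_dot_product_attention(q, k, v)
        print(f"{name}: ok")
    except RuntimeError as e:
        print(f"{name}: unavailable ({e})")
```
Newer PyTorch builds expose the same backend selection as `torch.nn.attention.sdpa_kernel`.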
-
From https://arxiv.org/abs/2112.05682v2. I have no immediate use for this, but it looks cool and I didn't want it to go unmentioned in case some aspiring contributor to Transformers.jl is looking for …
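The trick from that paper is compact enough to sketch; here it is in plain NumPy (the Julia port is left out of scope): stream over key/value chunks while keeping a running row max, numerator, and normalizer, so the full n×n score matrix never materializes.
```
import numpy as np

def chunked_attention(q, k, v, chunk=128):
    """O(n)-memory attention: accumulate the softmax numerator and denominator
    over key/value chunks, with a running max for numerical stability."""
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    num = np.zeros((n, v.shape[1]))           # running weighted sum of values
    den = np.zeros((n, 1))                    # running softmax normalizer
    m = np.full((n, 1), -np.inf)              # running row-wise max of scores
    for start in range(0, k.shape[0], chunk):
        kc, vc = k[start:start + chunk], v[start:start + chunk]
        s = (q @ kc.T) * scale                # (n, chunk) scores for this chunk
        m_new = np.maximum(m, s.max(axis=1, keepdims=True))
        correction = np.exp(m - m_new)        # rescale old accumulators to the new max
        p = np.exp(s - m_new)
        num = num * correction + p @ vc
        den = den * correction + p.sum(axis=1, keepdims=True)
        m = m_new
    return num / den

# Quick check against the naive quadratic-memory implementation.
q, k, v = (np.random.randn(64, 32) for _ in range(3))
s = (q @ k.T) / np.sqrt(32)
w = np.exp(s - s.max(axis=1, keepdims=True))
ref = (w / w.sum(axis=1, keepdims=True)) @ v
assert np.allclose(chunked_attention(q, k, v, chunk=16), ref, atol=1e-6)
```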