-
### 🚀 The feature, motivation and pitch
Gemma-2 and the new Ministral models use alternating sliding-window and full-attention layers to reduce the size of the KV cache.
The KV cache is a huge inferen…
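A rough back-of-the-envelope sketch of the saving (the layer count, head/dim sizes, context length, and sliding-window length below are illustrative assumptions, not exact model specs):

```python
# KV-cache sizing: all full-attention layers vs. alternating SWA/full layers.
num_layers = 42          # assumed Gemma-2-9B-like depth
num_kv_heads = 8         # assumed GQA key/value heads
head_dim = 256
bytes_per_elem = 2       # fp16 / bf16
seq_len = 32_768
sliding_window = 4_096

def kv_bytes(tokens_cached, layers):
    # 2x for keys + values
    return 2 * layers * num_kv_heads * head_dim * bytes_per_elem * tokens_cached

full_only = kv_bytes(seq_len, num_layers)
alternating = (kv_bytes(seq_len, num_layers // 2)
               + kv_bytes(min(seq_len, sliding_window), num_layers - num_layers // 2))
print(f"all full attention:  {full_only / 2**30:.1f} GiB")   # ~10.5 GiB
print(f"alternating layers:  {alternating / 2**30:.1f} GiB") # ~5.9 GiB
```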
-
Hi Bro, it's me again.
I read your paper again and plan to share your idea, but I'm a little confused.
I find that the LAM module doesn't match the code. The structure of the LAM module in the paper shows the low-freq…
-
* The terminal process "/bin/bash '-c', '/usr/local/cuda-12.4/bin/nvcc -g -G -diag-suppress=177 -lineinfo --std=c++17 -arch=sm_75 '-D CUTE_ARCH_LDSM_SM75_ACTIVATED' -o flash_attention_cutlass_standa…
-
Is there any example code to do this? Should I generate a new BlockMask every time?
Thanks!
------------------------------
Essentially, I have a problem slicing a BlockMask. For example, if we have…
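For context, a minimal FlexAttention sketch of what "generating a new BlockMask every time" could look like (the sizes and the causal `mask_mod` here are assumptions purely for illustration):

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

B, H, S, D = 1, 8, 1024, 64                      # illustrative sizes
q = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)
k = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)
v = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)

def causal(b, h, q_idx, kv_idx):
    # Placeholder mask_mod; the real mask would encode the case being discussed.
    return q_idx >= kv_idx

# Rebuild the BlockMask whenever Q_LEN / KV_LEN change.
block_mask = create_block_mask(causal, B=None, H=None, Q_LEN=S, KV_LEN=S, device="cuda")
out = flex_attention(q, k, v, block_mask=block_mask)
```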
-
Update 2024/10/21
Hi, after debugging, I found `rank: 7, local_label shape: torch.Size([1, 3086]), locak_label max: 128009, locak_label min: -100, logits_shape: torch.Size([1, 3086, 128256])`. In SF…
-
### System Info
TensorRT Model Optimizer: 0.15.1
TensorRT-LLM version: 0.14.0.dev2024100100
Python version
OS: Ubuntu 22.04
CPU Arch: x86_64
Driver version: 555.42.02
CUDA Version: 12.5
### Who can…
-
As of right now, FlashAttention only supports one-dimensional local attention. I intend to implement up to three-dimensional local attention, where the effective attention mask would be a rectangular cu…
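A minimal sketch of what such a cuboid-shaped local mask could look like (the grid dimensions and per-axis window radii below are illustrative assumptions):

```python
import torch

X, Y, Z = 8, 8, 8                     # tokens laid out on a 3-D grid (assumed sizes)
wx, wy, wz = 2, 2, 2                  # per-axis local-attention radii (assumed)
S = X * Y * Z                         # flattened sequence length

idx = torch.arange(S)
coords = torch.stack((idx // (Y * Z), (idx // Z) % Y, idx % Z), dim=-1)   # (S, 3)

# Query i may attend to key j iff j lies inside the rectangular cuboid
# centred on i with per-axis radii (wx, wy, wz).
delta = (coords[:, None, :] - coords[None, :, :]).abs()                   # (S, S, 3)
mask = (delta <= torch.tensor([wx, wy, wz])).all(dim=-1)                  # (S, S) bool

# The boolean mask can then be fed to a masked attention implementation, e.g.
# torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask).
```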
-
### Description
I am trying to fine-tune Gemma 2 on TPU and got the following error:
```
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/jax/_src/compiler.py", l…
-
Please help me solve this issue.
Optimize_text_embed: 0% 0/49 [00:00
-
Hi, first of all, thank you for your work. I have a question:
global_x = Rearrange('b d (x w1) (y w2) -> b x y w1 w2 d', w1=w, w2=w)(x)
global_x = self.grid_attn(global_x)
global_…
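For reference, a small shape walk-through of that `Rearrange` (the batch size, channel count, feature-map size, and window size below are assumptions just for illustration):

```python
import torch
from einops.layers.torch import Rearrange

b, d, H, W, w = 2, 64, 16, 16, 4            # assumed sizes for the example
x = torch.randn(b, d, H, W)

to_windows = Rearrange('b d (x w1) (y w2) -> b x y w1 w2 d', w1=w, w2=w)
global_x = to_windows(x)
print(global_x.shape)                        # torch.Size([2, 4, 4, 4, 4, 64])
# The H x W map is split into an (H/w) x (W/w) arrangement of contiguous w x w
# tiles, with channels moved to the last axis before the attention layer is applied.
```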