-
### Self Checks
- [X] This template is only for bug reports. For questions, please visit [Discussions](https://github.com/fishaudio/fish-speech/discussions).
- [X] I have thoroughly reviewed the proj…
-
!pip install -U airllm
!pip install -U bitsandbytes
!pip install git+https://github.com/huggingface/transformers.git
!pip install git+https://github.com/huggingface/ac…
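AirLLM's headline technique is layer-sharded inference: only one transformer layer's weights need to be resident on the GPU at a time. A back-of-the-envelope sketch of why that matters for VRAM (all model dimensions below are illustrative assumptions for a Llama-style 70B model, not measurements from airllm itself):

```python
# Rough VRAM estimate for layer-sharded inference.
# The 70e9-parameter / 80-layer shape is an assumption for illustration.

def weight_bytes(params: float, bytes_per_param: int = 2) -> float:
    """Bytes needed to hold `params` parameters at fp16 (2 bytes each)."""
    return params * bytes_per_param

def full_model_gib(total_params: float) -> float:
    """VRAM (GiB) to hold every layer at once."""
    return weight_bytes(total_params) / 2**30

def layered_gib(total_params: float, n_layers: int) -> float:
    """VRAM (GiB) when only one layer is resident at a time."""
    return weight_bytes(total_params / n_layers) / 2**30

full = full_model_gib(70e9)        # ~130 GiB: beyond any single consumer GPU
per_layer = layered_gib(70e9, 80)  # ~1.6 GiB: fits on a small card
print(f"full model: {full:.1f} GiB, one layer: {per_layer:.1f} GiB")
```

This ignores activations and the KV cache, but it captures why the layered approach makes very large models loadable at all.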
-
I am using WSL with CUDA 12.4 installed. I ran: `pipx runpip insanely-fast-whisper install flash-attn --no-build-isolation`
Then the command `insanely-fast-whisper --model-name "openai/whisper-large-v3…
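When debugging this kind of setup, a small dependency-guarded script can confirm whether `flash_attn` and a CUDA-enabled `torch` are actually visible from the interpreter being used (the helper name is mine; the checks are deliberately guarded so the script runs even where neither package is installed). Note that pipx creates its own venv, so this needs to run with the interpreter inside that venv, not the system Python:

```python
import importlib.util

def env_report() -> dict:
    """Report whether flash_attn and CUDA-enabled torch are importable.

    Purely diagnostic: uses find_spec so missing packages do not raise.
    """
    report = {
        "flash_attn_installed": importlib.util.find_spec("flash_attn") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        try:
            import torch
            report["cuda_available"] = torch.cuda.is_available()
        except Exception:
            # A broken torch install still yields a report instead of a crash.
            pass
    return report

print(env_report())
```

If `flash_attn_installed` is False here but the `pipx runpip` install succeeded, the two commands are almost certainly talking to different environments.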
-
I noticed that the current integration uses Faster Whisper for transcription. I would like to suggest replacing it with “Insanely Fast Whisper” for improved performance, especially in GPU-based enviro…
-
### 🚀 The feature, motivation and pitch
Right now, the SDPA backend is selected from a static priority-order list:
```cpp
std::array priority_order(sdp_params const& params) {
constexpr…
```
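In Python terms, the dispatch this C++ implements amounts to walking a fixed list and returning the first backend whose preconditions hold. A toy sketch of that pattern (backend names echo PyTorch's `SDPBackend` choices, but the constraint checks are simplified stand-ins, not the real eligibility rules):

```python
from dataclasses import dataclass

@dataclass
class SDPParams:
    """Simplified stand-in for sdp_params: only the fields the checks read."""
    head_dim: int
    dtype: str          # "float16" | "bfloat16" | "float32"

def can_use_flash(p: SDPParams) -> bool:
    # Stand-in constraint: flash kernels want fp16/bf16 and a modest head_dim.
    return p.dtype in ("float16", "bfloat16") and p.head_dim <= 256

def can_use_efficient(p: SDPParams) -> bool:
    # Memory-efficient attention is more permissive in this toy model.
    return p.head_dim <= 512

def select_backend(p: SDPParams) -> str:
    """Return the first backend in the static priority order whose check passes."""
    priority_order = [
        ("flash_attention", can_use_flash),
        ("efficient_attention", can_use_efficient),
        ("math", lambda _p: True),  # math fallback always applies
    ]
    for name, check in priority_order:
        if check(p):
            return name
    raise RuntimeError("unreachable: math backend always applies")

print(select_backend(SDPParams(head_dim=128, dtype="float16")))   # flash_attention
print(select_backend(SDPParams(head_dim=1024, dtype="float32")))  # math
```

The point of the feature request follows directly from this shape: because the list is fixed at compile time, there is no hook for a caller to reorder it per workload.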
-
### System Info
Hi!
I'm running a TRT-LLM engine with speculative execution and a generation length of 4 or 5, and I noticed that FP8 KV-cache attention runs slower than FP16 KV-cache attention. Would be grea…
-
I have just read your paper "LINA-SPEECH: GATED LINEAR ATTENTION IS A FAST AND PARAMETER-EFFICIENT LEARNER FOR TEXT-TO-SPEECH SYNTHESIS" and I must say, I am truly amazed by the effectiveness of your …
-
From my understanding, flex attention (using `block_mask`) gets faster when the number of empty blocks is larger. If the inputs (Q, K, V) do not represent sequences, but graphs with local connectivity…
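That intuition can be made concrete by counting nonempty blocks: flex attention only launches work for tiles the block mask marks as nonempty, so a graph whose local connectivity touches few tiles should see a roughly proportional win. A pure-Python sketch of the counting (the block size and masks are toy assumptions, not `create_block_mask` itself):

```python
def nonempty_block_fraction(mask, block=4):
    """Fraction of (block x block) tiles of a boolean attention mask that
    contain at least one True entry — a proxy for flex-attention work."""
    n = len(mask)
    tiles = nonempty = 0
    for i in range(0, n, block):
        for j in range(0, n, block):
            tiles += 1
            if any(mask[r][c]
                   for r in range(i, min(i + block, n))
                   for c in range(j, min(j + block, n))):
                nonempty += 1
    return nonempty / tiles

n = 16
dense = [[True] * n for _ in range(n)]
# Band mask: each node attends only to neighbors within distance 1,
# mimicking a graph with local connectivity.
band = [[abs(i - j) <= 1 for j in range(n)] for i in range(n)]

print(nonempty_block_fraction(dense))  # 1.0: every tile does work
print(nonempty_block_fraction(band))   # 0.625: over a third of tiles skipped
```

The skipped fraction grows with sequence length for a fixed bandwidth, which is why graph-structured inputs with local connectivity look like a good fit for block masks.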
-
Hello lyuwenyu,
First of all, thank you for your amazing work on RT-DETR! I’ve just started learning about object detection models, and I truly appreciate the innovations that make RT-DETR both fas…
-
### System Info
DGX H100
### Who can help?
When building the engine with:
```sh
trtllm-build --fast-build --model_config $model_cfg
```
and then benchmarking with gptManagerBenchmark, it repo…