-
I have been informed that while Flash Attention is present, it is not actually being used -
https://github.com/oobabooga/text-generation-webui/issues/3759#issuecomment-2031180332
The post has a link to what has …
-
### Model description
https://github.com/ModelTC/lightllm/pull/266
Will there be vision LLM support in LoRAX soon?
### Open source status
- [X] The model implementation is available
- [X] The mo…
-
### How would you like to use vllm
I want to run inference on [TheBloke/Llama-2-7B-Chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GPTQ), but I don't know how to use it with vLLM.
I try t…
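For questions like this, a minimal sketch of vLLM's offline API with a GPTQ-quantized checkpoint would look roughly as follows (assumes `vllm` is installed and a CUDA GPU is available; the prompt text is just an illustration):

```python
from vllm import LLM, SamplingParams

# Load the GPTQ-quantized checkpoint; the quantization flag tells vLLM
# to use its GPTQ kernels instead of treating the weights as fp16.
llm = LLM(model="TheBloke/Llama-2-7B-Chat-GPTQ", quantization="gptq")

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["What is the capital of France?"], params)

for out in outputs:
    print(out.outputs[0].text)
```

This is a sketch, not a verified repro of the issue; depending on the vLLM version, GPTQ may also be auto-detected from the checkpoint's config without the explicit flag.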
-
https://sites.google.com/view/medusa-llm
-
I have faced an error with the vLLM framework when I tried to run inference on an Unsloth fine-tuned Llama-3-8B model...
### Error:
(venv) ubuntu@ip-192-168-68-10:~/ans/vllm-server$ python -O -u -m vl…
-
Wondering if the statement in the README is correct - "drop-in replacement for Whisper on English speech recognition" - does this mean even the large-v2 model is English-only? Thanks!
-
I thought of a way to speed up inference by using batches. This assumes that you can run a batch of 2 much faster than you can run 2 passes, so it will work with GPUs with a lot of compute cores or mu…
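The equivalence this batching idea relies on can be sketched with a toy NumPy example (an illustration, not actual inference code): stacking 2 inputs and doing one matrix multiply gives the same results as 2 separate passes, while letting the hardware work on both at once.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))   # stand-in for a model weight matrix
x1 = rng.standard_normal(8)       # request 1
x2 = rng.standard_normal(8)       # request 2

# Two separate passes
y1, y2 = W @ x1, W @ x2

# One batched pass: stack the inputs as rows, multiply once
X = np.stack([x1, x2])            # shape (2, 8)
Y = X @ W.T                       # shape (2, 4)

assert np.allclose(Y[0], y1) and np.allclose(Y[1], y2)
```

The batched pass does the same arithmetic, but as a single larger kernel launch, which is where the speedup on compute-rich GPUs comes from.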
-
For modules in LLaMA, we currently have:
LLaMA - the LLaMA model with the LM head
LLaMAStack - the decoder layers + dec_norm
LLaMABlock - self-attention + feed-forward
I am wondering if maybe we should change t…
-
### The model to consider.
I am trying to run the vLLM Docker image for gemma-2-27b-it, but I am facing an "architectures not recognized" error.
error:
ValueError: The checkpoint you are trying to load has …
-
Hi, I tried using your [deployment.yaml](https://github.com/rh-aiservices-bu/llm-on-openshift/blob/main/llm-servers/vllm/gitops/deployment.yaml); however, while the single-GPU instance works, multi GP…