-
### Your current environment
vLLM 0.6.1
### Model Input Dumps
CUDA_VISIBLE_DEVICES=7 python3 -m vllm.entrypoints.openai.api_server --port 8010 \
--served-model-name qwen2-7b \
--model /mn…
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
-
I am trying to decode some SSTable files with the following two steps:
Indexing step:
hadoop jar hadoop-sstable-0.1.4.jar com.fullcontact.sstable.index.SSTableIndexIndexer /data//cassandra-data
Decodi…
-
#### Description:
I am experiencing an unexpected spike in GPU memory usage when loading the `Meta-Llama-3.1-8B-Instruct-AWQ-INT4` model using the vLLM framework. Initially, the GPU memory usage is…
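One common cause of an apparent spike is vLLM's KV-cache preallocation: at startup the engine reserves GPU memory up to `--gpu-memory-utilization` (0.9 by default), which shows up as a sudden jump right after the weights finish loading. Lowering that fraction is one way to test whether the spike is the preallocation rather than the quantized weights themselves; the path and values below are illustrative, not from the original report:

```shell
# Sketch: capping vLLM's GPU memory preallocation (illustrative values).
# vLLM reserves KV-cache blocks up to --gpu-memory-utilization at startup,
# so a lower fraction and a shorter --max-model-len shrink the "spike".
python3 -m vllm.entrypoints.openai.api_server \
    --model Meta-Llama-3.1-8B-Instruct-AWQ-INT4 \
    --quantization awq \
    --gpu-memory-utilization 0.6 \
    --max-model-len 8192
```

If memory usage still jumps with a small fraction, the growth is more likely in the weights or activation workspace than in the KV cache.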
-
1. Is speculative decoding faster than faster-whisper?
2. Is there going to be support anytime soon for speculative decoding in faster-whisper?
Both of these questions are asked with a purely…
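For context on question 1, the speedup claim rests on how speculative decoding works: a cheap draft model proposes several tokens, and the large target model verifies them in a single pass, so accepted tokens cost roughly one target forward per round instead of one each. A toy greedy sketch with deterministic stand-in "models" (not the faster-whisper or vLLM API):

```python
def speculative_decode(draft, target, prompt, k=4, max_new=8):
    """Toy greedy speculative decoding.

    draft/target are callables mapping a token sequence to the next
    token (stand-ins for a small and a large model).  The draft
    proposes k tokens; the target keeps the longest prefix it agrees
    with, then contributes one token of its own, so each round costs
    one "target pass" instead of one per generated token.
    """
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new:
        # 1. Draft model proposes k tokens autoregressively (cheap).
        ctx = list(seq)
        proposal = []
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. Target verifies: accept the longest agreeing prefix.
        accepted = []
        ctx = list(seq)
        for tok in proposal:
            if target(ctx) != tok:
                break
            accepted.append(tok)
            ctx.append(tok)
        # 3. Target always emits one token itself (a correction when
        #    the draft was wrong, a bonus token when it was right).
        accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq[len(prompt):len(prompt) + max_new]
```

The output is identical to decoding with the target alone; only the number of target-model calls changes, which is where the speedup comes from.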
-
I want to develop some features based on SGLang to improve the performance of srt.
1. A new scheduler for ControllerMulti that can more accurately identify the resource utilization of each instance a…
-
### 🚀 The feature, motivation and pitch
FlexAttention was proposed as a performant attention implementation leveraging `torch.compile` with easy APIs for adding support for complex attention varian…
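FlexAttention's central abstraction is a user-supplied `score_mod` callable applied to each attention logit before the softmax, so variants like causal masking, ALiBi, or sliding windows each become a few lines. The shape of that hook can be sketched in plain Python (a toy, not the actual `torch.nn.attention.flex_attention` API):

```python
import math

def attention(q, k, v, score_mod):
    """Single-head attention over Python lists.  score_mod(score,
    q_idx, kv_idx) edits each logit before softmax, mirroring the
    shape of FlexAttention's score_mod hook (toy code, not torch)."""
    out = []
    for i, qi in enumerate(q):
        scores = [score_mod(sum(a * b for a, b in zip(qi, kj)), i, j)
                  for j, kj in enumerate(k)]
        m = max(scores)                       # stabilize the softmax
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(w[j] * v[j][d] for j in range(len(v))) / z
                    for d in range(len(v[0]))])
    return out

def causal(score, q_idx, kv_idx):
    """Causal masking as a score_mod: future positions get -inf."""
    return score if kv_idx <= q_idx else float("-inf")
```

Swapping in a sliding-window or relative-bias `score_mod` changes only that one function, which is exactly the flexibility the pitch refers to; the real implementation compiles the hook into a fused kernel via `torch.compile`.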
-
### System Info
Transformers version: 4.42.0
Python version: 3.10.14
### Who can help?
@sanchit-gandhi
### Information
- [ ] The official example scripts
- [X] My own modified scripts
### …
-
Dear members of the Ray team,
I am working with DRL algorithms using rllib. I am configuring and testing multiple experiments using the Tune API (tune.run()) as well as the different implemented DR…
-
### Your current environment
vLLM version: v0.6.0 (CPU)
CPU: AMD EPYC 9654
### 🐛 Describe the bug
The vLLM v0.6.0 (CPU) server failed to start when VLLM_CPU_OMP_THREADS_BIND was set, as shown below:
…
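For reference, the CPU backend expects `VLLM_CPU_OMP_THREADS_BIND` to be a CPU core list for pinning the OpenMP worker threads. A minimal launch sketch; the core range and cache size are illustrative choices for a many-core EPYC machine, and the model path is a placeholder:

```shell
# Sketch: binding the CPU backend's OpenMP threads to cores 0-31
# (illustrative range; prefer physical cores on a single NUMA node).
export VLLM_CPU_KVCACHE_SPACE=40          # KV-cache size in GiB
export VLLM_CPU_OMP_THREADS_BIND=0-31     # cores for the OpenMP threads
python3 -m vllm.entrypoints.openai.api_server --model <model-path>
```

If the server starts without the binding but fails with it, the failure is likely in how the core list is parsed or applied, which narrows the bug down.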