-
Quick question: when is a release of optimum-habana that includes https://github.com/huggingface/optimum-habana/issues/1154 (the `rope_scaling` fix for the Llama 3.1 family) planned?
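For context, a minimal sketch of what the linked issue is about, under my reading of it: Llama 3.1 checkpoints ship a new `rope_scaling` schema (`rope_type: "llama3"` with extra frequency-factor keys), while older validation code only accepts the legacy two-key `{"type", "factor"}` form. The dict values below are the ones Llama 3.1 configs are known to use; the helper function is purely illustrative, not optimum-habana's actual check.

```python
# The rope_scaling block that Llama 3.1 checkpoints ship with.
llama31_rope_scaling = {
    "rope_type": "llama3",
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
}

def is_legacy_schema(rope_scaling: dict) -> bool:
    """Illustrative stand-in for the old validation: accept only the
    legacy {"type", "factor"} two-key schema."""
    return set(rope_scaling) == {"type", "factor"}

# The Llama 3.1 block fails the legacy check, which is why unpatched
# versions reject the config at model-load time.
print(is_legacy_schema(llama31_rope_scaling))
```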
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
-
### System Info
TRT-LLM 0.7.1
Host: g5.12xlarge EC2 instance (A10G)
Memory size: 23028 MiB
CUDA: 12.2
Model: GPT2
### Who can help?
_No response_
### Information
- [X] The official example scri…
-
### Question Validation
- [X] I have searched both the documentation and Discord for an answer.
### Question
I installed llama-index with the command `pip install llama-index` and install t…
-
Hi team,
Optimum Neuron is looking into adding speculative decoding support for some seq2seq models. There seems to be an example from the Annapurna team, but the link to the resource is missing. C…
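For readers unfamiliar with the technique being requested, here is a toy sketch of the draft-then-verify loop at the heart of speculative decoding. Both "models" are hypothetical stand-ins (a real implementation drafts with a small decoder and verifies against the large model's distribution, resampling the first rejected position); this only shows the control flow.

```python
import random

def draft_model(prefix, k):
    # Toy draft model: propose k candidate tokens (stand-in for a small,
    # fast decoder running ahead of the target model).
    random.seed(len(prefix))
    return [random.randint(0, 9) for _ in range(k)]

def target_accepts(prefix, token):
    # Toy acceptance rule: stand-in for checking the candidate against
    # the large target model's distribution. Here: accept even tokens.
    return token % 2 == 0

def speculative_step(prefix, k=4):
    """One draft-then-verify step: keep the draft's tokens until the
    target first disagrees, then stop. (A real implementation would
    resample the rejected position from the target distribution, so a
    step always yields at least one token.)"""
    accepted = []
    for t in draft_model(prefix, k):
        if not target_accepts(prefix + accepted, t):
            break
        accepted.append(t)
    return accepted
```

The payoff is that one target-model pass can validate several drafted tokens at once, amortizing the expensive model over multiple output tokens.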
-
### System Info
Transformers Version: 4.42.0
Python version: 3.10.14
### Who can help?
@sanchit-gandhi
### Information
- [ ] The official example scripts
- [X] My own modified scripts
### …
-
### Anything you want to discuss about vllm.
### Your current environment
The output of `python collect_env.py`
```text
PyTorch version: 2.4.0+cu121
Is debug…
-
### System Info
CUDA Version: 12.2
transformers Version: 4.44.2
Python: 3.12.4
Operating system: Windows Subsystem for Linux (WSL) in VS Code
### Who can help?
_No response_
#…
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTor…
-
### Misc discussion on performance
I've been running some simple tests on multi-node pipeline parallelism over NCCL. I doubled the bandwidth between the nodes but saw no increase in tokens/s or throughput.…
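One common explanation, sketched with purely illustrative numbers (none of them are from this report): in pipeline parallelism only the activations at stage boundaries cross the inter-node link, so the traffic per generated token is tiny relative to typical link capacity. If the link was never saturated, doubling its bandwidth cannot raise throughput; the run is compute- or latency-bound instead.

```python
# Back-of-envelope estimate of inter-node traffic in pipeline-parallel
# decoding. All numbers are assumptions for illustration.
hidden_size = 4096        # assumed model hidden dimension
bytes_per_elem = 2        # fp16/bf16 activations
tokens_per_sec = 1000     # assumed aggregate decode throughput
boundaries = 1            # inter-node stage boundaries crossed per token

# During decode, each new token sends one hidden-state vector across
# each stage boundary.
bytes_per_sec = hidden_size * bytes_per_elem * tokens_per_sec * boundaries
gbits_per_sec = bytes_per_sec * 8 / 1e9
print(f"{gbits_per_sec:.3f} Gbit/s")  # well under typical inter-node bandwidth
```

Under these assumptions the pipeline moves less than 0.1 Gbit/s between nodes, so per-message latency (not bandwidth) is the quantity that matters for tokens/s.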