-
**Describe the bug**
**Steps to reproduce**
1. Deploy the model 'OpenGVLab/InternVL2-4B' from Hugging Face and configure the backend with the parameter `--trust-remote-code`.
2. View the instance log…
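The reported setup can be reproduced with a launch command along these lines (a minimal sketch: only the model name and `--trust-remote-code` come from the report; the entrypoint invocation and any other flags are assumptions):

```shell
# Hypothetical vLLM launch for the model in question.
# --trust-remote-code allows execution of the model repo's custom code,
# which InternVL2 requires.
python -m vllm.entrypoints.openai.api_server \
    --model OpenGVLab/InternVL2-4B \
    --trust-remote-code
```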
-
### Your current environment
The output of `python collect_env.py`:
```text
root@9b33a89c3857:/workspace/vllm-0.4.2# python collect_env.py
Collecting environment information...
PyTorch versi…
```
-
Hi team,
Optimum Neuron is looking into adding speculative decoding support for some seq2seq models. There seems to be an example from the Annapurna team, but the link to the resource is missing. C…
-
> Note: This is half bug (since it causes unnecessary errors in certain situations), and half feature request (since LMDeploy itself is not responsible for connection timeouts). I wasn't sure which to…
-
For example, 1.1B tinyllama.
-
### Your current environment
The output of `python env.py`
```text
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1…
-
I am calling `encode` from whisperX/faster-whisper.
Since `encode` can take 200 ms in my use case, and I call it very often for many users, I would like the ability to do early stopping in the…
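The pattern being requested, polling a cancellation callback between encoder steps so a caller can abort mid-call, can be sketched in pure Python. Note that `encode`, its `should_stop` parameter, and the per-chunk work below are a hypothetical stand-in for illustration, not faster-whisper's actual API:

```python
def encode(chunks, should_stop=lambda: False):
    """Hypothetical encoder loop that polls a cancellation callback
    between chunks, so the caller can abort and get partial results."""
    results = []
    for chunk in chunks:
        if should_stop():
            break  # early stop: return what has been computed so far
        results.append(chunk * 2)  # stand-in for real per-chunk work
    return results

# Example: request a stop once two chunks have been processed.
seen = {"n": 0}
def stop_after_two():
    seen["n"] += 1
    return seen["n"] > 2

partial = encode([1, 2, 3, 4], should_stop=stop_after_two)
```

The callback-polling design keeps the hot loop free of locks; the caller flips a flag (or counts invocations, as here) and the encoder checks it at chunk boundaries.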
-
**Intention**
Disassemble the AArch64 binary.
**Describe the bug**
When I use Dyninst to disassemble an AArch64 binary that is stripped, it triggers an assertion failure: `instructionAPI/src/aarch64_opcode_ta…
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
-
### Your current environment
```
docker pull vllm/vllm-openai:latest
docker run -d --restart=always \
--runtime=nvidia \
--gpus '"device=1"' \
--shm-size=10.24gb \
-p 5001:500…