-
## 🐛 Bug
When attempting to test speculative decoding using the predefined Speculative Decoding test, I see huge memory usage, which results in an OOM on my device.
## To Reproduce
Steps to r…
-
Hello!
Does TensorRT-LLM support Medusa with Mixtral 8x7B?
My understanding is that right now the Medusa [convert_checkpoint.py](https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/medusa/c…
-
### Your current environment
```text
The output of `python collect_env.py`
Collecting environment information...
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12…
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
-
### Before submitting a bug report
- [X] I updated to the latest version of Multi-Account Container and tested whether I could reproduce the issue
- [X] I searched for existing reports to see if it hasn't a…
-
### Your current environment
The output of `python collect_env.py`
```text
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A…
-
### Prerequisite
- [X] I have searched [Issues](https://github.com/open-compass/opencompass/issues/) and [Discussions](https://github.com/open-compass/opencompass/discussions) but cannot get the ex…
-
### Context
End Of Sequence tokens are an essential part of LLM training and inference. You can find more details in [this comment](https://discuss.huggingface.co/t/how-does-gpt-decide-to-stop-gene…
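To make the role of the EOS token concrete, here is a minimal toy sketch (not from the original issue) of a greedy decode loop that stops as soon as the model emits the EOS token id. Both `next_token` and `EOS_ID` are hypothetical stand-ins for a real model's sampling step and tokenizer configuration:

```python
EOS_ID = 2  # hypothetical EOS token id (e.g. Llama's tokenizer uses 2)

def next_token(tokens):
    # Dummy stand-in for a model's argmax step: emits incrementing
    # ids, then EOS once the sequence reaches 5 tokens.
    return tokens[-1] + 1 if len(tokens) < 5 else EOS_ID

def generate(prompt_ids, max_new_tokens=16):
    tokens = list(prompt_ids)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)
        if tok == EOS_ID:  # stop generating at end-of-sequence
            break
        tokens.append(tok)
    return tokens

print(generate([3]))  # stops after the dummy model emits EOS
```

Without the `tok == EOS_ID` check, generation would always run to `max_new_tokens`, which is exactly why EOS handling matters for both training targets and inference stopping criteria.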
-
### Your current environment
```
PyTorch version: 2.1.2+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A
OS: Ubuntu 16.04.7 LTS (x86_64)
GCC versio…
-
Please add support for this model: https://github.com/vikhyat/moondream
An additional idea, which may or may not be feasible (I do not know), is speculative decoding using a smaller model like th…
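For intuition, speculative decoding can be sketched as: a cheap draft model proposes a short run of tokens, and the expensive target model verifies them in one pass, keeping the longest agreeing prefix plus one corrected token. The toy below is a greedy-verification sketch with hypothetical `draft_next`/`target_next` stand-ins, not tied to any real library:

```python
K = 4  # number of tokens the draft model proposes per step

def draft_next(tokens):
    # Hypothetical cheap draft model: usually agrees with the target,
    # but drifts whenever the last token is a multiple of 3.
    return tokens[-1] + (2 if tokens[-1] % 3 == 0 else 1)

def target_next(tokens):
    # Hypothetical expensive target model: always increments by 1.
    return tokens[-1] + 1

def speculative_step(tokens):
    # 1) Draft proposes K tokens autoregressively.
    draft = list(tokens)
    for _ in range(K):
        draft.append(draft_next(draft))
    proposals = draft[len(tokens):]
    # 2) Target verifies the proposals; accept the longest prefix on
    #    which the two models agree, then emit one corrected token.
    accepted = []
    ctx = list(tokens)
    for p in proposals:
        t = target_next(ctx)
        if p == t:
            accepted.append(p)
            ctx.append(p)
        else:
            accepted.append(t)  # target's correction ends the step
            break
    return tokens + accepted

print(speculative_step([1]))
```

The payoff is that several tokens can be accepted per target-model forward pass when the draft agrees often, which is why a small sibling model is an appealing draft candidate.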