-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubun…
-
### Your current environment
```text
The output of `python collect_env.py`
```
```
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTo…
-
I followed the documentation to build the LLaMA 3 8B Instruct model with multiple LoRA versions as described in this NVIDIA blog post (https://developer.nvidia.com/zh-cn/blog/deploy-multilingual-llms-w…
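For reference, if the same setup is served with vLLM, a multi-LoRA launch looks roughly like the sketch below. The adapter names and paths are hypothetical; `--enable-lora` and `--lora-modules name=path` are the relevant server flags.

```shell
# Hypothetical adapter paths; each --lora-modules entry registers a
# named LoRA adapter on top of the shared base model.
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Meta-Llama-3-8B-Instruct \
  --enable-lora \
  --lora-modules sql-lora=./adapters/sql-lora fr-lora=./adapters/fr-lora
```

Requests then select an adapter by passing its registered name as the `model` field of the OpenAI-style request.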
-
### Your current environment
```text
The output of `python collect_env.py`
Collecting environment information...
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12…
-
### Your current environment
```text
# Using pip install vllm
vllm==v0.5.1
```
### 🐛 Describe the bug
```python
# My python script to test long text
def run_Mixtral():
    tokenizer = A…
-
### 🐛 Describe the bug
When the datapipe iterator is reset, the multiprocessing reading service tries to pickle the datapipe (why?). In case the data pipe contains a buffer with file handles this fai…
-
### Context
End Of Sequence tokens are an essential part of LLM training and inference. You can find more details in [this comment](https://discuss.huggingface.co/t/how-does-gpt-decide-to-stop-gene…
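To illustrate the role of EOS at inference time, here is a toy greedy decoding loop that stops as soon as the model emits the EOS id. All names and the id value are hypothetical; real decoders read `eos_token_id` from the tokenizer/model config.

```python
EOS_ID = 2  # hypothetical end-of-sequence token id

def toy_generate(next_token_fn, max_new_tokens: int = 16) -> list[int]:
    """Append tokens until EOS appears or the token budget runs out."""
    out: list[int] = []
    for _ in range(max_new_tokens):
        tok = next_token_fn(out)
        if tok == EOS_ID:  # EOS ends generation; without it we run to the budget
            break
        out.append(tok)
    return out

# A fake "model" that emits 5, 6, then EOS.
script = iter([5, 6, EOS_ID, 7])
print(toy_generate(lambda ctx: next(script)))  # [5, 6]
```

If the model never learns to emit EOS (or EOS is masked out), the loop above degrades to always exhausting `max_new_tokens`, which is the usual symptom of a mis-set EOS id.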
-
### Your current environment
```text
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: RED OS release MUROM (7.3.4) Stan…
-
Please add support for this model. https://github.com/vikhyat/moondream
An extra idea, which may or may not be feasible (I do not know), is speculative decoding using a smaller model like th…
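For context on the speculative-decoding idea, a toy greedy draft-and-verify step looks like the sketch below. Function names are hypothetical, and real implementations accept/reject drafts probabilistically rather than by exact match:

```python
def speculative_decode_step(draft_next, target_next, ctx, k=4):
    """Draft k tokens with the cheap model, keep the target-verified prefix."""
    # Draft model proposes k tokens autoregressively.
    proposed = []
    for _ in range(k):
        proposed.append(draft_next(ctx + proposed))
    # Target model verifies: accept the longest agreeing prefix, then
    # emit one corrected token at the first mismatch.
    accepted = []
    for tok in proposed:
        expected = target_next(ctx + accepted)
        if tok == expected:
            accepted.append(tok)
        else:
            accepted.append(expected)
            break
    return accepted

# Toy models: the "target" continues 1, 2, 3, ...; the "draft" agrees
# for the first two tokens and then guesses wrong.
target_next = lambda ctx: len(ctx) + 1
draft_next = lambda ctx: len(ctx) + 1 if len(ctx) < 2 else 99
result = speculative_decode_step(draft_next, target_next, [], k=4)
print(result)  # [1, 2, 3]
```

The payoff is that one verification pass over the target model can commit several tokens at once, which is why a small companion model could speed up a larger one.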
-
### My environment setup
First environment (running on an EC2 `g6.4xlarge`)
```
[2024-06-01T10:14:23Z] Collecting environment information...
[2024-06-01T10:14:26Z] PyTorch version: 2.3.0+cu121
[2024-0…