-
I am requesting that you merge with the upstream flash-attention repo in order to garner community engagement and improve integration and distribution.
This separation is a major blocker to AMD …
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
```
-
### 🐛 Describe the bug
An additional dimension appears in the second return value of a `torch.nn.LSTM` layer when `torch.compile` is applied to it (with any backend). The additional dimension…
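A minimal repro sketch of the kind of check being described (the module sizes and input shapes here are illustrative assumptions, not taken from the report). In eager mode, the second return value `(h_n, c_n)` has `h_n` of shape `(num_layers, batch, hidden_size)`; the issue claims the compiled module returns a differently shaped hidden state:

```python
import torch

# Small LSTM with illustrative sizes (hypothetical; any sizes reproduce the pattern).
lstm = torch.nn.LSTM(input_size=4, hidden_size=8, num_layers=1, batch_first=True)
x = torch.randn(2, 5, 4)  # (batch, seq, features)

# Eager mode: h_n has shape (num_layers, batch, hidden_size).
out, (h_n, c_n) = lstm(x)
print(h_n.shape)  # torch.Size([1, 2, 8])

# Compiled module (backend="eager" used here for a lightweight check;
# the report says the extra dimension appears with all backends).
compiled = torch.compile(lstm, backend="eager")
out_c, (h_c, c_c) = compiled(x)
print(h_c.shape)  # reported bug: an extra dimension appears here
```

Comparing `h_n.shape` against `h_c.shape` makes the discrepancy directly visible.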
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
```
-
### Your current environment
Issue with Pixtral Model: Unsupported Vision Configuration in vLLM (AMD Radeon 7900 XTX)
I am trying to load the Pixtral model from Hugging Face (specifically, mistr…
-
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6 LTS (x86_64)
GCC ve…
```
-
Just looking to confirm that the markdown files still have up-to-date instructions for us new folks. It looks like the last update was 6 months ago.
Mainly this file:
`mlx-examples/llms/CONTRIBUTI…
-
When executing the script `examples/offline_inference_with_prefix.py`, it calls `context_attention_fwd` from `vllm.model_executor.layers.triton_kernel.prefix_prefill`, which triggers the following er…
-
### Your current environment
The output of `python collect_env.py`
```text
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N…
```