speculative-decoding Search Results

1000+ results
for speculative-decoding

vllm-project/vllm #7580

[Bug]: run quantized model error

### Your current environment The output of `python collect_env.py` ```Collecting environment information... PyTorch version: 2.4.0+cu121 Is debug build: False CUDA used to build PyTorch: 12.1…

soulzzz updated 2 months ago
3
vllm-project/vllm #6558

[Bug]: Cannot load fp8 model of internlm2-chat-7b offline

### Your current environment ```text PyTorch version: 2.3.0+cu121 Is debug build: False CUDA used to build PyTorch: 12.1 ROCM used to build PyTorch: N/A OS: Debian GNU/Linux 11 (bullseye) (x86…

EstellaXinyuZhang updated 2 weeks ago
4
vllm-project/vllm #9714

[Feature]: AttributeError: Model MllamaForConditionalGenerat…

### Your current environment Collecting environment information... PyTorch version: 2.4.0+cu121 Is debug build: False CUDA used to build PyTorch: 12.1 ROCM used to build PyTorch: N/A OS: Ubunt…

CyrusCY updated 1 week ago
1
vllm-project/vllm #5540

[Feature]: LoRA support for Mixtral GPTQ and AWQ

### 🚀 The feature, motivation and pitch Please consider adding support for GPTQ and AWQ quantized Mixtral models. I guess that after #4012 it's technically possible. ### Alternatives _No r…

StrikerRUS updated 1 month ago
6
intel/xFasterTransformer #476

Illegal instruction (core dumped)

my env is: os: centos7.9 cpu: Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 20 On-line CPU(s) list: 0-19 Thre…

wcollin updated 1 month ago
1
vllm-project/vllm #3778

[Bug]: Qwen-14B-Chat-Int4 with guided_json error

### Your current environment ```text Collecting environment information... PyTorch version: 2.1.2+cu121 Is debug build: False CUDA used to build PyTorch: 12.1 ROCM used to build PyTorch: N/A …

xunfeng1980 updated 2 months ago
4
QwenLM/Qwen2-VL #96

vLLM inference error: unable to get the factor field in rope_scaling

This is the command I ran: python -m vllm.entrypoints.openai.api_server --served-model-name Qwen2-VL-7B-Instruct --model /home/wangll/llm/model_download_demo/models/Qwen/Qwen2-VL-7B-Instruct Here is the error output: INFO 09-03 1…

Potato-wll updated 1 week ago
33
sgl-project/sglang #157

Development Roadmap (Deprecated)

## Function Calling - Frontend - Add `tools` argument in `sgl.gen`. See also guidance [tools](https://github.com/guidance-ai/guidance/blob/d1bbe1c698cbb201f89556d71193993e78c0686b/README.md?plai…

Ying1123 updated 4 days ago
17
vllm-project/vllm #5035

[Bug]: vLLM fails to run with the latest NVIDIA driver 555.85

`2024-05-24 23:49:38 WARNING 05-24 15:49:38 utils.py:327] Not found nvcc in /usr/local/cuda. Skip cuda version check! 2024-05-24 23:49:38 INFO 05-24 15:49:38 config.py:379] Using fp8 data type to sto…

gaye746560359 updated 1 week ago
8
vllm-project/vllm #9153

[Bug]: InternVL bounding box prediction does not work

### Your current environment The output of `python collect_env.py` ```text python collect_env.py Collecting environment information... PyTorch version: 2.4.0+cu121 Is debug build: False C…

MoritzLaurer updated 1 week ago
14
