-
Multi-image inference for "OpenGVLab/InternVL2-8B" not working
I got this inference code from here https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_vision_language_multi_…
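For context, a minimal multi-image sketch along the lines of that upstream example (assumptions: a recent vLLM release with multi-modal support; the `limit_mm_per_prompt` argument, the `<image>` placeholder format, and the image file names are taken from or invented for illustration and may differ by model/version):

```python
from PIL import Image
from vllm import LLM, SamplingParams

# Assumption: a vLLM version with multi-image support. The exact prompt
# placeholder format and the limit_mm_per_prompt argument follow the
# upstream multi-image example and may vary between releases.
llm = LLM(
    model="OpenGVLab/InternVL2-8B",
    trust_remote_code=True,
    limit_mm_per_prompt={"image": 2},  # allow up to two images per prompt
)

# Hypothetical local image files used only for illustration.
images = [Image.open("a.jpg"), Image.open("b.jpg")]
prompt = "<image>\n<image>\nDescribe the differences between the two images."

outputs = llm.generate(
    {
        "prompt": prompt,
        "multi_modal_data": {"image": images},
    },
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```

This is a sketch, not a verified reproduction of the linked example; InternVL2 may additionally require its chat template to be applied to the prompt.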
-
### Your current environment
**environment:**
vllm 0.4.2
python3.10
cuda11.8
cpu: 52
mem: 375Gi
**model:**
llama3-70B
### 🐛 Describe the bug
**description**:
vLLM engine init failed, w…
-
### Your current environment
```
Collecting environment information...
PyTorch version: 2.3.0
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 2…
-
Model name: Lenovo Legion Slim 5 16APH8
CPU model: AMD Ryzen 7 7840HS
GPU model: NVIDIA RTX 4060 Mobile
Keyboard backlight: RGB
OS: Archlinux
Output of `sudo dmidecode -t system`. Please remove…
-
### System Info
The 4.38.2 version breaks code using custom 4D attention masks (introduced in #27539). Apparently, the custom masks get replaced here: https://github.com/huggingface/transformers/bl…
-
OS Version:`Ubuntu 22.04 Docker`
Spring Version: `v1.0.1`
Snapshot Version: `v8`
Download Snapshot:
```bash
wget -O /tmp/snapshot-v8-latest.bin https://snapshots.eosnation.io/eos-v8/latest
```…
-
### What happened?
Cannot boot the image. It's not related to NVMe or SD-card boot, but to a missing `.dtb` file.
Process followed: [official](http://www.orangepi.org/orangepiwiki/index.php?title=Orange_Pi_3…
-
### Describe the bug
I recently tried using openllm to connect to llama, and it gave me some bentoml config errors. I'm not sure if it's because I don't have a GPU, but I didn't see any evidence o…
-
Hi, I am wondering about the training process of the small model and its verification accuracy, since it has a large effect on decoding effectiveness. Thank you!
-
Run the code:
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
import torch
model_path2 = "/home/xxx/llm/Qwen1.5-32B-Chat-GPTQ-Int4"
# Initialize the tokenizer
tok…