-
## 🐛 Bug
I am trying to optimise the `Qwen/Qwen1.5-4B-Chat` model. As I have only 8 GB of RAM on my Mac M1, I use 3-bit quantisation and a really small prefill chunk size of 2048. I get the following err…
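The combination of a 3-bit quantisation mode and a `prefill_chunk_size` setting suggests an MLC LLM style setup. A minimal sketch of the relevant fields, assuming the `mlc-chat-config.json` layout generated by MLC LLM (field values here are illustrative, not the reporter's actual config):

```json
{
  "quantization": "q3f16_1",
  "prefill_chunk_size": 2048,
  "context_window_size": 4096
}
```

Shrinking `prefill_chunk_size` trades prefill speed for peak memory, which is why it is commonly lowered on 8 GB machines.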
-
## Description
I tried to serve the model "microsoft/Phi-3-vision-128k-instruct" with several LMI images and deploy it to SageMaker, but it failed with errors.
### Expected Behavior
Expected the SageMaker endpo…
-
After GPU KV cache usage reaches 100.0%, the server hangs, GPU utilization drops to 0, and it can no longer serve requests. Is there any way to fix this?
-
Hello, when I run `run.py`:
```
mpirun -n 2 --allow-run-as-root \
    python3 run.py --max_output_len=1024 \
    --tokenizer_dir /root/autodl-tmp/llama-2-7b \
    --engine_dir=/…
```
-
### What happened?
llama-server crashes after prompt processing, issue doesn't occur before https://github.com/ggerganov/llama.cpp/commit/df270ef74596da8f1178f08991f4c51f18c9ee82
At first I thought…
-
### System Info
**CUDA: 12.3**
**OS: Ubuntu 22.04**
**Python: 3.11.9**
**pip list:**
vllm 0.6.2
vllm-flash-attn 2.6.1
xinference …
-
### Your current environment
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Cen…
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
/home/irteamsu/miniconda3/envs/jongho/lib/python3.10/site-packages/torch/dist…
```
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
ValueError: Target modules {'v_proj', 'gate_proj', 'k_proj', 'o_proj', 'down_proj', 'q_proj', 'up_proj'} …
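This `ValueError` is PEFT's complaint that none of the requested LoRA `target_modules` match submodule names in the base model. A minimal sketch of checking which names actually exist before building a `LoraConfig` — `TinyAttention` is a hypothetical stand-in here; real code would call `named_modules()` on the loaded Hugging Face model:

```python
# Hedged sketch: verify requested LoRA target_modules against the model's
# actual Linear submodule names (PEFT matches the trailing name component).
import torch.nn as nn

class TinyAttention(nn.Module):  # hypothetical stand-in for one transformer block
    def __init__(self):
        super().__init__()
        self.q_proj = nn.Linear(8, 8)
        self.k_proj = nn.Linear(8, 8)
        self.v_proj = nn.Linear(8, 8)
        self.o_proj = nn.Linear(8, 8)

model = TinyAttention()
# Collect the leaf names of every Linear submodule.
available = {name.split(".")[-1] for name, mod in model.named_modules()
             if isinstance(mod, nn.Linear)}
requested = {"v_proj", "gate_proj", "k_proj", "o_proj",
             "down_proj", "q_proj", "up_proj"}
missing = requested - available
print(sorted(missing))  # → ['down_proj', 'gate_proj', 'up_proj']
```

If `missing` is non-empty, the fix is to drop those names from `target_modules` (or pick the names this architecture actually uses) rather than to change PEFT versions.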
-
### Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
- [X] 3. Please note that if the bug-related iss…