-
### Describe the bug
Crash with abort when trying to use an AMD graphics card in the editor.
Model is mistral-7b-instruct-v0.2.Q4_K_M.gguf
```
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX…
```
-
### What happened?
After https://github.com/ggerganov/llama.cpp/commit/231cff5f6f1c050bcb448a8ac5857533b4c05dc7 I'm getting errors in my app, so I decided to test the compiled releases - and was unable t…
-
Greetings, everyone. First of all, some specifications here:
1. docker version: nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3
2. trtllm-build --version: [TensorRT-LLM] TensorRT-LLM version:…
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
WARNING 09-21 15:29:13 _custom_ops.py:18] Failed to import from vllm._C with Im…
```
-
### Your current environment
```
[root@localhost wangjianqiang]# python -m vllm.entrypoints.openai.api_server --model /root/wangjianqiang/deepseek-moe/deepseek-coder-33b-base/ --tensor-parallel-size 8 …
```
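For context, a hedged sketch of what a request to that server would look like once it is up. The endpoint path `/v1/completions` follows the OpenAI-compatible API that vLLM exposes; the model value is taken from the command above, and the helper function itself is hypothetical:

```python
import json

def build_completion_request(model: str, prompt: str, max_tokens: int = 64) -> str:
    """Build the JSON body for a POST to /v1/completions on an
    OpenAI-compatible server (field names follow the OpenAI API schema)."""
    return json.dumps({
        "model": model,        # must match the --model path the server was started with
        "prompt": prompt,
        "max_tokens": max_tokens,
    })

# Example body for the server launched above (model path copied from the command):
body = build_completion_request(
    "/root/wangjianqiang/deepseek-moe/deepseek-coder-33b-base/",
    "def fib(n):",
)
```

The body would then be POSTed to the server's `/v1/completions` endpoint with a `Content-Type: application/json` header.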
-
I downloaded the model linked below and ran it on the fork, but the model writes a lot of random text instead of answering the question "What is the process number?"
https://huggingface.co/openbmb/Mini…
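Output like this is often a sampling or prompt-template issue rather than a broken checkpoint. As a self-contained illustration (not the fork's actual code), the snippet below shows how the temperature parameter controls the greedy-vs-random trade-off during decoding; high temperatures are a common cause of "random text":

```python
import math
import random

def sample_token(logits, temperature=1.0, rng=None):
    """Pick a token index from raw logits. temperature -> 0 means greedy
    (argmax); higher temperatures flatten the softmax distribution and
    make low-probability tokens more likely to be sampled."""
    if temperature <= 1e-6:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random(0)
    scaled = [l / temperature for l in logits]
    m = max(scaled)                              # subtract max for numerical stability
    probs = [math.exp(s - m) for s in scaled]
    total = sum(probs)
    probs = [p / total for p in probs]
    # Inverse-CDF sampling from the softmax distribution.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(logits) - 1

# Greedy decoding deterministically returns the argmax token:
assert sample_token([0.1, 2.0, -1.0], temperature=0.0) == 1
```

Pinning the sampler to greedy (temperature 0, or the runtime's equivalent flag) is a quick way to separate sampling problems from genuine model or conversion bugs.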
-
### What is the issue?
The GPU is not used when running `ollama start`.
Start log:
```
ollama start
2024/09/06 06:40:42 routes.go:1125: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VI…
```
-
### Your current environment
vllm docker image: vllm/vllm-openai:latest
### 🐛 Describe the bug
It works the first time, then stops generating responses, as shown below.
```
ChatCompletion(id='c…
```
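When responses dry up like this, the first field worth inspecting is `finish_reason` on the returned choice. A minimal sketch, with a plain dict standing in for the real `ChatCompletion` object (field names follow the OpenAI response schema; the helper itself is hypothetical):

```python
def diagnose_empty_response(response: dict) -> str:
    """Explain why a chat completion may look empty, based on the
    finish_reason of the first choice (OpenAI response schema)."""
    choice = response["choices"][0]
    reason = choice["finish_reason"]
    content = choice["message"]["content"] or ""
    if not content and reason == "length":
        return "hit max_tokens before producing text - raise max_tokens"
    if not content and reason == "stop":
        return "model emitted a stop token immediately - check the chat template"
    return f"finish_reason={reason}, {len(content)} chars of content"

# A response that stopped immediately (values are made up for illustration):
resp = {"choices": [{"finish_reason": "stop", "message": {"content": ""}}]}
```

An immediate `stop` with empty content usually points at the chat template or stop-token configuration rather than the server itself.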
-
### Your current environment
```text
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Debian GNU/Linux 11 (bullseye) (x86…
```
-
```
ndroid/mlc4j/../../3rdparty/tvm/src/runtime/relax_vm/paged_kv_cache.cc:2650: Check failed: (args.size() == 22 || args.size() == 23) is false: Invalid number of KV cache constructor args.
2024-09-10 2…
```