-
### Your current environment
```text
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Debian GNU/Linux 11 (bullseye) (x86…
```
-
ndroid/mlc4j/../../3rdparty/tvm/src/runtime/relax_vm/paged_kv_cache.cc:2650: Check failed: (args.size() == 22 || args.size() == 23) is false: Invalid number of KV cache constructor args.
2024-09-10 2…
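For what it's worth, this check usually fires when the compiled model library and the bundled TVM runtime come from different mlc-llm revisions (the KV cache constructor gained arguments between releases). A minimal sketch of the usual remedy, assuming a source checkout of mlc-llm; paths are illustrative:

```sh
# Re-sync the TVM submodule with the current mlc-llm revision, then rebuild
# both the runtime and the model library from the same checkout so they
# agree on the KV cache constructor signature.
cd mlc-llm
git submodule update --init --recursive
```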
-
### What is the issue?
When I try `ollama run llama3.1:70b`, I get the error `Error: llama runner process has terminated: error loading model: unable to allocate backend buffer`.
```
C:\Users\sol>olla…
```
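Not from the report, but as a rough sanity check: llama3.1:70b at Ollama's default q4 quantization is roughly a 40 GB download, and loading it needs at least that much free RAM plus VRAM, with headroom for the KV cache and compute buffers. A quick way to compare (Linux commands shown; figures are approximate):

```sh
# The model must fit in available RAM + VRAM; "unable to allocate backend
# buffer" typically means it does not. Sizes below are approximate.
ollama list     # on-disk size of the pulled model (~40 GB for 70b q4)
free -h         # free system RAM
nvidia-smi      # free VRAM, if a GPU is present
```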
-
-------- env lib detail --------
inf2.24xlarge
ubuntu@ip-172-31-12-212:~/vllm$ pip list|grep -i neuron
aws-neuronx-runtime-discovery 2.9
libneuronxla 2.0.755
neuro…
-
### What is the issue?
I encountered an error while attempting to run both the q8_0 and q4_K_M quantizations.
`Error: llama runner process has terminated: error loading model: error loading model vocabulary: wstrin…
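Not part of the original report, but a `wstring`-related vocabulary failure is often a corrupted or truncated GGUF download; re-pulling the model is a cheap first check (the model name below is a placeholder):

```sh
# Force a fresh copy of the model in case the local blob is corrupted.
ollama rm <model>
ollama pull <model>
```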
-
### Your current environment
```text
PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC …
```
-
### Your current environment
```text
The output of `python collect_env.py`
WARNING 11-10 22:34:11 cuda.py:76] Detected different devices in the system:
WARNING 11-10 22:34:11 cuda.py:76] NVIDIA G…
```
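One hedged workaround for the mixed-device warning, assuming the host really does contain different GPU models: pin the process to an identical subset with `CUDA_VISIBLE_DEVICES` (the indices below are placeholders):

```sh
nvidia-smi -L                                    # list GPUs and their indices
CUDA_VISIBLE_DEVICES=0,1 python collect_env.py   # restrict to matching devices
```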
-
# Prerequisites
ROCm 6
# Expected Behavior
Attempting to utilize llama_cpp_python in the Oobabooga WebUI
# Current Behavior
It loads the model into VRAM. Then, upon trying to infer:
gml…
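A common first step in this situation (not confirmed as the fix here) is making sure llama-cpp-python was actually built against ROCm rather than installed as a CPU-only wheel; `LLAMA_HIPBLAS` was the documented CMake switch for builds of this vintage (newer releases renamed it to `GGML_HIPBLAS`):

```sh
# Rebuild llama-cpp-python with the hipBLAS (ROCm) backend enabled.
CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
```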
-
### What is the issue?
I have pulled a couple of LLMs via Ollama. When I run any of them, the response is very slow, so much so that I can type faster than the model generates text.
My system speci…
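A sketch of the usual first diagnostic, assuming an NVIDIA GPU: typing-speed responses almost always mean the model is running partly or fully on CPU, which `ollama ps` reports directly:

```sh
ollama ps      # the PROCESSOR column shows the CPU/GPU split, e.g. "100% GPU"
nvidia-smi     # confirm the ollama runner process is resident on the GPU
```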
-
### Prerequisites
- [ ] I have searched the existing issues
### Current behavior
error log below
By the way, the same model and the same mmproj file work with koboldcpp, so maybe you can copy-paste from there ;)
### Minimum repro…