-
### System Info
- Host: VMware ESXi 7
- Host Nvidia drivers: 550.54.16
- VM CPU architecture: x86_64
- VM Nvidia drivers: 550.54.15
- VM OS: Ubuntu 22.04 LTS
- Physical GPU: A100
- TensorRT-LLM…
-
### What happened?
Compilation command:
`cmake .. -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release`
Run command:
`./bin/llama-cli -m ggml-model-q4_k.gguf -c 512 -b 1024 -n 256 --keep 48 -…
-
Installation steps on the HOST:
conda create -n llm python=3.11
conda activate llm
# the command below installs intel_extension_for_pytorch==2.1.10+xpu by default
pip install --pre --upgrade ipex-llm[xpu] --extra-index…
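A quick sanity check for the steps above (a minimal sketch, not part of the original report): from inside the `llm` conda env, you can verify whether the extension that `ipex-llm[xpu]` pulls in is actually visible to Python's import system, without triggering its import-time side effects.

```python
# Hedged sketch: check whether a package is importable without importing it.
# importlib.util.find_spec returns None when the module cannot be found.
import importlib.util


def backend_available(mod_name: str) -> bool:
    """Return True if the named top-level module can be found by the import system."""
    return importlib.util.find_spec(mod_name) is not None


# In the "llm" conda env created above, this should report True
# once the pip install has completed:
print(backend_available("intel_extension_for_pytorch"))
```

If this prints `False`, the install step did not complete in the active environment, which is worth ruling out before debugging anything GPU-side.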
-
While fine-tuning Llama 3 using llama.cpp on my Mac, I encountered this error. I'm a beginner and don't know what caused it; I hope an expert can help me.
The model used is: …
-
I just upgraded to the latest Ollama to verify the issue, and it is still present on my hardware.
I am running version 0.1.25 and trying to run the Falcon model.
Warning: could not connect to a ru…
-
Running vLLM according to the instructions. Docker segfaults at startup, so I'm running directly on the machine.
I start the server with the following shell script. As you can see, I've tried to turn max…
-
### What is the issue?
Hi,
I have a small problem: I try to run the model I downloaded, but it does not start.
I have tried several approaches:
ollama run qwen2:72b-instruct --verbose
also I try with:…
-
I'm getting this error when using --quantkv with Metal.
```
GGML_ASSERT: ggml-metal.m:924: !"unsupported op"
```
>python3.11 koboldcpp.py Mistral-7B-Instruct-v0.3-Q8_0.gguf --nommap --flashattenti…
-
Hello TensorRT-LLM experts!
I have a question about some strange behavior of the XQA kernel supported in NVIDIA's official MLPerf 4.0 version of TensorRT-LLM.
First of all, I want to te…
-
### What happened?
I'm in the process of experimenting with RPC using fresh builds from ~today, and I'm seeing some things that appear at first sight to be bugs and also perhaps just lacking suppo…