-
### What is the issue?
Ollama running in Docker mode.
When executing `sudo docker exec -it ollama ollama run nemotron:latest`
or `sudo docker exec -it ollama ollama run qwen2.5:72b`,
it replied with "GGGGGGG…
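A stream of repeated "G" tokens is often reported alongside GPU or driver problems rather than a bad model; a first check worth running (a suggestion, not from the report) is whether the container can see the GPU at all:
```
# Assumes the container is named "ollama", as in the commands above.
# If this fails or lists no devices, the container has no GPU access.
sudo docker exec -it ollama nvidia-smi
```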
-
I am working on a use case that loads a model across parallel GPUs, unloads it, and then loads a new model in the same process.
```
@classmethod
async def unload_models(cls, exiting=…
```
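The snippet above is cut off, so for comparison, here is a minimal sketch of an in-process unload/reload, assuming PyTorch and a hypothetical `load_model` helper; tensor-parallel setups typically also need their process groups torn down separately:
```python
import gc
import torch

def unload_model(ref: dict) -> None:
    """Drop the last reference to a model and release its GPU memory.

    `ref` is a mutable holder so this function can actually clear the
    final reference (a convention for this sketch, not from the issue).
    """
    ref.clear()               # drop the last Python reference to the weights
    gc.collect()              # collect reference cycles that still pin tensors
    torch.cuda.empty_cache()  # return cached blocks to the CUDA driver

# Hypothetical usage: load_model stands in for whatever loader is in use.
# ref = {"model": load_model("model-a")}
# unload_model(ref)
# ref["model"] = load_model("model-b")  # reload a different model in-process
```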
-
Greetings to all
## 🐛 Bug
## To Reproduce
Steps to reproduce the behavior:
1. python3 -m mlc_llm serve HF://mlc-ai/Llama-3.1-70B-Instruct-q3f16_1-MLC --overrides "tensor_parallel_shard…
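For context, the truncated override above presumably sets `tensor_parallel_shards`; a hypothetical complete invocation (the shard count is an assumption, not from the report) would look like:
```
python3 -m mlc_llm serve HF://mlc-ai/Llama-3.1-70B-Instruct-q3f16_1-MLC \
  --overrides "tensor_parallel_shards=4"
```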
-
# Current Behavior
I run the following:
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --verbose
an error occurred:
ERROR: Failed building wheel for llama-cpp-python
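A common cause (an assumption here, since the underlying CMake error is not shown) is that CMake cannot locate `nvcc`; pointing it at the CUDA compiler explicitly often gets the wheel to build:
```
# Assumes CUDA is installed under /usr/local/cuda; adjust the path as needed.
CUDACXX=/usr/local/cuda/bin/nvcc CMAKE_ARGS="-DGGML_CUDA=on" \
  pip install llama-cpp-python --verbose --no-cache-dir
```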
# Environment …
-
### What is the issue?
When using llm-benchmark (https://github.com/MinhNgyuen/llm-benchmark) with ollama, I get around 80 t/s with gemma 2 2b. When asking the same questions to llama.cpp in conve…
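For an apples-to-apples number, decode throughput can also be read straight from ollama's own response: a minimal sketch using the documented `eval_count` (tokens) and `eval_duration` (nanoseconds) fields of `/api/generate`; the model tag and prompt are placeholders:
```python
import json
import urllib.request

# Measure decode throughput from a single non-streaming generation.
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "gemma2:2b",
        "prompt": "Why is the sky blue?",
        "stream": False,
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# eval_duration is in nanoseconds, so divide by 1e9 for seconds.
print(f'{body["eval_count"] / (body["eval_duration"] / 1e9):.1f} t/s')
```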
-
**Current State**
The DefiLlama API is pulled live in two places:
- Chain Level: https://github.com/ethereum-optimism/op-analytics/blob/main/other_chains_tracking/chain_tvl_trends.ipynb
- Full Token Break…
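For reference, a minimal sketch of a chain-level pull, assuming DefiLlama's public `/v2/chains` endpoint and its `name`/`tvl` response fields; the chain filter is illustrative:
```python
import json
import urllib.request

# Fetch current TVL per chain from DefiLlama's public API.
with urllib.request.urlopen("https://api.llama.fi/v2/chains") as resp:
    chains = json.load(resp)

optimism = next(c for c in chains if c["name"] == "Optimism")
print(f'Optimism TVL: {optimism["tvl"]:,.0f}')
```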
-
The llama.cpp project already has an option to add the `-pg` compile flag via `LLAMA_GPROF=1`.
But `llama-cli` crashes when traced with uftrace, as follows.
```
$ git clone https://github.com/gg…
```
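For readers unfamiliar with the setup, the intended workflow is roughly the following; the make flag comes from the issue, while the uftrace invocations are standard uftrace usage and the model/prompt arguments are placeholders:
```
make LLAMA_GPROF=1                                   # build with -pg instrumentation
uftrace record ./llama-cli -m model.gguf -p "hello"  # trace a short run
uftrace report                                       # summarize time per function
```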
-
### What is the issue?
Running [smollm](https://ollama.com/library/smollm:135m) produces a CUDA error.
Steps:
1. ollama run smollm:135m
2. Input any text
```
Error: an unknown error was encountered while…
```
-
### Your current environment
vLLM Version: 0.6.3.post2.dev256+g4be3a451
The output of `python collect_env.py`
```text
Collecting environment information... …
```
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
- [X] 3. Please note that if the bug-related iss…