QIN2DIM opened 1 month ago
Ollama service config (systemd `[Service]` section):

```ini
# ... ↓ keypoint
[Service]
Environment="OLLAMA_KEEP_ALIVE=3h"
Environment="OLLAMA_NUM_PARALLEL=10"
Environment="OLLAMA_MAX_LOADED_MODELS=6"
Environment="OLLAMA_MAX_QUEUE=128"
Environment="OLLAMA_DEBUG=1"
# Environment="OLLAMA_FLASH_ATTENTION=1"
# Environment="OLLAMA_NOHISTORY=1"
```
```
(base) root@prd-gpu-1-180:/eam/aiops/nodes/dify_plugins/search_toolkit# ollama ps
NAME               ID              SIZE      PROCESSOR    UNTIL
yi:34b-v1.5        ff94bc7c1b7a    27 GB     100% GPU     3 hours from now
starcoder2:3b      f67ae0f64584    4.1 GB    100% GPU     3 hours from now

(base) root@prd-gpu-1-180:/eam/aiops/nodes/dify_plugins/search_toolkit# ollama ps
NAME               ID              SIZE      PROCESSOR    UNTIL
codestral:latest   726512da210d    417 GB    100% CPU     29 minutes from now
yi:34b-v1.5        ff94bc7c1b7a    27 GB     100% GPU     3 hours from now
```
Interestingly, I found that this only happens with some models:
```
(base) root@prd-gpu-1-180:~# ollama ps
NAME                  ID              SIZE      PROCESSOR    UNTIL
deepseek-coder:6.7b   ce298d984115    102 GB    100% GPU     29 minutes from now
codeqwen:7b           a6f7662764bd    490 GB    100% CPU     22 minutes from now
starcoder2:3b         f67ae0f64584    6.0 GB    100% GPU     29 minutes from now
yi:34b-v1.5           ff94bc7c1b7a    27 GB     100% GPU     3 hours from now
```
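The pattern above can be spotted automatically. Here is a minimal sketch that parses `ollama ps` output and flags models whose layers ended up on the CPU; the `parse_ollama_ps` helper and the embedded sample are my own illustration (not an Ollama API), and it assumes the simple `100% GPU` / `100% CPU` column format shown above rather than mixed splits like `50%/50% CPU/GPU`:

```python
# Parse `ollama ps` output and flag models not fully offloaded to the GPU.
# parse_ollama_ps and the sample text are illustrative, not part of Ollama.

def parse_ollama_ps(text: str) -> list[dict]:
    """Return one dict per model row of `ollama ps` output."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header line
        parts = line.split()
        # Columns: NAME ID <size> <unit> <pct> <GPU|CPU> UNTIL...
        rows.append({
            "name": parts[0],
            "id": parts[1],
            "size": f"{parts[2]} {parts[3]}",
            "processor": f"{parts[4]} {parts[5]}",
        })
    return rows

def cpu_bound(rows: list[dict]) -> list[str]:
    """Names of models whose processor column mentions the CPU."""
    return [r["name"] for r in rows if "CPU" in r["processor"]]

sample = """\
NAME ID SIZE PROCESSOR UNTIL
deepseek-coder:6.7b ce298d984115 102 GB 100% GPU 29 minutes from now
codeqwen:7b a6f7662764bd 490 GB 100% CPU 22 minutes from now
yi:34b-v1.5 ff94bc7c1b7a 27 GB 100% GPU 3 hours from now
"""

print(cpu_bound(parse_ollama_ps(sample)))  # → ['codeqwen:7b']
```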
That's odd. We don't set the `num_gpu` parameter in our request, but you could do this with `requestOptions.extraBodyParameters` in config: https://docs.continue.dev/reference/config
These are what we send by default: https://github.com/continuedev/continue/blob/main/core/llm/llms/Ollama.ts#L130-L139
Does anything here stand out as a potential solution?
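If forcing GPU offload helps, a config along these lines might be worth trying. This is only a sketch: the model title and the `num_gpu` value are placeholders, and whether `extraBodyParameters` merges cleanly with the `options` that `Ollama.ts` already sends is an assumption worth verifying:

```json
{
  "models": [
    {
      "title": "Yi 34B (Ollama)",
      "provider": "ollama",
      "model": "yi:34b-v1.5",
      "requestOptions": {
        "extraBodyParameters": {
          "options": { "num_gpu": 60 }
        }
      }
    }
  ]
}
```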
Relevant environment info
Description
When the configuration of a model object is written like this, what is the value of `num_gpu` in the request parameters? Is it still unset? I found that each request makes Ollama reload the model and then run inference entirely on the CPU (PROCESSOR shows 100% CPU), i.e., none of the network layers are offloaded to the GPU.
In other words, even though the model is already loaded in Ollama with a 100% GPU processor, this request still causes Ollama to reload it.
As a result, inference is very, very slow.
To reproduce
No response
Log output
No response