sirus20x6 opened this issue 5 days ago
When reporting problems, it is very helpful if you can provide some details about your setup.
Including the "announcement" lines that aider prints at startup is an easy way to share this helpful info. For example:
Aider v0.37.1-dev
Models: gpt-4o with diff edit format, weak model gpt-3.5-turbo
Git repo: .git with 243 files
Repo-map: using 1024 tokens
I'm seeing the same thing starting aider in an empty git project:
$ aider --model ollama/qwen2.5-coder:32b --model-metadata-file .aider.model.metadata.json
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Warning for ollama/qwen2.5-coder:32b: Unknown context window size and costs, using sane defaults.
Did you mean one of these?
- ollama/qwen2.5-coder:32b
You can skip this check with --no-show-model-warnings
https://aider.chat/docs/llms/warnings.html
Open documentation url for more info? (Y)es/(N)o [Yes]: N
Aider v0.62.1
Model: ollama/qwen2.5-coder:32b with whole edit format
Git repo: .git with 0 files
Repo-map: disabled
Use /help <question> for help, run "aider --help" to see cmd line args
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
>
Platform: macOS
Metadata file:
{
"ollama/qwen2.5-coder:32b": {
"max_tokens": 4096,
"max_input_tokens": 128000,
"max_output_tokens": 128000,
"input_cost_per_token": 0.000000000014,
"output_cost_per_token": 0.000000000028,
"litellm_provider": "ollama",
"mode": "chat"
}
}
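For context, a metadata file like this has the same shape litellm expects when registering custom model metadata, so it can be sanity-checked against litellm directly. A minimal sketch, assuming litellm.register_model() is roughly the mechanism aider uses under the hood:

import json
import litellm

# Load the same metadata file aider is pointed at and register it with litellm.
with open(".aider.model.metadata.json") as f:
    litellm.register_model(json.load(f))

# If the entry registered, the lookup aider relies on should now return it.
print(litellm.get_model_info("ollama/qwen2.5-coder:32b"))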
Ok, aider was swallowing an Ollama exception; I have now fixed that so the error is raised instead of being hidden. You need to have your Ollama server running and your API base set.
Exception: OllamaError: Error getting model info for ollama/qwen2.5-coder:32b. Set Ollama API Base via `OLLAMA_API_BASE` environment variable. Error: [Errno 61] Connection refused
The change is available in the main branch. You can get it by installing the latest version from github:
aider --install-main-branch
# or...
python -m pip install --upgrade --upgrade-strategy only-if-needed git+https://github.com/Aider-AI/aider.git
If you have a chance to try it, let me know if it works better for you.
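If it helps, here is a quick way to confirm those prerequisites before launching aider. A hedged sketch: http://127.0.0.1:11434 is just Ollama's usual default address, and /api/tags simply lists the locally available models.

import os
import urllib.request

# Use the configured API base if set, otherwise fall back to Ollama's default.
api_base = os.environ.get("OLLAMA_API_BASE", "http://127.0.0.1:11434")
try:
    with urllib.request.urlopen(f"{api_base}/api/tags", timeout=5) as resp:
        print(f"Ollama reachable at {api_base} (HTTP {resp.status})")
except OSError as exc:
    print(f"Ollama NOT reachable at {api_base}: {exc}")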
Looks like this may be an underlying litellm bug.
I did some man-in-the-middling of the request:
POST /api/show HTTP/1.1
Host: 127.0.0.1:11434
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
User-Agent: litellm/1.51.2
Content-Length: 28
Content-Type: application/json
{"name": "ollama/qc:latest"}HTTP/1.1 404 Not Found
Content-Type: application/json; charset=utf-8
Date: Tue, 12 Nov 2024 04:38:02 GMT
Content-Length: 46
{"error":"model 'ollama/qc:latest' not found"}
"ollama/" isn't being stripped off the model in the request
I edited def get_model_info to pass in my model name, built from source, and pip-installed the wheel, but no change. The problem must be in return litellm.get_model_info(model).
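For anyone else digging into this, the failing lookup can be reproduced outside aider with a couple of lines (exact behavior depends on your litellm version, so treat this as a sketch):

import litellm

try:
    # On affected litellm versions this is where the unknown-model / 404
    # error from the capture above shows up.
    print(litellm.get_model_info("ollama/qwen2.5-coder:32b"))
except Exception as exc:
    print(f"get_model_info failed: {exc}")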
I found a temporary solution: you can create a Modelfile and then run
ollama create ollama/qwen2.5-coder:32b -f Modelfile
The content of the Modelfile:
FROM qwen2.5-coder:32b
After these manipulations I was able to run aider with qwen2.5-coder:32b (I made a custom model with my own system prompt, but you can do the same just by renaming the original model).
I updated litellm and aider to the latest versions (litellm==1.52.8). The fix he pushed didn't seem to make a difference. The Modelfile workaround did, but then Ollama crashed on me. It seems to be trying to load too many layers onto my GPU, when I have a server pretty much made for CPU inference and really only want one GPU layer for the prompt-processing speedup. I gave up on trying to find where to change that setting in Ollama and decided to try switching to llama-cpp-python, because llama.cpp has always worked the best for me out of anything. Well...
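For what it's worth, Ollama's generate endpoint does accept per-request options, and num_gpu (the number of layers offloaded to the GPU) looks like the relevant knob. This is a hedged sketch based on my reading of Ollama's API, not anything from aider:

import requests

resp = requests.post(
    "http://127.0.0.1:11434/api/generate",
    json={
        "model": "qwen2.5-coder:32b",
        "prompt": "hello",
        "stream": False,
        # Assumption: num_gpu limits how many layers Ollama offloads to the GPU.
        "options": {"num_gpu": 1},
    },
    timeout=120,
)
print(resp.json().get("response", resp.text))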
aider
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Warning for llama-cpp-python/qwen2.5-coder:32b-instruct-q8_0: Unknown context window size and costs, using sane defaults.
Did you mean one of these?
- llama-cpp-python/qwen2.5-coder:32b-instruct-q8_0
You can skip this check with --no-show-model-warnings
https://aider.chat/docs/llms/warnings.html
Open documentation url for more info? (Y)es/(N)o/(D)on't ask again [Yes]:
Litellm just released a fixed version. The main branch of aider uses it now.
The change is available in the main branch. You can get it by installing the latest version from github:
aider --install-main-branch
# or...
python -m pip install --upgrade --upgrade-strategy only-if-needed git+https://github.com/Aider-AI/aider.git
If you have a chance to try it, let me know if it works better for you.
Sorry, the above install approaches apparently won't bump the litellm version.
Try:
pip install -U litellm
I ended up just hacking together a llama.cpp backend.
.env
.aider.model.metadata.json
It seems to me that those model names match, so it should be picking up the settings?
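In case it is a simple mismatch, a quick check like the sketch below would confirm whether the metadata key lines up with the model name aider is using. It assumes the model is set via AIDER_MODEL in the .env file, which may not match your setup:

import json

# Keys available in the metadata file.
with open(".aider.model.metadata.json") as f:
    metadata_keys = set(json.load(f))

# Crude .env parse (assumption: simple KEY=VALUE lines, no quoting).
env = {}
with open(".env") as f:
    for line in f:
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()

model = env.get("AIDER_MODEL", "")
status = "found" if model in metadata_keys else "NOT found"
print(f"{model!r} {status} in .aider.model.metadata.json")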