sirus20x6 opened this issue 5 days ago
When reporting problems, it is very helpful if you can provide some details about your setup.
Including the "announcement" lines that aider prints at startup is an easy way to share this helpful info. For example:
Aider v0.37.1-dev
Models: gpt-4o with diff edit format, weak model gpt-3.5-turbo
Git repo: .git with 243 files
Repo-map: using 1024 tokens
I'm seeing the same thing starting aider in an empty git project:
$ aider --model ollama/qwen2.5-coder:32b --model-metadata-file .aider.model.metadata.json
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Warning for ollama/qwen2.5-coder:32b: Unknown context window size and costs, using sane defaults.
Did you mean one of these?
- ollama/qwen2.5-coder:32b
You can skip this check with --no-show-model-warnings
https://aider.chat/docs/llms/warnings.html
Open documentation url for more info? (Y)es/(N)o [Yes]: N
Aider v0.62.1
Model: ollama/qwen2.5-coder:32b with whole edit format
Git repo: .git with 0 files
Repo-map: disabled
Use /help <question> for help, run "aider --help" to see cmd line args
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
>
Platform: macOS
Metadata file:
{
"ollama/qwen2.5-coder:32b": {
"max_tokens": 4096,
"max_input_tokens": 128000,
"max_output_tokens": 128000,
"input_cost_per_token": 0.000000000014,
"output_cost_per_token": 0.000000000028,
"litellm_provider": "ollama",
"mode": "chat"
}
}
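For context, a metadata file like this has the same shape litellm expects when registering custom model metadata, so it can be sanity-checked against litellm directly. A minimal sketch, assuming litellm.register_model() is roughly the mechanism aider uses under the hood:

import json
import litellm

# Load the same metadata file aider is pointed at and register it with litellm.
with open(".aider.model.metadata.json") as f:
    litellm.register_model(json.load(f))

# If the entry registered, the lookup aider relies on should now return it.
print(litellm.get_model_info("ollama/qwen2.5-coder:32b"))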
Ok, aider was swallowing an Ollama exception; I have now fixed that so the error is raised instead of being hidden. You need to have your Ollama server running and your API base set.
Exception: OllamaError: Error getting model info for ollama/qwen2.5-coder:32b. Set Ollama API Base via `OLLAMA_API_BASE` environment variable. Error: [Errno 61] Connection refused
The change is available in the main branch. You can get it by installing the latest version from github:
aider --install-main-branch
# or...
python -m pip install --upgrade --upgrade-strategy only-if-needed git+https://github.com/Aider-AI/aider.git
If you have a chance to try it, let me know if it works better for you.
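If it helps, here is a quick way to confirm those prerequisites before launching aider. A hedged sketch: http://127.0.0.1:11434 is just Ollama's usual default address, and /api/tags simply lists the locally available models.

import os
import urllib.request

# Use the configured API base if set, otherwise fall back to Ollama's default.
api_base = os.environ.get("OLLAMA_API_BASE", "http://127.0.0.1:11434")
try:
    with urllib.request.urlopen(f"{api_base}/api/tags", timeout=5) as resp:
        print(f"Ollama reachable at {api_base} (HTTP {resp.status})")
except OSError as exc:
    print(f"Ollama NOT reachable at {api_base}: {exc}")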
Looks like this may be an underlying litellm bug.
I did some man-in-the-middling of the request:
POST /api/show HTTP/1.1
Host: 127.0.0.1:11434
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
User-Agent: litellm/1.51.2
Content-Length: 28
Content-Type: application/json
{"name": "ollama/qc:latest"}HTTP/1.1 404 Not Found
Content-Type: application/json; charset=utf-8
Date: Tue, 12 Nov 2024 04:38:02 GMT
Content-Length: 46
{"error":"model 'ollama/qc:latest' not found"}
"ollama/" isn't being stripped off the model in the request
I edited def get_model_info to pass in my model name, built from source, and pip-installed the wheel, but no change. The problem must be in return litellm.get_model_info(model).
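For anyone else digging into this, the failing lookup can be reproduced outside aider with a couple of lines (exact behavior depends on your litellm version, so treat this as a sketch):

import litellm

try:
    # On affected litellm versions this is where the unknown-model / 404
    # error from the capture above shows up.
    print(litellm.get_model_info("ollama/qwen2.5-coder:32b"))
except Exception as exc:
    print(f"get_model_info failed: {exc}")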
I found a temporary solution: you can create a Modelfile and then run
ollama create ollama/qwen2.5-coder:32b -f Modelfile
The content of the Modelfile:
FROM qwen2.5-coder:32b
After these manipulations I was able to run aider with qwen2.5-coder:32b (I made a custom model with my own system prompt, but you can do the same just by renaming the original model).
I updated litellm and aider to the latest versions (litellm==1.52.8). The fix he pushed didn't seem to make a difference. The Modelfile workaround did, but then Ollama crashed on me. It seems to be trying to load too many layers onto my GPU, when I have a server pretty much made for CPU inference and really only want one GPU layer for the prompt-processing speedup. I gave up on trying to find where to change that setting in Ollama and decided to try switching to llama-cpp-python, because llama.cpp has always worked the best for me out of anything. Well...
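For what it's worth, Ollama's generate endpoint does accept per-request options, and num_gpu (the number of layers offloaded to the GPU) looks like the relevant knob. This is a hedged sketch based on my reading of Ollama's API, not anything from aider:

import requests

resp = requests.post(
    "http://127.0.0.1:11434/api/generate",
    json={
        "model": "qwen2.5-coder:32b",
        "prompt": "hello",
        "stream": False,
        # Assumption: num_gpu limits how many layers Ollama offloads to the GPU.
        "options": {"num_gpu": 1},
    },
    timeout=120,
)
print(resp.json().get("response", resp.text))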
aider
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Warning for llama-cpp-python/qwen2.5-coder:32b-instruct-q8_0: Unknown context window size and costs, using sane defaults.
Did you mean one of these?
- llama-cpp-python/qwen2.5-coder:32b-instruct-q8_0
You can skip this check with --no-show-model-warnings
https://aider.chat/docs/llms/warnings.html
Open documentation url for more info? (Y)es/(N)o/(D)on't ask again [Yes]:
Litellm just released a fixed version. The main branch of aider uses it now.
The change is available in the main branch. You can get it by installing the latest version from github:
aider --install-main-branch
# or...
python -m pip install --upgrade --upgrade-strategy only-if-needed git+https://github.com/Aider-AI/aider.git
If you have a chance to try it, let me know if it works better for you.
Sorry, the above install approaches apparently won't bump the litellm version.
Try:
pip install -U litellm
I ended up just hacking together a llama.cpp backend.
.env
.aider.model.metadata.json
It seems to me that those model names match, so it should be picking up the settings?
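In case it is a simple mismatch, a quick check like the sketch below would confirm whether the metadata key lines up with the model name aider is using. It assumes the model is set via AIDER_MODEL in the .env file, which may not match your setup:

import json

# Keys available in the metadata file.
with open(".aider.model.metadata.json") as f:
    metadata_keys = set(json.load(f))

# Crude .env parse (assumption: simple KEY=VALUE lines, no quoting).
env = {}
with open(".env") as f:
    for line in f:
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()

model = env.get("AIDER_MODEL", "")
status = "found" if model in metadata_keys else "NOT found"
print(f"{model!r} {status} in .aider.model.metadata.json")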