nav9 closed this issue 8 months ago
Hi,
Code completion normally aborts an in-flight request when there is a user action such as typing. From the machine specs you provided, it could take a while to get an AI response because of the graphics card on the machine, so it appears that requests are in flight but are being cancelled because you are typing; this is normal. We currently have a short delay after you finish typing before the request is made.
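The cancel-on-keystroke behaviour described above is a common debounce pattern; a minimal sketch of it in TypeScript (an assumed implementation for illustration, not Wingman-AI's actual code):

```typescript
// Sketch of debounce-with-cancellation: every keystroke aborts any
// in-flight completion request and restarts a short delay timer, so a
// request is only issued once typing pauses.
class CompletionScheduler {
  private timer: ReturnType<typeof setTimeout> | null = null;
  private controller: AbortController | null = null;

  constructor(
    private delayMs: number,
    private request: (signal: AbortSignal) => Promise<string>,
  ) {}

  // Called on every user keystroke.
  onType(onResult: (text: string) => void): void {
    // Abort any request that is already in flight.
    this.controller?.abort();
    // Reset the "user has paused typing" timer.
    if (this.timer !== null) clearTimeout(this.timer);
    this.timer = setTimeout(() => {
      this.controller = new AbortController();
      this.request(this.controller.signal)
        .then(onResult)
        .catch(() => { /* request was aborted: ignore */ });
    }, this.delayMs);
  }
}
```

On a slow (CPU-only) backend the request stays in flight long enough that almost any further typing cancels it, which matches the behaviour reported here.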
Using an explicit hotkey is a good idea for an enhancement, thanks!
Are you able to verify that if you trigger code completion (check the bottom right to see if the icon is spinning) and just wait, without typing or touching anything in the file, it succeeds?
Yes, I noticed the time difference when using Twinny. Since I'm on a CPU-only system, the larger models take time to respond, but the smaller models respond fast. These are the response times when using Twinny:

- `deepseek coder` takes 18 seconds to generate a code completion.
- `codellama` takes 36 seconds.
- `stable-code` takes just 2 or 3 seconds.

So I switched to `stable-code` a few days ago, and Wingman-AI displayed an error message that it's not supported. I uninstalled Wingman-AI.
Today I reinstalled it, and it left me hanging (nothing loads; only the blue loading icon keeps moving horizontally).

I checked the output, and it shows the `codeModel` is `stable-code`. But when I look at the settings, it's not `stable-code`, so I'm unable to switch back to `deepseek coder`.
I know I could solve it by clearing some cache, but that's not what matters here. The design of the extension needs to safeguard against such situations.
Thanks for the suggestions! I will log an issue for the extension crashing with an invalid configuration / unsupported model, and a second enhancement to switch code completion to some form of hotkey instead of a simple on or off.
Covering these in #27 and #28.
Since we've added issues covering the suggestions, I'm closing this ticket. As for the hotkey, I'll be releasing that soon, so keep an eye out for it.
I'm using a computer with AMD Ryzen 5 5600G with integrated Radeon Graphics × 6, and 32GB RAM, on Linux Mint 21.
This is the code I'm using to try Wingman-AI:
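(The original snippet was posted as an image and isn't reproduced here; below is a hypothetical reconstruction of the kind of test file described, with the language assumed since the original doesn't state it.)

```typescript
// Hypothetical reconstruction of the test file: a small isPrime
// function. Typing `if` or `for` inside its body is where a code
// completion would be expected to trigger.
function isPrime(num: number): boolean {
  if (num < 2) return false;
  // Trial division up to the square root of num.
  for (let i = 2; i * i <= num; i++) {
    if (num % i === 0) return false;
  }
  return true;
}
```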
At the `isPrime` function, I've been trying to type either `if` or `for` to see if Wingman-AI generates a code completion. From what I understand, while typing code, if I pause, Wingman-AI is supposed to suggest a code completion. In my case, it isn't working. Here's the output:

I noticed `"num_predict":-1` and checked the Wingman config. Sure enough, `Code max tokens` is `-1` by default. I made the value `100`, closed VS Code, opened it again and tried. This time:

So what would I need to do to get code completions?
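For context, `num_predict` in the logged request is Ollama's generation option that caps how many tokens the model may produce (`-1` means unlimited), which is presumably what the `Code max tokens` setting maps to. A request body with it capped at 100 might look like this (the model name and prompt are illustrative):

```json
{
  "model": "stable-code",
  "prompt": "function isPrime(num) {",
  "options": {
    "num_predict": 100
  }
}
```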
A few other humble suggestions:

**Conservative use of CPU required:**
To save on power, Users could prefer to have Wingman-AI query Ollama for a code completion only when using a key combination like perhaps `Ctrl+i`. If Wingman-AI is going to try generating completions every time the User pauses, it consumes a huge amount of CPU power even when the User doesn't want it to. This not only adds up in terms of the electricity bill, it also puts the User's CPU fan under constant strain and wear. I understand that some people would prefer not having to press a key combo, so this could be a setting Users could choose: either a pause or a key combo.

**Readme update required:**
It'd help to update the readme to show Users how they can view the output logs, and to include a screenshot of the Wingman config tab to show how easy it is to change settings. Also, most Users won't know the consequences of changing settings like the code context window, code max tokens, chat context window, etc. It'd be nice to explain those, and also explain why there's a separate code model and a separate chat model.
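If the hotkey suggestion above lands, the user-facing side could be an ordinary VS Code keybinding. A sketch of a `keybindings.json` entry (the command ID here is hypothetical; it would be whatever command the extension actually registers):

```json
[
  {
    "key": "ctrl+i",
    "command": "wingman.triggerCodeCompletion",
    "when": "editorTextFocus"
  }
]
```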