nav9 commented 8 months ago

I'm using a computer with AMD Ryzen 5 5600G with integrated Radeon Graphics × 6, and 32GB RAM, on Linux Mint 21.
This is the code I'm using to try Wingman-AI:

def isPrime(n):
    """ Test if n is a prime number """
    if 

def hello(i):
    for j in range(i):
        print(f"{j} is a value")

At the isPrime function, I've been trying to type either if or for to see if Wingman-AI generates a code completion.

From what I understand, while typing code, if I pause, Wingman-AI is supposed to suggest a code completion. In my case, it isn't working. Here's the output:

27/02/2024, 22:01:01 - [info] Ollama - Code Completion submitting request with body: {"model":"deepseek-coder:6.7b-base-q8_0","prompt":"<｜fim▁begin｜>\ndef isPrime(n):\n    \"\"\" Test if n is a prime number \"\"\"\n    for i <｜fim▁hole｜>\n\n        \n\n\ndef hello(i):\n    for j in range(i):\n        print(f\"{j} is a value\")        \n\n<｜fim▁end｜>","stream":false,"raw":true,"options":{"temperature":0.6,"num_predict":-1,"top_k":30,"top_p":0.2,"repeat_penalty":1.1,"stop":["<｜end▁of▁sentence｜>","<｜EOT｜>","\\n","</s>"]}}
27/02/2024, 22:01:01 - [error] Ollama - code completion request with model deepseek-coder:6.7b-base-q8_0 failed with the following error: AbortError: This operation was aborted
27/02/2024, 22:01:01 - [info] Ollama - Code Completion execution time: 0.298 seconds
27/02/2024, 22:01:02 - [info] Ollama - Code Completion submitting request with body: {"model":"deepseek-coder:6.7b-base-q8_0","prompt":"<｜fim▁begin｜>\ndef isPrime(n):\n    \"\"\" Test if n is a prime number \"\"\"\n    for i in<｜fim▁hole｜>\n\n        \n\n\ndef hello(i):\n    for j in range(i):\n        print(f\"{j} is a value\")        \n\n<｜fim▁end｜>","stream":false,"raw":true,"options":{"temperature":0.6,"num_predict":-1,"top_k":30,"top_p":0.2,"repeat_penalty":1.1,"stop":["<｜end▁of▁sentence｜>","<｜EOT｜>","\\n","</s>"]}}
27/02/2024, 22:01:02 - [error] Ollama - code completion request with model deepseek-coder:6.7b-base-q8_0 failed with the following error: AbortError: This operation was aborted
27/02/2024, 22:01:02 - [info] Ollama - Code Completion execution time: 0.143 seconds
27/02/2024, 22:01:02 - [info] Ollama - Code Completion submitting request with body: {"model":"deepseek-coder:6.7b-base-q8_0","prompt":"<｜fim▁begin｜>\ndef isPrime(n):\n    \"\"\" Test if n is a prime number \"\"\"\n    for i in <｜fim▁hole｜>\n\n        \n\n\ndef hello(i):\n    for j in range(i):\n        print(f\"{j} is a value\")        \n\n<｜fim▁end｜>","stream":false,"raw":true,"options":{"temperature":0.6,"num_predict":-1,"top_k":30,"top_p":0.2,"repeat_penalty":1.1,"stop":["<｜end▁of▁sentence｜>","<｜EOT｜>","\\n","</s>"]}}
27/02/2024, 22:01:21 - [error] Ollama - code completion request with model deepseek-coder:6.7b-base-q8_0 failed with the following error: AbortError: This operation was aborted
27/02/2024, 22:01:21 - [info] Ollama - Code Completion execution time: 18.563 seconds

I noticed "num_predict":-1 and checked Wingman config. Sure enough, Code max tokens is -1 by default. Made the value 100, closed vscode, opened it again and tried. This time:

27/02/2024, 22:16:51 - [info] Ollama - Code Completion submitting request with body: {"model":"deepseek-coder:6.7b-base-q8_0","prompt":"<｜fim▁begin｜>\ndef isPrime(n):\n    \"\"\" Test if n is a prime number \"\"\"\n    <｜fim▁hole｜>\n        \n\ndef hello(i):\n    for j in range(i):\n        print(f\"{j} is a value\")        \n\n<｜fim▁end｜>","stream":false,"raw":true,"options":{"temperature":0.6,"num_predict":100,"top_k":30,"top_p":0.2,"repeat_penalty":1.1,"stop":["<｜end▁of▁sentence｜>","<｜EOT｜>","\\n","</s>"]}}
27/02/2024, 22:16:51 - [error] Ollama - code completion request with model deepseek-coder:6.7b-base-q8_0 failed with the following error: AbortError: This operation was aborted
27/02/2024, 22:16:51 - [info] Ollama - Code Completion execution time: 0.112 seconds
27/02/2024, 22:16:52 - [info] Ollama - Code Completion submitting request with body: {"model":"deepseek-coder:6.7b-base-q8_0","prompt":"<｜fim▁begin｜>\ndef isPrime(n):\n    \"\"\" Test if n is a prime number \"\"\"\n    if <｜fim▁hole｜>\n        \n\ndef hello(i):\n    for j in range(i):\n        print(f\"{j} is a value\")        \n\n<｜fim▁end｜>","stream":false,"raw":true,"options":{"temperature":0.6,"num_predict":100,"top_k":30,"top_p":0.2,"repeat_penalty":1.1,"stop":["<｜end▁of▁sentence｜>","<｜EOT｜>","\\n","</s>"]}}
27/02/2024, 22:17:30 - [error] Ollama - code completion request with model deepseek-coder:6.7b-base-q8_0 failed with the following error: AbortError: This operation was aborted
27/02/2024, 22:17:30 - [info] Ollama - Code Completion execution time: 37.988 seconds

So what would I need to do to get code completions?

Few other humble suggestions:

Conservative use of CPU required:

To save on power, Users could prefer to have Wingman-AI query Ollama for a code completion only when using a key combination like perhaps Ctrl+i or something. If Wingman-AI is going to try generating completions everytime the User pauses, it consumes a huge amount of CPU power even when the User doesn't want it to. This not only adds up in terms of the electricity bill, it also puts the User's CPU fan under constant duress/wear. I understand that some people would prefer not having to press a key combo, so this could be a setting that Users could choose. Either to use a pause or a key combo.

Readme update required:

It'd help to update the readme to show users how they can view the output logs and show a screenshot of the wingman config tab, to show how easy it is to change settings. Also, most users won't know what the consequence of changing settings like the code context window, code max tokens, chat context window etc is. So it'd be nice to explain those, and also explain why there's a separate code model and a separate chat model.
wingmanConfig

RussellCanfield commented 8 months ago

Hi,

Code completion will normally abort requests if a request is running and there is a user action such as typing. From the machine specs you provided, it could take a while to get an AI response due to the graphics card on the machine - so it appears that requests are inflight but are being cancelled because you are typing, this is normal. We currently have a short delay after you finish typing before the request is made.

Using an explicit hotkey is a good idea for an enhancement, thanks!

Are you able to verify if you trigger code completion (check the bottom right to see if the icon is spinning) and just wait without touching anything in the file or typing that it succeeds?

nav9 commented 8 months ago

Yes, I noticed the time difference when using Twinny. Since I'm on a CPU-only system, the larger models take time to respond. But the smaller models respond fast. This is the response time when using Twinny.

deepseek coder takes 18 seconds to generate a code completion.
codellama takes 36 seconds.
stable-code takes just 2 or 3 seconds.

So I switched to stable-code a few days ago, and Wingman-AI displayed an error message that it's not supported. I uninstalled Wingman-AI.

Today I reinstalled it, and it left me hanging (nothing loads. only the blue loading icon keeps moving horizontally):
hanging

I checked the output, and it shows the codeModel is stable-code:
errorWingman

But when I look at the settings, it's not stable-code. So I'm unable to switch back to deepseek coder.
wingmanSettings

I know I could solve it by clearing some cache, but that's not what matters here. The design of the extension needs to safeguard from such situations.

Few suggestions:

Please add support for smaller models.
You could have a look at Twinny's interface design. It makes it easy to switch models, and is reasonably robust. Even the settings are more accessible. There are some things that Twinny can learn from Wingman-AI too.

RussellCanfield commented 8 months ago

Thanks for the suggestions! I will log an issue related to the extension crashing with invalid configuration/non supported models.

and a second enhancement to switch code complete to some form of hotkey instead of a simple on or off

RussellCanfield commented 8 months ago

Covering in #27 and #28

harlenalvarez commented 8 months ago

Since we added issues to add the suggestions I'm closing this ticket. For the hotkey I'll be releasing that soon so keep an eye out for that.

RussellCanfield / wingman-ai

Auto completion not working and needs some improvements #25

Conservative use of CPU required:

Readme update required:

Few suggestions: