Closed — mansyno closed this issue 7 months ago
The same happens for JS; I guess it expects all code to be Python. There also seems to be a leak between the system prompt and the instructions.
What model are you using?
Hi @Notnaton, I used the model from here: https://huggingface.co/beowolx/MistralHermes-CodePro-7B-v1-GGUF and a few other esoteric ones, but they didn't work, not even when I tried different prompts. I then got access to the OpenAI GPT-4 model and everything worked. Later I found another local model, https://huggingface.co/TheBloke/basilisk-7B-v0.2-GGUF, and to my surprise it was indeed able to distinguish between running shell commands and Python. Of course it wasn't as good or accurate as GPT-4, but it was enough for what I tried (an app that takes a YouTube URL and downloads the video). So it does seem to be a matter of which model is used, but I'm not expert enough to say what exactly the issue is (or whether it is even worth pursuing) :)
I have experimented with Open Interpreter via the OpenAI API, and it performed exceptionally well. However, when I attempted to use LM Studio with various other models, the results were disappointingly poor. Initially, I believed that upgrading to a 32GB runtime and using 34-billion-parameter models would improve the situation, but regrettably, it made no noticeable difference. It would be beneficial to have model recommendations in the docs, as LM Studio does not appear to function effectively with some well-known models such as Mixtral, Llama, and Codewizard.
I have a similar experience: it works with GPT-4. I have tried using text-generation-webui with the OpenAI extension and the model TheBloke_Mistral-7B-Instruct-v0.2-AWQ; the interface works and I am able to make queries. But when I try commands that need to be executed on my computer, for example listing files in the current directory, the execution fails. I thought we had a bug, but you seem to indicate that it works with certain models. It would be nice if someone could explain what the issue is: do we need specific models, or do we need to design a better prompt?
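One way to picture why the execution fails: the interpreter decides which runtime to run a generated code block in based on the language tag on the model's fenced code block. The sketch below is hypothetical (the function and executor names are mine, not Open Interpreter's actual code), but it shows how an untagged or mistagged block from a weaker local model can fall through to the Python runtime:

```python
import re

# Map fence language tags to the runtime that should execute the block.
# (Illustrative mapping, not Open Interpreter's real dispatch table.)
EXECUTORS = {
    "python": "python runtime",
    "shell": "system shell",
    "bash": "system shell",
    "cmd": "system shell",
}

def route_code_block(markdown: str, default: str = "python runtime") -> str:
    """Return which runtime a fenced code block would be sent to.

    If the model omits or mislabels the language tag, the block falls
    back to the default (Python) -- the failure described above.
    """
    match = re.search(r"```(\w*)\n", markdown)
    tag = match.group(1).lower() if match else ""
    return EXECUTORS.get(tag, default)

# A well-tagged shell block is routed correctly:
print(route_code_block("```shell\nmkdir demo\n```"))  # system shell
# An untagged block from a weaker model lands in Python:
print(route_code_block("```\nmkdir demo\n```"))       # python runtime
```

Under this view, the problem is less a bug in the tool and more that weaker models fail to emit the language tag the dispatcher relies on.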
I think that for some reason the system prompt carries more "weight" with local LLMs, particularly the part that insists on very web-oriented jargon/frameworks for half the system prompt (I can't find the exact passage right away).
That said, I had better results using the following models:
The ones that are heavily influenced by what I just mentioned (and so do not work well):
Smaller models will be more limited than larger models with the vast majority of tasks. I have had more success with OpenCodeInterpreter and Nous Hermes 2 Pro. They still are not as performant as GPT-4 but with additional custom_instructions, they are able to perform more tasks than most open models. There are discussions on the Discord server about creating a resource for comparing models, prompts, and performance. I invite you to join the conversation so the community can create a resource that helps everyone use OI effectively!
Invite link: https://discord.gg/Hvz9Axh84z
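To build on the custom_instructions suggestion above, here is a minimal sketch of composing an extra instruction that nudges a local model to tag terminal commands as shell blocks. The wording of the instruction is an assumption (not a tested prompt), and the attachment step is shown only as a comment:

```python
# Hypothetical extra instructions to steer a weaker local model toward
# tagging terminal commands as shell blocks instead of Python.
def build_custom_instructions(os_name: str = "Windows") -> str:
    return (
        f"The user is on {os_name}. "
        "When a task requires terminal commands, put them in a ```shell fenced "
        "block; only use ```python blocks for actual Python code."
    )

instructions = build_custom_instructions()
print(instructions)

# With Open Interpreter, this would be attached roughly like so
# (assumption, based on the custom_instructions setting mentioned above):
#   from interpreter import interpreter
#   interpreter.custom_instructions = instructions
```

Pairing an instruction like this with one of the stronger open models mentioned above is the kind of combination the proposed community resource could help compare.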
Describe the bug
When I ask it to perform tasks in the terminal, like creating a new subfolder, it generates the correct command (`md foldername`) but runs it as if it were Python code instead of running it in the terminal. I tested it with a few LLMs I downloaded in LM Studio. I don't have an OpenAI API key, so I can't test it there. I don't know whether the issue is the local models being too small (or too limited), something in my prompt, or the code interpreter itself. The prompt I provided was: create a subfolder in the current directory
Reproduce
Ask it to perform an action on your local folder, such as creating a subfolder: create a subfolder in the current directory
Expected behavior
Generate the command `md folderName` and execute it in the terminal.
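For reference, `md` is the Windows cmd built-in for creating a directory. A portable Python equivalent of what a correct run should accomplish (using a temporary directory here so the sketch is self-contained):

```python
import os
import tempfile

# `md folderName` on Windows cmd is equivalent to creating a directory;
# the platform-neutral Python version the interpreter could also emit is:
target = os.path.join(tempfile.mkdtemp(), "folderName")
os.makedirs(target, exist_ok=True)
print(os.path.isdir(target))  # True
```

The bug is that the interpreter sends the literal `md folderName` text to the Python runtime, where it is not valid syntax, instead of to the terminal.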
Screenshots
No response
Open Interpreter version
0.2.0
Python version
3.11.4
Operating System name and version
Windows 10
Additional context
No response