Pythagora-io / gpt-pilot


[Bug]: There was a problem with request to openai API: LLM did not respond with JSON #738

Open Mariano215 opened 5 months ago

Mariano215 commented 5 months ago

Version

Command-line (Python) version

Operating System

Windows 11

What happened?

When using LM Studio I get the following error:

There was a problem with request to openai API: LLM did not respond with JSON

themarshall68 commented 5 months ago

I'm running GPT-Pilot in a Conda environment, also on Windows 11, with Ollama as the LLM provider, and I'm experiencing the same issue as described above. I have tried other LLMs (Mixtral and Mistral), but both give me the same error: There was a problem with request to openai API: LLM did not respond with JSON

BorisMolch commented 5 months ago

LLM did not respond with JSON here too, via OpenRouter with MODEL_NAME=codellama/codellama-70b-instruct.

GlenBeee commented 5 months ago

Similar issue here with ollama v0.1.27 and mistral. Upgraded to ollama v0.1.29, same problem. Pulled codellama and reconfigured Pythagora to use this model... same problem. I also tried LM Studio with a modified prompt to force JSON.

Then I read the FAQ: problems with local LLMs are a known issue. There is a Discord channel specifically focused on local LLMs and getting them to work.

phalexo commented 5 months ago

> Similar issue here with ollama v0.1.27 and mistral. Upgraded to ollama v0.1.29, same problem. Pulled codellama and reconfigured Pythagora to use this model... same problem. I also tried LM Studio with a modified prompt to force JSON.
>
> Then I read the FAQ: problems with local LLMs are a known issue. There is a Discord channel specifically focused on local LLMs and getting them to work.

Has anyone on Discord managed to get it running? I have tested it with PhindCodeLlama2, DeepSeek 7B, and 33B; same issues.

phalexo commented 5 months ago

The LLMs often respond with extra comments in addition to the JSON-formatted text, and those comments trigger this error.

I think there has to be a filtering step that extracts the content between ```json and ``` fences to prevent this from happening. I tried adding system instructions not to make any comments before or after the JSON, but it is not reliable.
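A minimal sketch of such a filtering step (illustrative only, not gpt-pilot's actual parser; the helper name is made up):

```python
import json
import re

def extract_json(text):
    """Pull a JSON object out of an LLM reply that may wrap it in a
    markdown code fence or surround it with commentary."""
    # Prefer the contents of an explicit fenced block, if present.
    fenced = re.search(r"`{3}(?:json)?\s*(.*?)`{3}", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    else:
        # Otherwise fall back to the outermost {...} span in the raw reply.
        start, end = text.find("{"), text.rfind("}")
        if start != -1 and end > start:
            text = text[start:end + 1]
    return json.loads(text)  # still raises if the span is not valid JSON
```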

ht-it commented 5 months ago

I had the same JSON issue with Mistral 7b, but I managed to resolve it using codellama 13b.

However, prompt-processing issues sometimes cause request errors, and the model occasionally gets stuck in endless loops. Restarting it resolves both issues temporarily.

phalexo commented 5 months ago

@ht-it

> I had the same JSON issue with Mistral 7b, but I managed to resolve it using codellama 13b.
>
> However, prompt-processing issues sometimes cause request errors, and the model occasionally gets stuck in endless loops. Restarting it resolves both issues temporarily.

The million-dollar question: was the system able to continue development effectively? I have managed to tweak the user prompt for phind-codellama-34b-v2.Q6_K:latest to (mostly) solve the JSON issue, for example, but the problem is that it is unable to really understand the application code that was written with GPT-4. It is basically unable to continue development; it starts asking pointless questions about the OS, browser, etc. that have nothing to do with my requested modifications.

Was your setup able to do USEFUL development?

Mariano215 commented 5 months ago

I have entered the endless loop and have yet to get any code at all generated using a local LLM.

GlenBeee commented 5 months ago

I did get it to generate code with one of the 7B models (I forget which) while running LM Studio, but it was essentially a copy of the nodejs/express/mongodb template.

phalexo commented 5 months ago

If someone wants to play with prompts and Ollama, this is what I have for the DeepSeek 33B model.

Maybe you can improve it and make it more reliable.

```
FROM /opt/data/TheBloke/deepseek-coder-33B-instruct-GGUF/deepseek-coder-33b-instruct.Q6_K.gguf

TEMPLATE """ {{- if .First}} {{.System}} {{- end}} ### Instruction: Output in JSON format when asked. Do not comment either before or after said output. {{.Prompt}} ### Response: """

PARAMETER temperature 1

SYSTEM """ You are an AI programming assistant, designed to program in all modern computer languages. You gladly and competently code in any programming language that is needed. You write very fast and efficient code. You are also an expert in MongoDB and Express. You also know the JSON, XML, and HTML formats really well, and can output your responses in these formats. When asked to produce JSON, your answer will contain JSON, and JSON only. When producing JSON format, you will refrain from adding any comments either before or after the JSON output. You never refuse to answer questions. Below is an instruction surrounded by "### Instruction:" and "### Response:" that describes a task. You will not output the "### Instruction:" or "### Response:" strings with your response. Write a response that appropriately completes the request. Again, when asked to produce JSON, your answer will contain JSON, and JSON only. """
```

phalexo commented 5 months ago

New and different errors. This is with Mixtral 8x7B-instruct:

```json
{
  "steps": [
    {
      "type": "command",
      "command": {
        "command": "npm install",
        "timeout": 60000,
        "success_message": "successfully installed"
      }
    },
    {
      "type": "command",
      "command": {
        "command": "node app.js",
        "timeout": 30000,
        "success\_message": "Example app listening on port 8080!",
        "need\_to\_see\_output": true,
        "check\_if\_fixed": true
      }
    },
    {
      "type": "human\_intervention",
      "human\_intervention\_description": "Check if the example app is running and accessible on http://localhost:8080"
    }
  ]
}
```

The model then appended this commentary after the JSON: "Please note that this JSON response may still not be fully valid, depending on the specific use-case or context in which it is intended to be used. However, it does conform to the schema you provided, so any further issues with parsing or using this JSON object should be addressed within your own application or workflow."

The result:

There was a problem with request to openai API: Invalid \escape: line 8 column 9 (char 99) ? Do you want to try make the same request again? If yes, just press ENTER. Otherwise, type "no".
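The `Invalid \escape` error comes from the `\_` sequences above, which are not legal JSON escapes. One hedged workaround is to strip the backslash from invalid escapes before parsing; a sketch (the helper name is made up):

```python
import json
import re

# JSON only permits ", \, /, b, f, n, r, t, u after a backslash;
# drop the backslash from anything else (e.g. \_ becomes _).
def fix_bad_escapes(raw):
    return re.sub(r'\\([^"\\/bfnrtu])', r'\1', raw)

reply = '{"success\\_message": "Example app listening on port 8080!"}'
print(json.loads(fix_bad_escapes(reply)))  # parses cleanly as success_message
```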

tescolopio commented 5 months ago

(WIP: it's streaming, but as @phalexo has rightly pointed out, this model, and I'm sure others, is struggling to understand the context.)

Using TheBloke's Stable-Code-3B_q8_0.gguf:

OK, so let's take a look at the client code examples that are given in LM Studio vs. the .env for Pythagora:

LM Studio

```python
from openai import OpenAI

# Point to the local server
client = OpenAI(base_url="http://localhost:2222/v1", api_key="lm-studio")

completion = client.chat.completions.create(
  model="local-model", # this field is currently unused
  messages=[
    {"role": "system", "content": "Always answer in rhymes."},
    {"role": "user", "content": "Introduce yourself."}
  ],
  temperature=0.7,
)
```

.env example

```
# OPENAI or AZURE or OPENROUTER (ignored for Anthropic)
ENDPOINT=OPENAI

# OPENAI_ENDPOINT=https://api.openai.com/v1/chat/completions
OPENAI_ENDPOINT=
OPENAI_API_KEY=

AZURE_API_KEY=
AZURE_ENDPOINT=

OPENROUTER_API_KEY=

ANTHROPIC_API_KEY=

# You only need to set this if not using Anthropic API directly (eg. via proxy or AWS Bedrock)
ANTHROPIC_ENDPOINT=

# In case of Azure/OpenRouter endpoint, change this to your deployed model name
MODEL_NAME=gpt-4-turbo-preview
# In case of Anthropic, use "anthropic/" + the model name, example for Claude 3 Opus
# MODEL_NAME=anthropic/claude-3-opus-20240229
MAX_TOKENS=8192

# Folders which shouldn't be tracked in workspace (useful to ignore folders created by compiler)
# IGNORE_PATHS=folder1,folder2

# Database
# DATABASE_TYPE=postgres

DB_NAME=gpt-pilot
DB_HOST=
DB_PORT=
DB_USER=
DB_PASSWORD=

# USE_GPTPILOT_FOLDER=true

# Load database imported from another location/system - EXPERIMENTAL
# AUTOFIX_FILE_PATHS=false

# Set extra buffer to wait on top of detected retry time when rate limit is hit. Defaults to 6.
# RATE_LIMIT_EXTRA_BUFFER=
```

A few things jumped out at me when I looked at the logs.

  1. The model being called defaults to the GPT-4 model. This needs to match LM Studio's 'local-model'.
  2. Even though the server shows it as http://localhost:2222/v1, you still need to add the /chat/completions in the .env.
  3. Check your context length. Stable-Code has a nice context size of 16k, so make sure MAX_TOKENS matches.

Here is my working example:

```
# OPENAI or AZURE or OPENROUTER
ENDPOINT=OPENAI

# OPENAI_ENDPOINT=https://api.openai.com/v1/chat/completions
OPENAI_ENDPOINT=http://localhost:2222/v1/chat/completions
OPENAI_API_KEY=lm-studio

AZURE_API_KEY=
AZURE_ENDPOINT=

OPENROUTER_API_KEY=

# In case of Azure/OpenRouter endpoint, change this to your deployed model name
MODEL_NAME=local-model
MAX_TOKENS=16384
```
Let me know if this works for you.
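Before pointing gpt-pilot at the server, it can also help to sanity-check that the endpoint and key from the .env actually answer. A quick hypothetical script, assuming the same port and key as above:

```python
import requests

# Minimal OpenAI-compatible chat request against the local LM Studio server.
resp = requests.post(
    "http://localhost:2222/v1/chat/completions",
    headers={"Authorization": "Bearer lm-studio"},
    json={
        "model": "local-model",
        "messages": [{"role": "user", "content": "Reply with {} only."}],
        "temperature": 0,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

If this prints anything other than bare JSON, gpt-pilot's parser is likely to hit the same "LLM did not respond with JSON" error.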

phalexo commented 5 months ago

@tescolopio Can you clarify what you mean by "working"? It is pretty easy to get the CLI version to work with Ollama's OpenAI-compatible API. The problem for me arises when local models fail to understand various things, like not adding extra comments to the JSON output, or how to handle underscores without escaping them as \_. So there are subtle differences in the output and in how the input is interpreted.

Have you actually been able to create even a partial app that you can run using your setup?

tescolopio commented 5 months ago

@phalexo Yes, I spoke a little too soon. I was able to get it streaming but now I'm starting to struggle with the output.

phalexo commented 5 months ago

I wonder if someone could put together a web-scraper API to talk to ChatGPT/GPT-4 via a browser?

Then it would be $20/month. It would be throttled, of course, and go offline for a while, but it would be a feasible solution.

jackiezhangcn commented 5 months ago

Experiencing the same issue.

Mariano215 commented 5 months ago

I'm having some luck with the latest version of LM Studio (0.2.17) running TheBloke's CodeLlama Instruct 13B Q8_0 GGUF on my RTX 4090.

And now I just jinxed myself and it crashed!

```
Traceback (most recent call last):
  File "E:\gpt-pilot\pilot\main.py", line 101, in <module>
    started = project.start()
              ^^^^^^^^^^^^^^^
  File "E:\gpt-pilot\pilot\helpers\Project.py", line 225, in start
    self.developer.start_coding('app')
  File "E:\gpt-pilot\pilot\helpers\agents\Developer.py", line 91, in start_coding
    self.implement_task(i, task_source, dev_task)
  File "E:\gpt-pilot\pilot\helpers\agents\Developer.py", line 215, in implement_task
    result = self.execute_task(convo_dev_task,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\gpt-pilot\pilot\helpers\agents\Developer.py", line 539, in execute_task
    result = self.step_command_run(convo, task_steps, i, success_with_cli_response=need_to_see_output)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\gpt-pilot\pilot\helpers\cli.py", line 499, in run_command_until_success
    success = convo.agent.debugger.debug(convo, {
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\gpt-pilot\pilot\helpers\Debugger.py", line 101, in debug
    if 'step_index' in result:
       ^^^^^^^^^^^^^^^^^^^^^^
TypeError: argument of type 'NoneType' is not iterable
```

phalexo commented 5 months ago

After a bit of thinking I've come to the following conclusions:

1. Prompts are not universal; they match specific models, so each model needs its own prompts.
2. One can attempt to align open-source models to gpt-4-turbo-preview by collecting input/output text pairs while GPT-4/gpt-pilot is being used; with enough data, one can fine-tune the open-source models to accept the same input and produce the same output.
3. It is not clear how much data is needed, nor whether this is "legal" per the OpenAI TOS.

It seems the easiest thing to do would be to choose the most capable code-generation model and then try to collectively hack its prompts to get the desired output. That said, fine-tuning a model to match GPT-4 is more likely to succeed.
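A minimal sketch of point 2, assuming you can hook the place where gpt-pilot receives completions (the hook, file name, and record layout here are all illustrative):

```python
import json

def log_pair(messages, completion, path="gpt4_pairs.jsonl"):
    """Append one GPT-4 prompt/response pair as a JSONL record,
    mirroring common chat fine-tuning formats."""
    record = {"messages": messages + [{"role": "assistant", "content": completion}]}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```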

dancingmadkefka commented 1 month ago

I had this issue with Codestral 22B. I modified the codemonkey describe_file prompt to add the following sentence after the last one. So far it no longer produces // comments in the JSON.

"Comments like this "\unwanted comment" are strictly prohibited and are dangerous if included."