Cormanz / smartgpt

A program that provides LLMs with the ability to complete complex tasks using plugins.
MIT License
1.76k stars · 127 forks

GPT3.5 Resilience & JSON Prompting Techniques #15

Open · Cormanz opened this issue 1 year ago

Cormanz commented 1 year ago

In testing, gpt-3.5-turbo tends to struggle a lot with formatting JSON responses. It sometimes replies with plaintext before giving the JSON, or messes up the format entirely.

This issue proposes two changes.

  1. Search for any valid JSON match inside the response. If one is found, use it, stripping everything else from the message so only the JSON remains.
  2. If the JSON is invalid, ask an LLM to repair it (a sketch of both steps follows below).
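A rough sketch of how those two steps might look (not the actual SmartGPT implementation; `extract_json` and `repair_prompt` are just illustrative names), using serde_json to find the first balanced object in a mixed plaintext/JSON response and to build a retry prompt when parsing fails:

```rust
use serde_json::Value;

// Proposal 1: scan the response for the first balanced `{ ... }` block that
// parses as JSON, discarding any plaintext around it.
fn extract_json(response: &str) -> Option<Value> {
    let bytes = response.as_bytes();
    for start in 0..bytes.len() {
        if bytes[start] != b'{' {
            continue;
        }
        let mut depth = 0usize;
        let mut in_string = false;
        let mut escaped = false;
        for (offset, &c) in bytes[start..].iter().enumerate() {
            match c {
                b'\\' if in_string && !escaped => {
                    escaped = true;
                    continue;
                }
                b'"' if !escaped => in_string = !in_string,
                b'{' if !in_string => depth += 1,
                b'}' if !in_string => {
                    depth -= 1;
                    if depth == 0 {
                        let candidate = &response[start..=start + offset];
                        if let Ok(value) = serde_json::from_str(candidate) {
                            return Some(value);
                        }
                        break; // balanced but still invalid; try the next '{'
                    }
                }
                _ => {}
            }
            escaped = false;
        }
    }
    None
}

// Proposal 2: if nothing parses, ask the model to repair its own output.
fn repair_prompt(bad_response: &str, error: &str) -> String {
    format!(
        "The response below was supposed to be a single valid JSON object, but it \
         failed to parse ({error}). Reply with only the corrected JSON:\n\n{bad_response}"
    )
}
```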

Alongside this, it might also be worth using YML to reduce prompt tokens and make the prompt easier for LLMs to follow. Whether this is actually better than JSON is uncertain; it hasn't proven reliable in my light testing.
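For what it's worth, a minimal illustration of the YML idea (the field names here are hypothetical, not the real SmartGPT schema): serializing the same command with serde_json and serde_yaml to compare how much of the prompt goes to punctuation versus content:

```rust
use serde::Serialize;

// Hypothetical command shape, only for comparing serialization sizes.
#[derive(Serialize)]
struct Command {
    thoughts: String,
    command: String,
    args: Vec<String>,
}

fn main() {
    let cmd = Command {
        thoughts: "search the web for recent news".into(),
        command: "google_search".into(),
        args: vec!["latest AI news".into()],
    };

    // JSON spends tokens on quotes, braces, and commas...
    println!("{}", serde_json::to_string_pretty(&cmd).unwrap());
    // ...while YAML is mostly `key: value` plus indentation.
    println!("{}", serde_yaml::to_string(&cmd).unwrap());
}
```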

calum-bird commented 1 year ago

If you use a GPT-3 base model like davinci-003, you can force it to use JSON formatting by providing the start of a JSON object at the end of the prompt, e.g.:

Some instructions... always return in json.

Here is the response:
{

This doesn't work for chat models like 3.5 & 4, since you can't seem to provide the start of an assistant response the same way, but it's possible that an older model that is better at formatting would actually be an improvement.
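A sketch of what that trick looks like in code (the actual completion-API call is omitted; only the prompt construction and the re-prepended brace are shown):

```rust
// The prompt ends with an opening brace so the completion model continues
// the JSON object instead of preceding it with prose.
fn json_primed_prompt(instructions: &str) -> String {
    format!("{instructions}\n\nHere is the response:\n{{")
}

// The completion then starts mid-object, so the "{" we supplied has to be
// prepended back before parsing.
fn parse_primed_completion(completion: &str) -> serde_json::Result<serde_json::Value> {
    serde_json::from_str(&format!("{{{completion}"))
}
```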

Cormanz commented 1 year ago

I've definitely been looking into those ideas, alongside even more sophisticated solutions like jsonformer. However, the main goal is supporting instruct/chat-based models (the internals of SmartGPT are built around Vec<Message> and &[Message], which are abstractions over chat conversations); although these can be flattened into a pure plaintext prompt for base models, I'm not sure that would be as effective.

You might be able to provide the start of an assistant response: in the OpenAI playground, you can add an assistant message and the model will continue it. I haven't tested this through the API, but there's no reason it should behave differently.
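For reference, a sketch of the request body such an experiment would use against the chat completions API; whether the model actually continues the partial assistant turn rather than treating it as a finished message is exactly the untested part:

```rust
use serde_json::{json, Value};

// The conversation ends with a partial assistant message ("{"); the hope is
// that the model continues it rather than starting a fresh reply.
fn primed_chat_request(system: &str, user: &str) -> Value {
    json!({
        "model": "gpt-3.5-turbo",
        "messages": [
            { "role": "system", "content": system },
            { "role": "user", "content": user },
            { "role": "assistant", "content": "{" }
        ]
    })
}
```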

I think even if we had it start the response with {, unless we used something like jsonformer there would still be parsing errors (text outside the JSON, trailing commas, etc.). I think #20 is a strong start toward using this more effectively with models that are worse at precise formatting, like gpt-3.5-turbo, and further prompting techniques should be investigated. I'll keep this issue open as a place to discuss future JSON prompting techniques.

jaykchen commented 1 year ago

@Cormanz I have a strong opinion:

  • letting the LLM output in JSON format wastes a lot of computing power and token count
  • I propose we ask the LLM to return thoughts, etc. in separate text sections, and we write old-fashioned code to manually parse them into a Rust struct

please go to https://platform.openai.com/tokenizer to see how wasteful it is to use JSON format in GPT models; the colorized blocks show the tokens spent on JSON's formatting (see screenshot)

jaykchen commented 1 year ago

On the other side of the coin, JSON saves tokens, and I suspect it's easier for the LLM to understand, when the data we feed it has a tree-like structure, because JSON is already a structured data representation. The screencap showed that a plain-text representation of a tree structure is very wasteful in token count (see screenshot).
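To illustrate the comparison (this is not the original screencap, just a small made-up example): the same tree once as compact JSON and once as an indented plain-text outline, which spends extra tokens on whitespace at every level of depth:

```rust
use serde_json::json;

fn main() {
    // The same tree, once as compact JSON...
    let tree = json!({
        "project": {
            "src": ["main.rs", "lib.rs"],
            "tests": ["parser.rs"]
        }
    });
    println!("{tree}");

    // ...and once as an indented plain-text outline.
    let outline = "\
project
    src
        main.rs
        lib.rs
    tests
        parser.rs";
    println!("{outline}");
}
```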

Cormanz commented 1 year ago

> @Cormanz I have a strong opinion:
>
>   • letting the LLM output in JSON format wastes a lot of computing power and token count
>   • I propose we ask the LLM to return thoughts, etc. in separate text sections, and we write old-fashioned code to manually parse them into a Rust struct
>
> please go to https://platform.openai.com/tokenizer to see how wasteful it is to use JSON format in GPT models; the colorized blocks show the tokens spent on JSON's formatting (see screenshot)

The output you showed uses multiple levels of indentation, which greatly increases the token count. The SmartGPT prompt only uses one level of indentation, so the token cost of indentation is reduced significantly, making it much less costly than your example.

In addition, JSON is a format the model was trained on extensively. When asked to write JSON, it can draw on a significant amount of internal knowledge and produce it more effectively and more consistently. Asking it to write separate plain-text sections instead would forgo that understanding entirely, lead to much less consistent results, and require us to maintain even more parsing code.

I don't think it's a solution worth considering, given the minimal token cost at one level of indentation and how inconsistent the results would likely be. Rather, it may be worth pursuing technologies like jsonformer, or using YML to reduce tokens and make responses more streamable and consistent.

jaykchen commented 1 year ago

@Cormanz thanks for your explanation