Agenta-AI / agenta

The all-in-one LLM developer platform: prompt management, evaluation, human feedback, and deployment all in one place.
http://www.agenta.ai
MIT License

Fea: Autogenerate test scenarios in playground and test sets #219

Closed mmabrouk closed 5 months ago

mmabrouk commented 1 year ago

Is your feature request related to a problem? Please describe.
Users don't always have enough test data to test their LLM apps.

Describe the solution you'd like
A button in the playground to auto-generate a new test point. Clicking the button would call GPT-3.5 with a prompt that includes the original prompt from the variant and the previous test points, asking it to provide a new test point. The generated test point would be added as a new row in the playground.

Technical Details

The execution of the OpenAI call would be as follows:

Invoke gpt-3.5-turbo-0613 (refer to https://platform.openai.com/docs/guides/gpt/function-calling) with a prompt similar to the following:

  messages: [
    {
      role: "system",
      content: "The user is testing multiple data points against a prompt. Please generate a unique data point distinct from the existing ones.",
    },
    {
      role: "user",
      content: `User prompts:\n---\n`, // insert the list of user prompts from the playground here
    },
    {
      role: "assistant",
      content: null,
      function_call: {
        name: "add_data_point",
        // data inputs from box 1 in the playground
        // (note: in the actual API, `arguments` is a JSON-encoded string)
        arguments: {input1: "value1", input2: "value2"},
      },
    },
    {
      role: "assistant",
      content: null,
      function_call: {
        name: "add_data_point",
        arguments: {input1: "value1afds", input2: "value2adfs"}, // data inputs from box 2 in the playground
      },
    },
    // ... continuing for the first n boxes in the playground
  ],
  functions: [
    {
      name: "add_data_point",
      parameters: {
        type: "object",
        properties: {
          input1: { type: "string" },
          input2: { type: "string" },
        },
      },
    },
  ]

Essentially, upon clicking the auto-generate new row button in the playground, we extract the prompts from the variant in the current tab, together with the first n inputs already present, and build a prompt similar to the one described above. This prompt is then sent to OpenAI. The response, for example add_data_point(input1: somethingtheygenerate, input2: somethingtheygenerate), is parsed, and a new row is created from it.
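The flow above could be sketched as follows. This is a minimal sketch with hypothetical helper names (`buildAutoGenRequest` and `parseNewRow` are not existing agenta functions), and the network call itself is left to the caller. One detail worth noting: in the actual OpenAI API, `function_call.arguments` is a JSON-encoded string rather than an object.

```typescript
// Hypothetical helpers (not existing agenta code): build the function-calling
// request body for auto-generating a test row, and parse the model's reply.

type Row = Record<string, string>;

// Build the body for POST https://api.openai.com/v1/chat/completions.
function buildAutoGenRequest(
  prompts: string[],
  existingRows: Row[],
  inputNames: string[]
) {
  // Describe each input as a string parameter of the add_data_point function.
  const properties: Record<string, { type: string }> = {};
  for (const name of inputNames) properties[name] = { type: "string" };

  return {
    model: "gpt-3.5-turbo-0613",
    messages: [
      {
        role: "system",
        content:
          "The user is testing multiple data points against a prompt. " +
          "Please generate a unique data point distinct from the existing ones.",
      },
      { role: "user", content: "User prompts:\n---\n" + prompts.join("\n") },
      // Replay each existing playground row as a prior add_data_point call so
      // the model avoids duplicating it. In the real API, `arguments` must be
      // a JSON-encoded string.
      ...existingRows.map((row) => ({
        role: "assistant",
        content: null as string | null,
        function_call: {
          name: "add_data_point",
          arguments: JSON.stringify(row),
        },
      })),
    ],
    functions: [
      { name: "add_data_point", parameters: { type: "object", properties } },
    ],
  };
}

// Parse the assistant's function_call.arguments (a JSON string) into a row
// that can be appended to the playground.
function parseNewRow(argumentsJson: string): Row {
  return JSON.parse(argumentsJson);
}
```

The new row is then appended to the playground state on the frontend; if the model returns malformed JSON, the parse step would throw and the button could simply be retried.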

Notes

suadsuljovic commented 1 year ago

@mmabrouk Hello, I had some personal stuff to attend to over the last two weeks, so I wasn't very active.

I started working on this today. Where do you want the API call to OpenAI to be: on the backend or the frontend? I will do a deep dive into the ChatGPT API docs. Hopefully I will figure out what I need to use by the end of the day.

mmabrouk commented 1 year ago

@suadsuljovic great to have you back :) I don't see an advantage to doing the calls from the backend. I think we can keep it on the frontend; the keys are saved there anyway for now. We can easily refactor it to the backend later if needed.
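Keeping the call on the frontend, the request itself is a plain `fetch`. A minimal sketch (the helper name is hypothetical, and it assumes the key is read from wherever the app already stores it client-side):

```typescript
// Hypothetical helper: assemble fetch options for a frontend OpenAI call.
// The API key is assumed to come from the app's existing client-side storage.
function buildOpenAIFetchOptions(apiKey: string, body: object) {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(body),
  };
}

// Usage (network call shown for illustration only):
// const res = await fetch(
//   "https://api.openai.com/v1/chat/completions",
//   buildOpenAIFetchOptions(apiKey, requestBody)
// );
```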

mmabrouk commented 1 year ago

Hey @suadsuljovic any updates from your side?

suadsuljovic commented 1 year ago

Hello, sorry, I was mostly busy looking for a new job, so I forgot about this.

I will try to finish it by the end of the week.

If I don't, just assign this to someone else.

mmabrouk commented 1 year ago

Thanks @suadsuljovic !