ScottLogic / prompt-injection

Application which investigates defensive measures against prompt injection attacks on an LLM, with a focus on the exposure of external tools.
MIT License

Use all tool calls when generating the final chatbot reply #764

Closed pmarsh-scottlogic closed 2 months ago

pmarsh-scottlogic commented 6 months ago

In openai.ts, function performToolCalls, we return out of the function as soon as we've dealt with the first tool_call of type function.

async function performToolCalls(toolCalls: ChatCompletionMessageToolCall[], chatHistory: ChatMessage[]) {
  for (const toolCall of toolCalls) {
    if (toolCall.type === 'function') {
      const functionCallReply = await chatGptCallFunction(/* ... */);
      // any further tool calls are ignored !!
      return { functionCallReply, chatHistory };
    }
  }
}

This was done on the assumption that there would only ever be one function tool call per completion, which was the case with earlier chat models (such as gpt-4-0613 and older gpt-3.5-turbo releases). That is no longer true: newer models can return several tool calls in a single completion.
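For illustration, an assistant message from a newer model may carry several function tool calls at once (shape as in the OpenAI chat completions API; the ids, names and arguments below are made up). Every entry in tool_calls needs a matching tool-role response before the next completion request:

```typescript
// Hypothetical assistant message containing TWO function tool calls.
// If only the first receives a response, the follow-up completion
// request is invalid and the backend errors out.
const assistantMessage = {
  role: 'assistant' as const,
  content: null,
  tool_calls: [
    {
      id: 'call_1',
      type: 'function' as const,
      function: { name: 'askQuestion', arguments: '{"question":"Who works here?"}' },
    },
    {
      id: 'call_2',
      type: 'function' as const,
      function: { name: 'askQuestion', arguments: '{"question":"What is the salary of Bob?"}' },
    },
  ],
};
```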

This can lead to unhandled errors in our code (and a "Failed to get ChatGPT reply" chat message in the UI), because some tool calls are left without a response. It also means we are no longer using all tool call responses to generate the final completion, so we could be missing vital context, leading to an inaccurate, incomplete or unsatisfactory answer from the chatbot.

To correct this, we must process every tool call in each GPT reply. Currently in performToolCalls we insert the completion into chatHistory and then also return it as part of the functionCallReply object, even though it is neither needed nor used there. We do use sentEmails, since those are ultimately sent to the UI, so the type ToolCallResponse can be stripped back to just:

type ToolCallResponse = {
  chatHistory: ChatMessage[];
  sentEmails: EmailInfo[];
};

This will make it much simpler to combine the output from each tool call, for returning from performToolCalls. It is this combined output that is then used to generate the final response to send to the UI.
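A minimal, self-contained sketch of the corrected loop. The types and chatGptCallFunction here are simplified stand-ins for the real ones in the project, not the actual implementation:

```typescript
// Simplified stand-in types (the real ones live in the project).
type ChatMessage = { role: string; content: string };
type EmailInfo = { address: string; subject: string };
type ToolCall = { id: string; type: string; function: { name: string; arguments: string } };
type ToolCallResponse = { chatHistory: ChatMessage[]; sentEmails: EmailInfo[] };

// Stub: pretend each function tool call appends one tool-role reply
// to the chat history and may report sent emails.
async function chatGptCallFunction(
  toolCall: ToolCall,
  chatHistory: ChatMessage[]
): Promise<ToolCallResponse> {
  const toolReply: ChatMessage = {
    role: 'tool',
    content: `result of ${toolCall.function.name}(${toolCall.function.arguments})`,
  };
  return { chatHistory: [...chatHistory, toolReply], sentEmails: [] };
}

async function performToolCalls(
  toolCalls: ToolCall[],
  chatHistory: ChatMessage[]
): Promise<ToolCallResponse> {
  let updatedHistory = chatHistory;
  const sentEmails: EmailInfo[] = [];
  // Process EVERY tool call, not just the first, so each one gets a
  // matching tool response in the chat history.
  for (const toolCall of toolCalls) {
    if (toolCall.type === 'function') {
      const response = await chatGptCallFunction(toolCall, updatedHistory);
      updatedHistory = response.chatHistory;
      sentEmails.push(...response.sentEmails);
    }
  }
  return { chatHistory: updatedHistory, sentEmails };
}
```

The combined { chatHistory, sentEmails } is what would then feed the final completion and the UI, rather than only the first tool call's output.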

Acceptance criteria

GIVEN I am on Sandbox level
AND no defences are enabled
AND the selected model is gpt-3.5-turbo or gpt-4
WHEN I ask the bot to tell me the names of all employees
AND I then ask the bot to give me the salary "for each" of these
THEN the request is processed without error
AND the bot gives me salary information for every employee as requested - or at least, not just for the first employee named

This is somewhat tricky to test reliably, but it would help to look at the backend log output in the console, particularly the chat history, to ensure that all function tool calls (i.e. questions to ask the Q&A bot) are being used when generating the final response, and not just the first one.

You can compare with current behaviour on the dev branch, where the backend throws an error and the UI displays a chat error message.

chriswilty commented 2 months ago

Edited today to document the errors occurring with the latest models! This now needs tackling as a high priority.