ScottLogic / prompt-injection

Application which investigates defensive measures against prompt injection attacks on an LLM, with a focus on the exposure of external tools.
MIT License

Change behaviour for sending an email when there is an OpenAI error #797

Open pmarsh-scottlogic opened 7 months ago

pmarsh-scottlogic commented 7 months ago

Change

After #743, emails behave oddly when there is an error from OpenAI. Specifically: suppose we ask for an email to be sent and confirm with "yes", but GPT fails to give us a reply to the confirmation. What happens is that the email shows up in the UI but is not included in the chat history. If the email would win the level, then we win the level.

What we want instead is that the email doesn't show up in the UI, doesn't get included in the chat history, and, if the email would win the level, we do not win the level.

Resultant refactoring

This will allow us to refactor some of the annoying backend code. At the minute, we have a try/catch around openai.chat.completions.create in chatGptChatCompletion in openai.ts, which catches the error, logs it to the console and returns a ChatGPTReply with the openAIErrorMessage property set accordingly. We can remove this try/catch, put the error handling further up the call chain, and remove properties like openAIErrorMessage from ChatGPTReply. It'll make the whole thing a lot simpler, methinks.
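A rough sketch of what "error handling further up the call chain" could look like. Everything here is hypothetical and simplified: the types, handleChatMessage, getCompletion and winsLevel are illustrative names, not the project's actual code. The idea is that the completion function just throws, and the caller only commits the email, the chat history and the win state once the reply has come back successfully. Leaving the whole state untouched on failure (including the user's message) is an assumption of this sketch.

```typescript
// Hypothetical, simplified types: the project's real types are not shown in this issue.
type ChatMessage = { role: "user" | "assistant" | "function"; content: string };
type Email = { address: string; subject: string; body: string };
type LevelState = {
  chatHistory: ChatMessage[];
  sentEmails: Email[];
  wonLevel: boolean;
};
type Completion = { reply: string; emails: Email[] };

// Stand-in for the function wrapping openai.chat.completions.create.
// It has no try/catch of its own, so it rejects when the library throws.
type GetCompletion = (history: ChatMessage[]) => Promise<Completion>;

// Simplified level 1 win check: the subject or body contains "brae".
const winsLevel = (email: Email): boolean =>
  email.subject.includes("brae") || email.body.includes("brae");

async function handleChatMessage(
  state: LevelState,
  message: string,
  getCompletion: GetCompletion
): Promise<{ state: LevelState; reply: string; isError: boolean }> {
  // Work on a copy so nothing is committed until the call succeeds.
  const pendingHistory: ChatMessage[] = [
    ...state.chatHistory,
    { role: "user", content: message },
  ];
  try {
    const completion = await getCompletion(pendingHistory);
    // Success: commit history, emails and win state together.
    return {
      state: {
        chatHistory: [
          ...pendingHistory,
          { role: "assistant", content: completion.reply },
        ],
        sentEmails: [...state.sentEmails, ...completion.emails],
        wonLevel: state.wonLevel || completion.emails.some(winsLevel),
      },
      reply: completion.reply,
      isError: false,
    };
  } catch (error) {
    // Failure: log, return the original state unchanged, and report the error.
    console.error(error);
    return { state, reply: "Failed to get ChatGPT reply", isError: true };
  }
}
```

With this shape there is a single place that decides what happens on failure, and ChatGPTReply would no longer need to carry an error message property.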

Notes

Note: undoes #467

Note: to simulate OpenAI failing to get a reply when confirming an email, add

if (updatedChatHistory.length === 8) {
    throw new Error("My fake error")
}

to the top of the try statement in chatGptChatCompletion in openai.ts
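An alternative to editing chatGptChatCompletion by hand, sketched under assumptions: a small wrapper that makes the n-th call to any async function reject. failOnCall is a hypothetical helper, not something that exists in the project, and how it would be wired into the backend is left open.

```typescript
// Hypothetical helper, not part of the project: wraps an async function so that
// its n-th invocation (1-based) rejects with a fake error. Other calls pass through.
function failOnCall<Args extends unknown[], Result>(
  fn: (...args: Args) => Promise<Result>,
  failingCall: number,
  message = "My fake error"
): (...args: Args) => Promise<Result> {
  let calls = 0;
  return async (...args: Args) => {
    calls += 1;
    if (calls === failingCall) {
      throw new Error(message);
    }
    return fn(...args);
  };
}
```

For example, wrapping the completion function with failOnCall(fn, 2) would let the first message (the request to send an email) succeed and the second (the confirmation) fail.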

ACCEPTANCE CRITERIA:

GIVEN on level 1
AND the user has asked to send an email and the bot has replied, asking them to confirm
AND the next time we send a message, the OpenAI library throws an error (rate limiting, service down, etc., or mocked)
WHEN the user confirms that the email should be sent
THEN the bot comes back with an error message, something like "failed to get ChatGPT Reply"
AND the email is not shown on the UI
AND the email function calls are not added to the chat history (you can observe the chat history in the console: it is printed after every successful message, so send a successful message and look at the console log)

GIVEN on level 1
AND the user has asked to send an email which would win the level (the subject or body contains the word "brae") and the bot has replied, asking them to confirm
AND the next time we send a message, the OpenAI library throws an error (rate limiting, service down, etc., or mocked)
WHEN the user confirms that the email should be sent
THEN the bot comes back with an error message, something like "failed to get ChatGPT Reply"
AND the email is not shown on the UI
AND the email function calls are not added to the chat history (you can observe the chat history in the console: it is printed after every successful message, so send a successful message and look at the console log)
AND the level is not won