ScottLogic / prompt-injection

Application which investigates defensive measures against prompt injection attacks on an LLM, with a focus on the exposure of external tools.
MIT License

Review `ChatHttpResponse` #880

Open pmarsh-scottlogic opened 3 months ago

pmarsh-scottlogic commented 3 months ago

When we access the API via `POST /openai/chat`, it returns an object that looks like this:

```ts
interface ChatHttpResponse {
    reply: string;
    defenceReport: DefenceReport;
    transformedMessage?: TransformedChatMessage;
    wonLevel: boolean;
    isError: boolean;
    openAIErrorMessage: string | null;
    sentEmails: EmailInfo[];
    transformedMessageInfo?: string;
}
```

After #873 we add `wonLevelMessage` as well.

What's the problem? All of the following properties represent something that might be added to the chat history (or at least displayed in the frontend's chatHistory): `reply`, `transformedMessage`, `wonLevelMessage`, `openAIErrorMessage` and `transformedMessageInfo`. We should consider instead returning a list of `ChatMessage`s.

This will require some investigation (into when, where and how the above messages get added to the frontend/backend chats), a bit of poking around on both the front and back end, and then probably lots of test changes.
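As a minimal sketch of what the unified response could look like: the `ChatMessage` union, its variant names, and the stubbed `DefenceReport`/`EmailInfo` types below are all assumptions for illustration, not the repo's actual types.

```typescript
// Stub types so the sketch is self-contained; the real repo types differ.
type DefenceReport = { isBlocked: boolean; blockedReason: string | null };
type EmailInfo = { address: string; subject: string; body: string };

// Hypothetical discriminated union covering everything the frontend
// might display in the chat history.
type ChatMessage =
    | { type: 'BOT_REPLY'; content: string }
    | { type: 'TRANSFORMED_MESSAGE'; content: string; info?: string }
    | { type: 'LEVEL_WON'; content: string }
    | { type: 'ERROR'; content: string };

// One ordered list replaces five overlapping optional properties.
interface UnifiedChatHttpResponse {
    messages: ChatMessage[];
    defenceReport: DefenceReport;
    wonLevel: boolean;
    isError: boolean;
    sentEmails: EmailInfo[];
}
```

The frontend could then append `messages` to the chat history in order, with no per-case branching on which optional fields happen to be present.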

pmarsh-scottlogic commented 3 months ago

Here are all the cases:

normal chat


chat with transformation


win level


defence trigger


defence alert


openAI error

To reproduce: paste the following code into the try block in `chatGptChatCompletion`:

```ts
throw new Error(
    '429: You are being rate limited. Please try again in 3 minutes.'
);
```


other error

To reproduce: paste `throw new Error('Test error');` at the top of the try block in `handleChatToGPT`.
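Both error cases above could funnel through a single helper that decides whether a thrown error is an OpenAI error or a generic one. This is a sketch of the pattern only, not the repo's actual handler; the `toErrorFields` name and the status-code heuristic are assumptions.

```typescript
// Sketch: classify a thrown error into the response's error fields.
// Messages that start with an HTTP status code (e.g. "429: ...") are
// treated as OpenAI errors; anything else is a generic error.
function toErrorFields(err: unknown): {
    isError: boolean;
    openAIErrorMessage: string | null;
} {
    const message = err instanceof Error ? err.message : String(err);
    const isOpenAIError = /^\d{3}:/.test(message);
    return {
        isError: true,
        openAIErrorMessage: isOpenAIError ? message : null,
    };
}
```

Under the list-of-messages proposal, both outcomes would simply become an `ERROR`-type message in the returned list.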

- defence alert AND win level
- transformation AND win level
- transformation AND defence alert
- transformation AND defence trigger
- transformation AND defence alert AND win level
- multiple defence triggers
- defence trigger and alert
- multiple defence alerts
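Under the list-of-messages proposal, each combined case above would just be an ordered list, and the frontend could handle them all with one loop. A sketch, where the message type names are illustrative assumptions:

```typescript
type Message = { type: string; content: string };

// Example: "transformation AND win level" as an ordered message list.
const transformationAndWinLevel: Message[] = [
    { type: 'TRANSFORMED_MESSAGE', content: '<transformed user prompt>' },
    { type: 'BOT_REPLY', content: '<bot reply>' },
    { type: 'LEVEL_WON', content: '<won level message>' },
];

// The frontend appends every message in order, regardless of the case.
function appendToHistory(history: Message[], incoming: Message[]): Message[] {
    return [...history, ...incoming];
}
```

Every other combination ("multiple defence alerts", "transformation AND defence trigger", etc.) differs only in which messages appear in the list, not in how the frontend consumes them.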