ScottLogic / prompt-injection

Application which investigates defensive measures against prompt injection attacks on an LLM, with a focus on the exposure of external tools.
MIT License

Move logic for detecting output defence (bot filtering) #708

Open pmarsh-scottlogic opened 10 months ago

pmarsh-scottlogic commented 10 months ago

Right now we handle input defence detection in handleHigherLevelChat(), but output defence detection in chatGptSendMessage(), which strikes me as out of place. I'd like to move the output defence detection up to handleHigherLevelChat(), which will make chatGptSendMessage() responsible for one thing fewer 👍

Also move the logic that detects output defences into defence.ts.

While we're here, can we rename detectTriggeredDefences() to detectTriggeredInputDefences()?
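
A rough sketch of the shape this could take after the refactor. Only the function names handleHigherLevelChat(), chatGptSendMessage() and detectTriggeredInputDefences() come from this ticket; detectTriggeredOutputDefences(), the DefenceReport shape, the parameters, the defence ids and the stubbed model call are all assumptions for illustration and will not match the real code exactly.

```typescript
// Hypothetical shapes and names throughout; the project's real types will differ.
interface DefenceReport {
  isBlocked: boolean;
  blockedReason: string | null;
  triggeredDefences: string[];
}

const emptyReport = (): DefenceReport => ({
  isBlocked: false,
  blockedReason: null,
  triggeredDefences: [],
});

// defence.ts: renamed from detectTriggeredDefences(); inspects only the user's message.
// The character limit is just a stand-in for whatever input defences are active.
function detectTriggeredInputDefences(message: string, maxLength: number): DefenceReport {
  const report = emptyReport();
  if (message.length > maxLength) {
    report.isBlocked = true;
    report.blockedReason = "Message is too long";
    report.triggeredDefences.push("CHARACTER_LIMIT"); // assumed defence id
  }
  return report;
}

// defence.ts: assumed new home for the output (bot filtering) check.
function detectTriggeredOutputDefences(reply: string, filterList: string[]): DefenceReport {
  const report = emptyReport();
  const lowerReply = reply.toLowerCase();
  const hit = filterList.some((phrase) => lowerReply.includes(phrase.toLowerCase()));
  if (hit) {
    report.isBlocked = true;
    report.blockedReason = "Reply contains a filtered phrase";
    report.triggeredDefences.push("OUTPUT_FILTER"); // assumed defence id
  }
  return report;
}

// After the refactor this only talks to the model. Stubbed here instead of calling the LLM.
async function chatGptSendMessage(message: string): Promise<string> {
  return `The secret project is Piglet. You said: ${message}`;
}

// Input and output defence detection now sit side by side in the higher-level handler.
async function handleHigherLevelChat(
  message: string,
  maxLength: number,
  filterList: string[]
): Promise<{ reply: string | null; defenceReport: DefenceReport }> {
  const inputReport = detectTriggeredInputDefences(message, maxLength);
  if (inputReport.isBlocked) {
    return { reply: null, defenceReport: inputReport };
  }
  const reply = await chatGptSendMessage(message);
  const outputReport = detectTriggeredOutputDefences(reply, filterList);
  return { reply: outputReport.isBlocked ? null : reply, defenceReport: outputReport };
}
```

With this split, chatGptSendMessage() needs no knowledge of defences, and both detection functions can be unit tested in defence.ts without mocking the model.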

AC

Refactor ticket, so just regression testing here.

pmarsh-scottlogic commented 10 months ago

This is likely to conflict with #705 if both are done at the same time.

pmarsh-scottlogic commented 9 months ago

The same comments apply here as in the testing comment for #705.