ScottLogic / prompt-injection

Application which investigates defensive measures against prompt injection attacks on an LLM, with a focus on the exposure of external tools.
MIT License

Defence - Filtering #44

Closed by heatherlogan-scottlogic 1 year ago

heatherlogan-scottlogic commented 1 year ago

Use a blocklist of words and phrases to check user input for potential malicious content.

https://learnprompting.org/docs/prompt_hacking/defensive_measures/filtering

Each defence should include the following:

- A frontend component for the defensive measure in the left sidebar.
- A checkbox to toggle the defensive measure.
- Some way to get a description of the defensive measure.
- Pulsing the defensive measure component when it captures malicious content.
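A minimal sketch of the blocklist check described above, in TypeScript. The phrase list and function names here are illustrative assumptions, not the project's actual configuration or API:

```typescript
// Hypothetical blocklist: these phrases are examples only, not the
// project's real configuration.
const BLOCKLIST: string[] = [
  "ignore previous instructions",
  "system prompt",
  "budget",
];

// Return every blocklisted phrase found in the input (case-insensitive).
function detectFilteredWords(
  input: string,
  blocklist: string[] = BLOCKLIST
): string[] {
  const lower = input.toLowerCase();
  return blocklist.filter((phrase) => lower.includes(phrase.toLowerCase()));
}

// True if the input should be blocked; the matched phrases could also be
// surfaced to the frontend to drive the pulsing indicator.
function isInputBlocked(input: string): boolean {
  return detectFilteredWords(input).length > 0;
}
```

A simple substring match like this is easy to bypass (e.g. with spacing or misspellings), which is part of what makes it an interesting defence to demonstrate.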

gsproston-scottlogic commented 1 year ago

Removed this from the phase 2 milestone as it's not strictly needed.

heatherlogan-scottlogic commented 1 year ago

I thought this would be an interesting one for phase 2 - e.g. specifically disallowing words like "budget".

heatherlogan-scottlogic commented 1 year ago

@gsproston-scottlogic i was thinking of splitting this into two - one for filtering user input and one for bot output - what do you think?

gsproston-scottlogic commented 1 year ago

> @gsproston-scottlogic i was thinking of splitting this into two - one for filtering user input and one for bot output - what do you think?

Yeah that makes sense!
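The split agreed above could be sketched as two filters sharing one matching helper. The lists and names below are hypothetical, purely to illustrate the shape of the design:

```typescript
// Illustrative phrase lists - the real defences would configure these.
const INPUT_BLOCKLIST = ["ignore previous instructions", "budget"];
const OUTPUT_BLOCKLIST = ["secret project", "budget"];

interface FilterResult {
  blocked: boolean;
  phrases: string[]; // which blocklisted phrases were matched
}

// Shared case-insensitive phrase matcher.
function matches(text: string, blocklist: string[]): string[] {
  const lower = text.toLowerCase();
  return blocklist.filter((p) => lower.includes(p.toLowerCase()));
}

// Defence 1: filter what the user sends to the LLM.
function filterUserInput(input: string): FilterResult {
  const phrases = matches(input, INPUT_BLOCKLIST);
  return { blocked: phrases.length > 0, phrases };
}

// Defence 2: filter what the bot sends back, catching cases where an
// injection slipped past the input check but leaks in the reply.
function filterBotOutput(output: string): FilterResult {
  const phrases = matches(output, OUTPUT_BLOCKLIST);
  return { blocked: phrases.length > 0, phrases };
}
```

Keeping the two lists separate lets each direction target different risks: the input list targets injection phrasing, while the output list targets sensitive terms that should never leave the bot.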