deadbits / vigil-llm

⚡ Vigil ⚡ Detect prompt injections, jailbreaks, and other potentially risky Large Language Model (LLM) inputs
https://vigil.deadbits.ai/
Apache License 2.0

Update database with detected injections #10

Closed (deadbits closed this issue 10 months ago)

deadbits commented 10 months ago

When Vigil scans a prompt and detects a potential prompt injection, that prompt should be added back to ChromaDB so prompts submitted in the future can be scanned against it.

Since no single scanner is enough to be confident a prompt is malicious, maybe prompts are only added back to the DB if all scanners flag them? Or some percentage of the scanners that ran, e.g. 3/4 flagged the prompt, so we add it to the DB.
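A minimal sketch of that voting idea; the scanner names and the 0.75 threshold here are illustrative, not Vigil's actual configuration:

```python
# Hypothetical voting logic: learn a prompt only if enough scanners flag it.
# Scanner names and threshold are assumptions for illustration.
FLAG_THRESHOLD = 0.75  # e.g. 3 of 4 scanners must agree

def should_learn(scan_results: dict[str, bool]) -> bool:
    """scan_results maps scanner name -> whether that scanner flagged the prompt."""
    if not scan_results:
        return False
    flagged = sum(1 for hit in scan_results.values() if hit)
    return flagged / len(scan_results) >= FLAG_THRESHOLD

# Example: 3 of 4 scanners flagged the prompt, so it would be added to the DB.
results = {"yara": True, "transformer": True, "vectordb": True, "similarity": False}
print(should_learn(results))  # True
```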

Should prompts added back to the DB go into a separate collection, keeping the "learned attacks" apart from the original data? The original data has higher confidence, while the learned attacks are likely more prone to false positives. Either way, there should be some way to flag that a match came from a previously learned attack.
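One way to keep learned attacks separate while still marking where a match came from, sketched against the ChromaDB client API; the collection name and metadata key are assumptions, not Vigil's actual schema:

```python
import uuid
import chromadb

client = chromadb.Client()

# Keep learned attacks in their own collection, apart from the curated dataset.
learned = client.get_or_create_collection("learned_attacks")

def add_learned_attack(prompt: str) -> None:
    """Store a prompt the scanners flagged so future prompts can match against it."""
    learned.add(
        documents=[prompt],
        metadatas=[{"source": "learned"}],  # lets a match be attributed to a learned attack
        ids=[str(uuid.uuid4())],
    )

def query_learned(prompt: str, n_results: int = 3):
    """Check an incoming prompt against previously learned attacks."""
    return learned.query(query_texts=[prompt], n_results=n_results)
```

Querying the two collections separately also makes it easy to weight matches differently, e.g. treating a hit in the learned collection as lower confidence than one in the original dataset.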

deadbits commented 10 months ago

Done! https://github.com/deadbits/vigil-llm/pull/31