When Vigil scans a prompt and detects a potential prompt injection, that prompt should be added back to ChromaDB so that prompts submitted in the future can be scanned against it.
Since no single scanner is enough to be confident in a malicious result, maybe prompts should only be added back to the DB if all scanners flag them? Or if some fraction of the scanners that ran flag them, e.g. 3 of 4 scanners flagged the prompt, so we add it to the DB.
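To make the threshold idea concrete, here is a rough sketch of the decision step. The function name `should_learn`, the `min_fraction` parameter, and the scanner names in the usage note are all hypothetical and not part of Vigil; it only assumes each scanner that ran reports a boolean "flagged" result.

```python
from typing import Mapping


def should_learn(scanner_flags: Mapping[str, bool], min_fraction: float = 0.75) -> bool:
    """Decide whether a scanned prompt should be added back to the DB.

    scanner_flags maps the name of each scanner that ran to whether it
    flagged the prompt. min_fraction is a hypothetical tunable:
    1.0 means "all scanners must agree", 0.75 means "at least 3 of 4".
    """
    if not scanner_flags:
        # No scanners ran, so there is no evidence to learn from.
        return False
    flagged = sum(1 for hit in scanner_flags.values() if hit)
    return flagged / len(scanner_flags) >= min_fraction
```

For example, `should_learn({"vectordb": True, "yara": True, "transformer": True, "similarity": False})` passes at the default 0.75, but would not pass with `min_fraction=1.0`.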
Should prompts added back to the DB go into a separate collection, to keep the "learned attacks" apart from the original data? The original data has higher confidence, while the learned attacks are likely more prone to false positives. Either way, there should be some way to flag that a match came from a previously learned attack.
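One possible shape for the separate-collection option, as a sketch only. It assumes the collection objects behave like chromadb `Collection` objects (`add(ids=..., documents=..., metadatas=...)` and `query(query_texts=..., n_results=...)` returning lists of lists under `"documents"` and `"distances"`). The helper names, the `"source"` and `"flagged_by"` metadata keys, and the `"original"`/`"learned"` labels are made up for illustration.

```python
import hashlib
from typing import Any

SOURCE_LEARNED = "learned"  # hypothetical label for prompts added back by scans


def add_learned_prompt(learned_collection: Any, prompt: str, flagged_by: list[str]) -> str:
    """Store a flagged prompt in the learned-attacks collection."""
    # Derive the id from the prompt text so the same prompt always maps to the same id.
    prompt_id = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    learned_collection.add(
        ids=[prompt_id],
        documents=[prompt],
        # Metadata values are kept as plain strings.
        metadatas=[{"source": SOURCE_LEARNED, "flagged_by": ",".join(sorted(flagged_by))}],
    )
    return prompt_id


def query_with_source(collections: dict[str, Any], prompt: str, n_results: int = 5) -> list[dict]:
    """Query every collection and tag each match with the collection it came from."""
    matches = []
    for source, collection in collections.items():
        result = collection.query(query_texts=[prompt], n_results=n_results)
        # Results are nested per query text; only one query text is sent here.
        for text, distance in zip(result["documents"][0], result["distances"][0]):
            matches.append({"text": text, "distance": distance, "source": source})
    # Closest matches first, regardless of which collection they came from.
    return sorted(matches, key=lambda m: m["distance"])
```

Calling `query_with_source({"original": original_collection, "learned": learned_collection}, prompt)` would then give each match a `"source"` field, so a caller could apply a stricter distance cutoff to learned matches, or simply report where the match came from.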