deadbits / vigil-llm

⚡ Vigil ⚡ Detect prompt injections, jailbreaks, and other potentially risky Large Language Model (LLM) inputs
https://vigil.deadbits.ai/
Apache License 2.0
277 stars 33 forks

Prompt-response similarity API endpoint #15

Closed by deadbits 10 months ago

deadbits commented 10 months ago

Add an API endpoint to calculate the similarity between a submitted prompt and response:

{"prompt": "foo bar", "response": "blah blah"}

The function will compare the prompt and response to determine whether they are relevant to each other. This doesn't need to be a scanner module, imo; just a function call that returns the result.
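
For context, a minimal sketch of how such a prompt-response relevance check could work, assuming embeddings plus cosine similarity via sentence-transformers. The model name and function name are illustrative choices, not details from this issue or the actual vigil-llm code:

```python
# Sketch of a prompt-response relevance check (not the actual vigil-llm
# implementation). Assumes sentence-transformers is installed; the model
# name below is an illustrative placeholder.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def prompt_response_similarity(prompt: str, response: str) -> float:
    """Return the cosine similarity between prompt and response embeddings."""
    embeddings = model.encode([prompt, response], convert_to_tensor=True)
    return util.cos_sim(embeddings[0], embeddings[1]).item()

if __name__ == "__main__":
    # Unrelated strings should score low; that is the signal being checked.
    print(f"similarity: {prompt_response_similarity('foo bar', 'blah blah'):.3f}")
```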

deadbits commented 10 months ago

I have a PoC of this working. I'll need to spend some time testing different inputs and finding a good threshold.

Right now this endpoint is meant more as an additional check; it's not part of the scanners (currently/yet?). I did have the idea of input vs. output scanners, so maybe this could be the first.
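
To illustrate the threshold idea, a hypothetical endpoint sketch wrapping the similarity function from the previous snippet. The route name, the choice of Flask, and the threshold value are all assumptions rather than details from the issue or the linked PR:

```python
# Hypothetical API endpoint sketch; route, framework, and threshold are
# assumptions. Reuses prompt_response_similarity() from the earlier sketch.
from flask import Flask, jsonify, request

app = Flask(__name__)

# Placeholder value; per the comment above, a good threshold still needs
# to be found by testing different inputs.
SIMILARITY_THRESHOLD = 0.2

@app.route("/analyze/response", methods=["POST"])
def analyze_response():
    data = request.get_json(force=True)
    score = prompt_response_similarity(data["prompt"], data["response"])
    return jsonify({
        "similarity": round(score, 4),
        "relevant": score >= SIMILARITY_THRESHOLD,
    })

if __name__ == "__main__":
    app.run(port=5000)
```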

deadbits commented 10 months ago

https://github.com/deadbits/vigil-llm/pull/24