deadbits / vigil-llm

⚡ Vigil ⚡ Detect prompt injections, jailbreaks, and other potentially risky Large Language Model (LLM) inputs
https://vigil.deadbits.ai/
Apache License 2.0

Vigil should be easier to initialize #51

Closed: deadbits closed this issue 7 months ago

deadbits commented 7 months ago

The vigil-server.py script is overloaded with configuration parsing and scanner setup. I want to abstract this away so the API is less complicated, and move towards using Vigil as a Python library instead of relying on the API server.

I've added a vigil/vigil.py script with a central Vigil() class. Now you just import that class and pass it a config file, and everything else is handled. This also paves the way for creating more complex detection pipelines:

from vigil.vigil import Vigil

vigil = Vigil.from_config('conf/openai.conf')

# scan the incoming prompt for prompt injection / jailbreak indicators
scan1 = vigil.input_scanner.perform_scan(
    input_prompt="prompt goes here"
)

# act on the scan result if an injection was flagged
if 'Potential prompt injection detected' in scan1['messages']:
    take_some_action()

# scan the LLM response alongside the prompt that produced it
vigil.output_scanner.perform_scan(
    input_text="prompt goes here",
    input_resp="response goes here"
)

# placeholder inputs; call_llm() stands in for your own LLM call
prompt = "prompt goes here"
user_prompt = "user prompt goes here"

# embed a canary token in the prompt; always, length, and header are
# optional (False, 16, and this header string are the fallbacks)
canary_prompt = vigil.canary_tokens.add(
    prompt=prompt,
    always=False,
    length=16,
    header='<-@!-- {canary} --@!->',
)

# append the user prompt, call the LLM, then check whether the canary
# token made it back in the response
canary_prompt = canary_prompt + user_prompt
llm_response = call_llm(canary_prompt)
result = vigil.canary_tokens.check(prompt=llm_response)

if not result:
    # canary token not found in the LLM response; this interaction may be
    # indicative of goal hijacking, so let's add the detected user prompt
    # to the vector db for future detections
    result, ids = vigil.vector_db.add(prompt=user_prompt)
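
For reference, a minimal sketch of how the central class could wrap the setup behind from_config(), assuming a ConfigParser-style config file; the commented-out scanner names are illustrative placeholders, not the actual module layout:

import configparser

class Vigil:
    def __init__(self, config: configparser.ConfigParser):
        # keep the parsed config and build the detection pipeline from it;
        # attribute names would mirror the example above
        self.config = config
        # self.input_scanner = InputScanner(config)    # placeholder
        # self.output_scanner = OutputScanner(config)  # placeholder

    @classmethod
    def from_config(cls, config_path: str) -> 'Vigil':
        # parse the config file once so callers never handle
        # configuration parsing or scanner setup themselves
        parser = configparser.ConfigParser()
        if not parser.read(config_path):
            raise FileNotFoundError(f'config file not found: {config_path}')
        return cls(parser)

Keeping all of the setup behind a single classmethod like this should also let vigil-server.py shrink down to a thin wrapper around the same object.
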
deadbits commented 7 months ago

Completed by PR https://github.com/deadbits/vigil-llm/pull/52