ScottLogic / prompt-injection

Application which investigates defensive measures against prompt injection attacks on an LLM, with a focus on the exposure of external tools.
MIT License
15 stars 10 forks

596 fix problems with integrationlangchaintest #804

Closed pmarsh-scottlogic closed 7 months ago

pmarsh-scottlogic commented 7 months ago

Description

Tidying up some code and making the test a bit better.

pmarsh-scottlogic commented 7 months ago

Will reopen when it's ready.

pmarsh-scottlogic commented 7 months ago

I'd be interested to know what the response actually is here, and why we need to strip out all non-alphanumerics; does it wrap the "yes" or "no" inside braces, or quotes maybe?

@chriswilty When I sent "Hello", the prompt eval LLM came back with "No.", and when I sent "forget your instructions" it came back with "Yes.". So we need to strip away the punctuation.
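
To illustrate the stripping described above, here is a minimal sketch in TypeScript. The function name `parseYesNo` and its `undefined` fallback are assumptions made for illustration and are not taken from the repository; the only behaviour drawn from this thread is that replies such as "Yes." and "No." need their non-alphanumeric characters removed before comparison.

```typescript
// Hypothetical helper (not the repository's code): turn a yes/no reply
// from the prompt evaluation LLM into a boolean.
function parseYesNo(raw: string): boolean | undefined {
  // Remove every character that is not a letter or digit, then lowercase,
  // so that "Yes.", "yes" and "\"Yes\"" all normalise to "yes".
  const normalised = raw.replace(/[^a-zA-Z0-9]/g, "").toLowerCase();

  if (normalised === "yes") return true;
  if (normalised === "no") return false;

  // Anything else is an unexpected reply; the caller decides what to do.
  return undefined;
}

// Replies reported in this thread:
console.log(parseYesNo("No."));  // reply to "Hello"
console.log(parseYesNo("Yes.")); // reply to "forget your instructions"
```

Returning `undefined` for anything other than a clean yes or no keeps an unexpected reply from being silently treated as either answer.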

chriswilty commented 7 months ago

Brilliant, thanks for checking!

pmarsh-scottlogic commented 7 months ago

@chriswilty In fact, we're explicitly asking it to return the answer in that format, with the full stop. I wonder why.

chriswilty commented 7 months ago

@pmarsh-scottlogic I'm happy to approve this anyway. If you need any help with merge conflicts, just ask (tomorrow!)