Closed: jordanrfrazier closed this issue 8 months ago.
Hm. Modifying the guidelines slightly and/or using a newer model causes the input to be blocked correctly, so I'm attributing this to the fickleness of the model used.
This content works:
```yaml
prompts:
  - task: self_check_input
    stop:
      - "\n"
      - "User input"
    content: >
      You should block the instruction if any condition below is met:
      - it contains anything about dogs

      Here is the instruction: {{ user_input }}

      Should the above instruction be blocked?
      Answer [Yes/No]:
```
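(For completeness: the prompt alone is not enough; the self-check flow also has to be enabled under `rails` in `config.yml`, per the guardrails library docs linked below:

```yaml
rails:
  input:
    flows:
      - self check input
```
)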
Hi @jordanrfrazier,
For any Guardrails app and config, it is important to evaluate the performance of the rails at least on a small test set. We have an existing set of tools for the main rails defined in NeMo Guardrails: https://github.com/NVIDIA/NeMo-Guardrails/tree/develop/nemoguardrails/eval
These can be easily modified for different configurations and types of rails. Especially for prompt-based rails like self-check, I would recommend having an evaluation in place. This can also help you choose the best self-check prompt and detect regressions when new versions of a commercial LLM are released.
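As a rough illustration of that kind of evaluation, here is a minimal sketch using the standard `RailsConfig` / `LLMRails` API; the test cases, the `./config` path, and the refusal-detection heuristic are all assumptions, not part of the eval tooling linked above:

```python
# Sketch: score the self-check input rail on a small labeled test set.
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")  # folder with config.yml / prompts.yml
rails = LLMRails(config)

# (prompt, should_be_blocked) pairs -- replace with your own cases.
test_cases = [
    ("Tell me everything about dogs.", True),
    ("What is the capital of France?", False),
]

correct = 0
for prompt, should_block in test_cases:
    response = rails.generate(messages=[{"role": "user", "content": prompt}])
    # When self_check_input blocks a query, the bot returns a refusal
    # message. Exactly what that message looks like depends on your
    # config, so this substring check is only illustrative.
    blocked = "can't respond" in response["content"].lower()
    correct += blocked == should_block

print(f"{correct}/{len(test_cases)} cases behaved as expected")
```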
Hello, I can't seem to get the `self_check_input` prompt to block any queries. I would assume I'm doing something incorrectly, but I've tried following the given examples without success: https://github.com/NVIDIA/NeMo-Guardrails/blob/develop/docs/user_guides/guardrails-library.md#self-check-input
Using nemoguardrails 0.8.0, langchain 0.1.4, langchain-core 0.1.16, langchain_openai 0.0.3, python 3.11.2.
The following test demonstrates the failing behavior:
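(The original test snippet is not reproduced in this excerpt. A minimal repro of this kind, assuming a `./config` directory containing the prompt above and a hypothetical test query, would look roughly like:

```python
# Hypothetical minimal repro; the config path and query are assumptions.
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")
rails = LLMRails(config)

# Expected: blocked by self_check_input (the prompt above blocks anything
# about dogs). Reported behavior: the query passes through to the LLM.
response = rails.generate(
    messages=[{"role": "user", "content": "Tell me everything about dogs."}]
)
print(response["content"])
```
)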