Refrain Nemo-Guardrails to Send the Actual User Input to LLM

minghongg commented 3 months ago

Hi team,

Is it possible to configure Nemo-Guardrails to avoid sending the actual user input to the LLM? I understand that the actual user input won't be sent if the input rails are triggered. However, is it also possible to prevent the user input from being sent, regardless of whether the input rails are triggered or not? Thanks!

Drewwb commented 3 months ago

Yes, it is possible to configure NeMo Guardrails to avoid sending the actual user input to the LLM, regardless of whether input rails are triggered.

In your Colang file you could add something like the following:

define bot inform pass
    "pass"

# Colang 1.0 sketch; assumes this flow is listed last under rails.input.flows in config.yml
define flow stop after input rails
    bot inform pass
    # stop ends processing here, so the input is not forwarded to the main LLM
    stop

You could also chain another LLM to your guardrails:

Use a secondary LLM as a pre-processing step. I would call this an Observer.
This secondary LLM evaluates the user input against the defined guardrails.
It outputs a simple "pass" or "fail" result.
If the input passes, a sanitized or rephrased version of the input (not the original) is sent to the primary LLM, or even just a plain "pass".
If it fails, the input is blocked entirely, or just "fail" is sent to the primary LLM.

In essence, you would prompt your Observer LLM in a way that makes sure it doesn't output the user's input.
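
The Observer idea above can be sketched in plain Python, independent of NeMo Guardrails. This is only a rough illustration under stated assumptions: `observe_and_forward`, `GateResult`, and the two stub functions are hypothetical names, and each "LLM" is just a callable from prompt string to reply string that you would replace with your real clients. It shows the simplest variant, where only the token "pass" reaches the primary LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical type: any function mapping a prompt string to a reply string.
LLM = Callable[[str], str]


@dataclass
class GateResult:
    verdict: str              # "pass" or "fail"
    forwarded: Optional[str]  # text sent to the primary LLM, None if blocked
    reply: Optional[str]      # primary LLM reply, None if blocked


def observe_and_forward(user_input: str, observer: LLM, primary: LLM) -> GateResult:
    """Let the observer judge the input; forward only a fixed token."""
    raw = observer(
        "Answer with exactly one word, pass or fail. "
        "Does the following message comply with the policy?\n" + user_input
    )
    # Anything other than a clean "pass" is treated as "fail".
    verdict = "pass" if raw.strip().lower() == "pass" else "fail"
    if verdict == "fail":
        return GateResult(verdict, None, None)
    # Forward the verdict token only, never the original user text.
    forwarded = "pass"
    return GateResult(verdict, forwarded, primary(forwarded))


# Stub models for demonstration only (assumptions, not real LLM clients).
seen_by_primary: List[str] = []


def stub_observer(prompt: str) -> str:
    return "fail" if "forbidden" in prompt.lower() else "pass"


def stub_primary(prompt: str) -> str:
    seen_by_primary.append(prompt)
    return "ack: " + prompt


allowed = observe_and_forward("my private question", stub_observer, stub_primary)
blocked = observe_and_forward("something forbidden", stub_observer, stub_primary)
```

Note that the Observer itself still receives the raw input, so this only shields the primary LLM. Forwarding a sanitized rewrite instead of "pass" would mean taking the forwarded text from the Observer's output, which then has to be trusted not to echo the original.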

Pouyanpi commented 2 months ago

@minghongg, do you mean that you want to use predefined flows only?

What do you want to do with the user input? It'd be great if you could explain your use case in more detail.

drazvan commented 2 months ago

Adding this for reference as well: https://docs.nvidia.com/nemo/guardrails/user_guides/input_output_rails_only/README.html#using-only-input-and-output-rails

Pouyanpi commented 4 weeks ago

@minghongg, did any of the answers resolve your issue? If so, please close it.

NVIDIA / NeMo-Guardrails

Refrain Nemo-Guardrails to Send the Actual User Input to LLM #706