NVIDIA / NeMo-Guardrails

NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.

New Evaluation Tooling #677

Closed · drazvan closed this 3 months ago

drazvan commented 3 months ago

This PR adds new tooling for evaluating a guardrail configuration. NOTE: the documentation is still minimal; this is a work in progress.


Below is a quick overview of the nemoguardrails eval CLI.

Run Evaluations

To run a new evaluation with a guardrail configuration:

nemoguardrails eval run -g <GUARDRAIL_CONFIG_PATH> -o <OUTPUT_PATH>
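
For example, assuming the guardrail configuration lives in ./config and the results should go to ./eval_output (both paths are illustrative):

nemoguardrails eval run -g ./config -o ./eval_output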

Check Compliance

To check compliance with the policies, you can use the LLM-as-a-judge method.

nemoguardrails eval check-compliance --llm-judge=<LLM_MODEL_NAME> -o <OUTPUT_PATH>

You can use any LLM supported by NeMo Guardrails as the judge, configured under the models key with the llm-judge type:

models:
  # Use OpenAI GPT-4 as the judge
  - type: llm-judge
    engine: openai
    model: gpt-4

  # Or a model served via NVIDIA AI Endpoints
  - type: llm-judge
    engine: nvidia_ai_endpoints
    model: meta/llama3-70b-instruct
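
With the configuration above, a compliance check using GPT-4 as the judge would look like this (assuming the judge is referenced by the model name from the config, and the output path is illustrative):

nemoguardrails eval check-compliance --llm-judge=gpt-4 -o ./eval_output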

Review and Analyze

To review and analyze the results, launch the NeMo Guardrails Eval UI:

nemoguardrails eval ui
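
Putting the three steps together, a typical session might look like the following (paths and judge model name are illustrative, matching the examples above):

nemoguardrails eval run -g ./config -o ./eval_output
nemoguardrails eval check-compliance --llm-judge=gpt-4 -o ./eval_output
nemoguardrails eval ui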