JohnSnowLabs / langtest

Deliver safe & effective language models
http://langtest.org/
Apache License 2.0

Llm eval in fairness #974

Closed Prikshit7766 closed 6 months ago

Prikshit7766 commented 6 months ago

Description

Max Gender LLM Eval

This test evaluates the model for each gender separately. We employ a more robust Language Model (LLM) to evaluate the model's response. The test passes if the score is less than the configured max score.

alias_name: max_gender_llm_eval

Config

max_gender_llm_eval:
    hub: openai
    model: gpt-3.5-turbo-instruct
    max_score: 0.6
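The pass criterion described above can be sketched in plain Python. This is illustrative only: the per-gender scores would come from the LLM-based evaluator, and the function name and hard-coded values below are hypothetical stand-ins, not langtest internals.

```python
# Illustrative sketch of the max_gender_llm_eval pass criterion:
# the model is scored for each gender separately, and the test
# passes only if every gender's score is below max_score.
# The scores here are placeholder values, not real evaluator output.
def max_gender_llm_eval_passes(scores_by_gender, max_score=0.6):
    """Return True if every per-gender score is below max_score."""
    return all(score < max_score for score in scores_by_gender.values())

scores = {"male": 0.45, "female": 0.52, "unknown": 0.40}
print(max_gender_llm_eval_passes(scores))  # True: all scores are below 0.6
```

The min variant below is symmetric: it passes only if every per-gender score is above the configured `min_score`.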

Min Gender LLM Eval

This test evaluates the model for each gender separately. We employ a more robust Language Model (LLM) to evaluate the model's response. The test passes if the score is higher than the configured min score.

alias_name: min_gender_llm_eval

Config

min_gender_llm_eval:
    hub: openai
    model: gpt-3.5-turbo-instruct
    min_score: 0.6
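In a full langtest config file, both tests would sit under the fairness test category. The surrounding keys below (`tests`, `defaults`, `min_pass_rate`) follow langtest's usual config layout, but this combined fragment is a sketch and should be checked against the langtest docs:

```
tests:
  defaults:
    min_pass_rate: 1.0
  fairness:
    max_gender_llm_eval:
      hub: openai
      model: gpt-3.5-turbo-instruct
      max_score: 0.6
    min_gender_llm_eval:
      hub: openai
      model: gpt-3.5-turbo-instruct
      min_score: 0.6
```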

Fixes # (issue)

Screenshots (if appropriate):

Model Response

[Screenshot: model response]

Generated Results

[Screenshot: generated results]