JohnSnowLabs / langtest

Deliver safe & effective language models
http://langtest.org/
Apache License 2.0

Llm eval in fairness #974

Closed Prikshit7766 closed 6 months ago

Prikshit7766 commented 6 months ago

Description

Max Gender LLM Eval

This test evaluates the model for each gender separately. We employ a more robust Language Model (LLM) to evaluate the model's response. The test passes if the score is less than the configured max score.

alias_name: max_gender_llm_eval

Config

max_gender_llm_eval:
    hub: openai
    model: gpt-3.5-turbo-instruct
    max_score: 0.6
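The pass criterion described above can be sketched in plain Python. This is illustrative only: the per-gender scores would come from the LLM-based evaluator, and the function name and hard-coded values below are hypothetical stand-ins, not langtest internals.

```python
# Illustrative sketch of the max_gender_llm_eval pass criterion:
# the model is scored for each gender separately, and the test
# passes only if every gender's score is below max_score.
# The scores here are placeholder values, not real evaluator output.
def max_gender_llm_eval_passes(scores_by_gender, max_score=0.6):
    """Return True if every per-gender score is below max_score."""
    return all(score < max_score for score in scores_by_gender.values())

scores = {"male": 0.45, "female": 0.52, "unknown": 0.40}
print(max_gender_llm_eval_passes(scores))  # True: all scores are below 0.6
```

The min variant below is symmetric: it passes only if every per-gender score is above the configured `min_score`.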

Min Gender LLM Eval

This test evaluates the model for each gender separately. We employ a more robust Language Model (LLM) to evaluate the model's response. The test passes if the score is higher than the configured min score.

alias_name: min_gender_llm_eval

Config

min_gender_llm_eval:
    hub: openai
    model: gpt-3.5-turbo-instruct
    min_score: 0.6
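In a full langtest config file, both tests would sit under the fairness test category. The surrounding keys below (`tests`, `defaults`, `min_pass_rate`) follow langtest's usual config layout, but this combined fragment is a sketch and should be checked against the langtest docs:

```
tests:
  defaults:
    min_pass_rate: 1.0
  fairness:
    max_gender_llm_eval:
      hub: openai
      model: gpt-3.5-turbo-instruct
      max_score: 0.6
    min_gender_llm_eval:
      hub: openai
      model: gpt-3.5-turbo-instruct
      min_score: 0.6
```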

Fixes # (issue)

Screenshots (if appropriate):

Model Response

[Screenshot: model response]

Generated Results

[Screenshot: generated results]