JohnSnowLabs / langtest

Deliver safe & effective language models
http://langtest.org/
Apache License 2.0

Feature/implement load & save for benchmark reports #999

Closed chakravarthik27 closed 4 months ago

chakravarthik27 commented 5 months ago

Description

This pull request introduces a significant upgrade to LangTest's evaluation capabilities, focusing on report management and leaderboards. These enhancements let you save benchmark reports, reload them later, and compare models on a leaderboard.

How it works:

First, create a parameter.json or parameter.yaml file in the working directory.

JSON Format

{
    "task": "question-answering",
    "model": {
        "model": "http://localhost:1234/v1/chat/completions",
        "hub": "lm-studio"
    },
    "data": [
        {
            "data_source": "MedMCQA"
        },
        {
            "data_source": "PubMedQA"
        },
        {
            "data_source": "MMLU"
        },
        {
            "data_source": "MedQA"
        }
    ],
    "config": {
        "model_parameters": {
            "max_tokens": 64
        },
        "tests": {
            "defaults": {
                "min_pass_rate": 1.0
            },
            "robustness": {
                "add_typo": {
                    "min_pass_rate": 0.70
                }
            },
            "accuracy": {
                "llm_eval": {
                    "min_score": 0.60
                }
            }
        }
    }
}

YAML Format

task: question-answering
model:
  model: http://localhost:1234/v1/chat/completions
  hub: lm-studio
data:
- data_source: MedMCQA
- data_source: PubMedQA
- data_source: MMLU
- data_source: MedQA
config:
  model_parameters:
    max_tokens: 64
  tests:
    defaults:
      min_pass_rate: 1
    robustness:
      add_typo:
        min_pass_rate: 0.7
    accuracy:
      llm_eval:
        min_score: 0.6
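Before invoking the CLI, it can help to sanity-check the file's shape. Below is a minimal sketch in plain Python; `validate_params` and `REQUIRED_KEYS` are illustrative helpers, not part of langtest, and the key names simply mirror the examples above:

```python
# Illustrative sanity check for a parameter file; plain Python, not a langtest API.
REQUIRED_KEYS = {"task", "model", "data", "config"}   # mirrors the examples above

def validate_params(params: dict) -> dict:
    """Check that a parsed parameter.json / parameter.yaml has the expected shape."""
    missing = REQUIRED_KEYS - params.keys()
    if missing:
        raise ValueError(f"parameter file is missing keys: {sorted(missing)}")
    if not all("data_source" in entry for entry in params["data"]):
        raise ValueError("every 'data' entry needs a 'data_source'")
    return params

# Usage: validate_params(json.load(open("parameter.json")))
# (for YAML, parse with yaml.safe_load instead of json.load)
```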

Then open a terminal (or Command Prompt on Windows) and run:

langtest eval --model <model name or endpoint> \
              --hub <model hub, e.g. huggingface, lm-studio, web> \
              -c <configuration file, e.g. parameter.json or parameter.yaml>

Finally, we can see the model's score and rank on the leaderboard.
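The leaderboard itself is built from benchmark reports that this PR saves to and loads from disk. As a rough sketch of the idea only (the directory layout, the `model`/`scores` field names, and the averaging are hypothetical, not the actual report schema langtest uses):

```python
import json
from pathlib import Path

def rank_models(report_dir: str) -> list[tuple[str, float]]:
    """Rank saved benchmark reports by average score (hypothetical schema)."""
    rows = []
    for path in Path(report_dir).glob("*.json"):
        report = json.loads(path.read_text())
        scores = report["scores"].values()   # e.g. {"MedMCQA": 0.71, "MMLU": 0.65}
        rows.append((report["model"], sum(scores) / len(scores)))
    # Highest average score ranks first
    return sorted(rows, key=lambda r: r[1], reverse=True)
```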


To visualize the leaderboard at any time, use the CLI command:

langtest show-leaderboard
