JohnSnowLabs / langtest

Deliver safe & effective language models
http://langtest.org/
Apache License 2.0

User prompt handling for multi-dataset testing #1010

Closed: chakravarthik27 closed this issue 5 months ago

chakravarthik27 commented 5 months ago

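The following shows how to configure a separate user prompt for each dataset in multi-dataset testing. Under `model_parameters`, `user_prompt` is a dictionary that maps each `data_source` name to its own prompt template: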

from langtest import Harness

harness = Harness(
    task="question-answering",
    model={"model": "http://localhost:1234/v1/chat/completions", "hub": "lm-studio"},
    data=[
        {"data_source": "BoolQ", "split": "test-tiny"},
        {"data_source": "NQ-open", "split": "test-tiny"},
        {"data_source": "MedQA", "split": "test-tiny"},
        {"data_source": "LogiQA", "split": "test-tiny"},
    ],
    config={
        "model_parameters": {
            "max_tokens": 64,
            # One prompt template per dataset, keyed by its data_source name.
            "user_prompt": {
                "BoolQ": "Answer the following question with a yes or no: {question}",
                "NQ-open": "Answer the following question with a short answer: {question}",
                "MedQA": "Answer the following medical question: {question} {options}",
                "LogiQA": "Answer the following logic question: {question} {options}"
            }
        },
        "tests": {
            "defaults": {
                "min_pass_rate": 1.0
            },
            "robustness": {
                "add_typo": {
                    "min_pass_rate": 0.7
                },
                "lowercase": {
                    "min_pass_rate": 0.7
                }
            }
        }
    }
)
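
To make the lookup concrete, here is a minimal sketch of how a per-dataset prompt could be selected and filled in. This is not langtest's internal implementation: `resolve_user_prompt` is a hypothetical helper written only for illustration, and the fallback for a plain string prompt is an assumption about how a single shared prompt might be handled.

```python
from typing import Dict, Union


# Hypothetical helper, not part of langtest: picks the prompt template
# that belongs to a given dataset name.
def resolve_user_prompt(user_prompt: Union[str, Dict[str, str]], dataset_name: str) -> str:
    if isinstance(user_prompt, str):
        # Assumption: a single string is shared by every dataset.
        return user_prompt
    if dataset_name not in user_prompt:
        raise KeyError(f"No user_prompt configured for dataset {dataset_name!r}")
    return user_prompt[dataset_name]


# Same shape as the "user_prompt" mapping in the config above.
prompts = {
    "BoolQ": "Answer the following question with a yes or no: {question}",
    "NQ-open": "Answer the following question with a short answer: {question}",
}

# Select the BoolQ template and substitute its {question} placeholder.
template = resolve_user_prompt(prompts, "BoolQ")
filled = template.format(question="Is the sky blue?")
print(filled)
```

Raising on a missing key, rather than silently falling back to a default, makes a mismatch between a `data_source` name and the `user_prompt` keys visible early.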