EQ-bench / EQ-Bench

A benchmark for emotional intelligence in large language models

Support for Seq2Seq LMs #3

Closed CarlsVoca closed 7 months ago

CarlsVoca commented 8 months ago

How can I run a local flan-t5 model on this benchmark? I found that oobabooga already supports Seq2Seq models ("Add support for Seq2Seq LMs"). I tried testing with the Alpaca instruction template, but didn't get any output.

# config.cfg
[Oobabooga config]
ooba_launch_script = ~/text-generation-webui/start_linux.sh
ooba_params_global = 
automatically_launch_ooba = true
ooba_request_timeout = 120

[Benchmarks to run]
run-t5, None, ~/text-generation-webui/models/flan-t5-xl, , None, 1, ooba, --loader transformers --n_ctx 1024 --n-gpu-layers -1, 

# benchmark_results.csv
run-t5,2024-01-26 16:32:07,Alpaca,/home/user/text-generation-webui/models/flan-t5-xl,,none,FAILED,FAILED,FAILED,1,ooba,,,0.0 questions were parseable (min is 83%)

# raw_results
    "run-t5--v2--/home/user/text-generation-webui/models/flan-t5-xl----Alpaca--none--ooba----": {
        "run_metadata": {
            "run_id": "run-t5",
            "eq_bench_version": "v2",
            "instruction_template": "Alpaca",
            "model_path": "/home/user/text-generation-webui/models/flan-t5-xl",
            "lora_path": "",
            "bitsandbytes_quant": "none",
            "total_iterations": 1,
            "inference_engine": "ooba",
            "ooba_params": "",
            "include_patterns": [],
            "exclude_patterns": []
        },
        "iterations": {
            "1": {
                "respondent_answers": {},
                "individual_scores": {},
                "individual_scores_fullscale": {},
                "raw_inference": {},
                "benchmark_results_fullscale": {
                    "first_pass_score": 0,
                    "first_pass_parseable": 0,
                    "revised_score": 0,
                    "revised_parseable": 0,
                    "final_score": 0,
                    "final_parseable": 0
                }
            }
        }
    }

I would be very grateful for your help.

sam-paech commented 7 months ago

I gave this a try. It seems the model isn't instruction tuned, so it has a hard time responding in the required format.

For models like this you can try setting REVISE=False in lib/run_bench.py. This simplifies the prompt and the output requirements, making it easier for less capable models to follow. However, it didn't help in this case; the model tends to produce text like this:

Defensive: score> Hurt: score> Stubborn: score> Enlightened: score>

This indicates that it doesn't understand the instruction.
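
For context on why the run reports "0.0 questions were parseable": an answer can only be scored if each emotion is followed by a numeric rating, e.g. "Defensive: 7". A rough sketch of that kind of parsing (illustrative only, not the benchmark's actual code) shows why output like the above yields nothing:

import re

# Illustrative sketch only, not EQ-Bench's actual parser:
# extract "Emotion: number" pairs from the model's reply.
def parse_scores(text):
    return {emotion: int(value) for emotion, value in re.findall(r"(\w+):\s*(\d+)", text)}

print(parse_scores("Defensive: score> Hurt: score> Stubborn: score> Enlightened: score>"))
# -> {}  (no numeric scores, so the question counts as unparseable)
print(parse_scores("Defensive: 7\nHurt: 8\nStubborn: 2\nEnlightened: 0"))
# -> {'Defensive': 7, 'Hurt': 8, 'Stubborn': 2, 'Enlightened': 0}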

CarlsVoca commented 7 months ago

I tried what you mentioned and came to the same conclusion. Thank you.