EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

log_samples with multi args in model_args #1664

Open nicho2 opened 7 months ago

nicho2 commented 7 months ago

I get an error when lm_eval tries to save the results to a file. lm_eval uses model_args to build the file name, but it doesn't remove the ':' character, which is not allowed in Windows file names.

The error:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "c:\python311\Scripts\lm_eval.exe\__main__.py", line 7, in <module>
  File "C:\Projects\transpose\lm-evaluation-harness\lm_eval\__main__.py", line 400, in cli_evaluate
    filename.write_text(samples_dumped, encoding="utf-8")
  File "C:\Python311\Lib\pathlib.py", line 1078, in write_text
    with self.open(mode='w', encoding=encoding, errors=errors, newline=newline) as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\pathlib.py", line 1044, in open
    return io.open(self, mode, buffering, encoding, errors, newline)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [Errno 22] Invalid argument: 'output\mistral-7b-instruct-v0.2-LG\model__sn__mistral-7b-instruct-v0.2-LG,base_url__http:____10.2.42.198:1234__v1_gsm8k.jsonl'

The command:

lm_eval --model local-chat-completions --tasks gsm8k --model_args model=sn/mistral-7b-instruct-v0.2-LG,base_url=http://10.2.42.198:1234/v1 --log_samples --output_path output/mistral-7b-instruct-v0.2-LG/ --limit 4
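To see which character trips up Windows here, the following is a minimal sketch that scans a candidate file name for characters Windows reserves. The helper name `find_reserved_chars` and the constant `WINDOWS_RESERVED_CHARS` are hypothetical, and the file name is rebuilt from the command's model_args under the assumption that only '/' and '=' are replaced before the task name and extension are appended.

```python
import re

# Characters that Windows does not allow in a file or directory name.
WINDOWS_RESERVED_CHARS = frozenset('<>:"/\\|?*')


def find_reserved_chars(file_name: str) -> list[str]:
    """Return the reserved characters present in a single path component."""
    return sorted(set(file_name) & WINDOWS_RESERVED_CHARS)


# model_args exactly as passed on the command line above.
model_args = "model=sn/mistral-7b-instruct-v0.2-LG,base_url=http://10.2.42.198:1234/v1"

# Assumption: the file name is built by replacing only '/' and '='.
file_name = re.sub("/|=", "__", model_args) + "_gsm8k.jsonl"

print(find_reserved_chars(file_name))  # [':']
```

The two remaining ':' characters come from the URL scheme and the port in base_url, so any model_args containing a URL would likely hit the same error on Windows.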

Perhaps you could replace:

output_name = "{}_{}".format( re.sub("/|=", "__", args.model_args), task_name )

with:

output_name = "{}_{}".format( re.sub("/|=|:", "__", args.model_args), task_name )
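As a quick check of the proposed pattern, here is a small standalone sketch that applies it to the model_args from the command above. The function name `sanitize_model_args` and the variable names are illustrative and not part of lm_eval; the "{}_{}" format string is an assumption based on the file name in the error message.

```python
import re


def sanitize_model_args(model_args: str) -> str:
    """Replace '/', '=' and ':' with '__', as in the proposed pattern."""
    return re.sub("/|=|:", "__", model_args)


# model_args exactly as passed on the command line above.
model_args = "model=sn/mistral-7b-instruct-v0.2-LG,base_url=http://10.2.42.198:1234/v1"
task_name = "gsm8k"

# Assumption: sanitized model_args and task name are joined with '_'.
output_name = "{}_{}".format(sanitize_model_args(model_args), task_name)

print(output_name)
print(":" in output_name)  # False
```

This only covers the three characters in the pattern. Other characters that Windows rejects in file names (such as '?' or '*') would still pass through, so a broader character class might be worth considering if model_args can contain them.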

nicho2 commented 7 months ago

The same fix is needed in zeno_visualize.py:

        model_args = re.sub(
            "/|=|:",
            "__",
            json.load(
                open(Path(args.data_path, model, "results.json"), encoding="utf-8")
            )["config"]["model_args"],
        )

haileyschoelkopf commented 7 months ago

Hello, thank you for raising this issue!

Since you've pointed out a fix for this, perhaps you could create a PR with this fix applied?