EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

How to use Zeno #1842

Open DavidAdamczyk opened 2 months ago

DavidAdamczyk commented 2 months ago

I would like to ask how to correctly use the script for uploading results to Zeno. I am using the same example as https://github.com/EleutherAI/lm-evaluation-harness/blob/main/examples/visualize-zeno.ipynb:

accelerate launch -m lm_eval \
    --model hf \
    --model_args pretrained=google/gemma-1.1-2b-it \
    --tasks imdb \
    --batch_size 32 \
    --device cuda \
    --log_samples \
    --output_path outputs/gemma-1.1-2b-it \
    --limit 10
python scripts/zeno_visualize.py --data_path outputs --project_name "Zeno test"

Traceback (most recent call last):
  File "/raid/data/david/lm-evaluation-harness/scripts/zeno_visualize.py", line 219, in <module>
    main()
  File "/raid/data/david/lm-evaluation-harness/scripts/zeno_visualize.py", line 48, in main
    tasks = set(tasks_for_model(models[0], args.data_path))
  File "/raid/data/david/lm-evaluation-harness/scripts/zeno_visualize.py", line 130, in tasks_for_model
    json.load(open(Path(dir_path, "results.json"), encoding="utf-8"))["configs"],
FileNotFoundError: [Errno 2] No such file or directory: 'outputs/gemma-1.1-2b-it/results.json'

The error seems weird to me because this is the directory structure:

outputs
└── gemma-1.1-2b-it
    └── google__gemma-1.1-2b-it
        ├── results_2024-05-14T17-07-12.477774.json
        └── samples_imdb_2024-05-14T17-07-12.477774.json

I created more outputs, renamed some files and folders, and got this error message:

FileNotFoundError: [Errno 2] No such file or directory: 'output/imdb_gemma_2b_it/google__gemma-1.1-2b-it/pretrained=google__gemma-1.1-2b-it,parallelize=True,load_in_4bit=True_imdb.jsonl'

Can anyone please suggest how to use these scripts correctly?

KonradSzafer commented 1 month ago

Hi @DavidAdamczyk! I fixed the filename problem in #1926: the script now uses the most recent result files for comparison. However, I'm not sure the rest of the script is correct; other breaking changes may have been introduced in the meantime. I fixed loading the data, but the script still raises errors later in the generate_dataset() function, so I suspect that part was already broken by changes to the structure of the samples result files.
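
For anyone on an older checkout who wants to see which file the fixed script would pick up, a rough sketch of the idea (this is only an illustration of the selection logic, not the actual patch in #1926) is to take the newest timestamped results file in the nested output directory, e.g.:

# illustrative only: list the newest results_*.json for the model shown above
ls -t outputs/gemma-1.1-2b-it/google__gemma-1.1-2b-it/results_*.json | head -n 1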

In the current version of the library, you no longer need to specify the model name in the output path; just use --output_path outputs and the model subdirectory will be created automatically. Then, when running the Zeno script, point it at the outputs directory instead of a specific model.
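
Assuming the same model and task as in your example, the updated invocation would look roughly like this (only --output_path changes; the Zeno script call stays the same):

accelerate launch -m lm_eval \
    --model hf \
    --model_args pretrained=google/gemma-1.1-2b-it \
    --tasks imdb \
    --batch_size 32 \
    --device cuda \
    --log_samples \
    --output_path outputs \
    --limit 10
python scripts/zeno_visualize.py --data_path outputs --project_name "Zeno test"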