OpenGPTX / lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models.
MIT License

Batch evaluation script #92

Open jjbuschhoff opened 1 year ago

jjbuschhoff commented 1 year ago

Sbatch script that performs evaluation on a given set of tasks for a given collection of model checkpoints using the Megatron-LM-client-server inference solution.

jjbuschhoff commented 1 year ago

Also, the argument handling in run_server_no_opt.py is far too manual and needs more abstraction: many command-line flags are already missing, and as experimentation continues, every change that adds arguments would also need to add them here by hand, lest they be silently ignored.

I agree. I looked into this, and it seems possible to call run_text_generation_server.py --use_checkpoint_args directly; however, this only sets some of the hyperparameters from the checkpoint (see load_args_from_checkpoint in megatron.checkpointing), likely those necessary for training. The checkpoint_args are returned but left unused in megatron.initialize.initialize_megatron(). I'm looking into a way to merge them that doesn't cause validate_args to fail.
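To illustrate one possible merge strategy, here is a minimal sketch: fill in any attribute the CLI left unset (missing or None) from the checkpoint namespace, while never overwriting a value the user passed explicitly. This is an assumption about how the merge could work, not Megatron-LM's actual API; the function name, the `protected` parameter, and the attribute names in the example are all hypothetical.

```python
import argparse


def merge_checkpoint_args(cli_args, checkpoint_args, protected=()):
    """Fill unset attributes of cli_args from checkpoint_args.

    Hypothetical helper: an attribute is considered unset if it is
    missing from cli_args or is None, so explicit command-line values
    always win. Keys listed in `protected` are never copied over.
    """
    for key, value in vars(checkpoint_args).items():
        if key in protected:
            continue
        if getattr(cli_args, key, None) is None:
            setattr(cli_args, key, value)
    return cli_args


# Illustrative usage with made-up hyperparameter names:
cli = argparse.Namespace(hidden_size=None, micro_batch_size=4)
ckpt = argparse.Namespace(hidden_size=1024, num_layers=24, micro_batch_size=8)
merged = merge_checkpoint_args(cli, ckpt)
# hidden_size and num_layers come from the checkpoint,
# micro_batch_size keeps the explicit CLI value 4.
```

A merge like this would run after argument parsing but before validation, so that validate_args only ever sees a fully populated namespace.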

janEbert commented 1 year ago

That's a great find!