Open jjbuschhoff opened 1 year ago
Also, the whole argument handling portion of run_server_no_opt.py is way too manual. There should be more abstraction here. For example, many command line flags are already missing. With continued experimentation, every change that adds arguments would also have to add them here by hand, or they would be silently ignored.
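One way to avoid mirroring every flag by hand is to let the wrapper parse only the options it owns and forward everything else untouched. The following is a minimal sketch of that idea using the standard library's `parse_known_args`; the `--port` flag and the `split_wrapper_args` helper are hypothetical and are not taken from run_server_no_opt.py.

```python
import argparse


def split_wrapper_args(argv):
    """Parse only the wrapper's own flags and return the rest unchanged.

    Hypothetical helper: `--port` stands in for whatever options the
    wrapper script really owns.
    """
    # allow_abbrev=False keeps argparse from treating an unknown flag such as
    # `--p` as an abbreviation of one of the wrapper's own flags.
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--port", type=int, default=5000)

    # parse_known_args returns (namespace, list_of_unrecognized_arguments).
    wrapper_args, passthrough = parser.parse_known_args(argv)
    return wrapper_args, passthrough


if __name__ == "__main__":
    wrapper_args, passthrough = split_wrapper_args(
        ["--port", "8080", "--use_checkpoint_args", "--load", "/path/to/ckpt"]
    )
    print(wrapper_args.port)
    print(passthrough)
```

The passthrough list can then be handed to the downstream parser as-is, so a newly added flag does not need a second definition in the wrapper.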
I agree. I had a look into this, and it seems possible to call run_text_generation_server.py --use_checkpoint_args directly. However, this only sets some of the hyperparameters from the checkpoint (see load_args_from_checkpoint in megatron.checkpointing), likely those that are necessary for training. The checkpoint_args are returned but unused in megatron.initialize.initialize_megatron(). I'm looking into a way to merge them that doesn't lead to validate_args failing.
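A rough sketch of what such a merge could look like, assuming both the parsed arguments and the checkpoint arguments are plain `argparse.Namespace` objects. The `merge_checkpoint_args` helper and its `protected` parameter are hypothetical; this does not call any Megatron code, and which fields must be protected so that validate_args still passes is an open question here.

```python
import argparse


def merge_checkpoint_args(args, checkpoint_args, protected=()):
    """Fill unset fields of `args` from `checkpoint_args`.

    Hypothetical helper, not part of Megatron-LM. A field counts as unset
    when it is missing from `args` or is None. Names listed in `protected`
    are never copied, so values chosen for the current run are kept.
    """
    merged = argparse.Namespace(**vars(args))  # shallow copy, input untouched
    for name, value in vars(checkpoint_args).items():
        if name in protected:
            continue
        if getattr(merged, name, None) is None:
            setattr(merged, name, value)
    return merged


if __name__ == "__main__":
    # Illustrative values only.
    cli = argparse.Namespace(
        num_layers=None, micro_batch_size=1, tensor_model_parallel_size=2
    )
    ckpt = argparse.Namespace(
        num_layers=24,
        micro_batch_size=8,
        tensor_model_parallel_size=1,
        hidden_size=1024,
    )
    merged = merge_checkpoint_args(
        cli, ckpt, protected=("tensor_model_parallel_size",)
    )
    print(merged)
```

Explicit command-line values win over the checkpoint in this sketch, which seems the safer default for inference, where settings such as batch size or parallelism often differ from training.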
That's a great find!
Sbatch script that performs evaluation on a given set of tasks for a given collection of model checkpoints using the Megatron-LM-client-server inference solution.