Kipok / NeMo-Skills

A pipeline to improve skills of large language models
https://kipok.github.io/NeMo-Skills/
Apache License 2.0

Adding llama3 prompts + llm-as-a-judge eval #74

Closed Kipok closed 3 months ago

Kipok commented 3 months ago

Example: Llama 3.1 405B as a judge

python pipeline/run_eval.py \
    --model_path /mnt/datadrive/models/Meta-Llama-3.1-8B-Instruct \
    --server_type vllm \
    --output_dir test-3.1 \
    --benchmarks math:0 \
    --num_gpus 2 \
    --num_nodes 1 \
    --prompt_folder llama3 \
    --model_version instruct \
    ++split_name=test \
    batch_size=512 \
    --num_jobs 1 \
    ++inference.tokens_to_generate=5120 \
    ++skip_filled=True \
    --extra_eval_args "++eval_config.grading_type=llm ++eval_config.grading_config.use_batch_api=False ++eval_config.grading_config.judge_model='meta/llama-3.1-405b-instruct' ++eval_config.grading_config.base_url='https://integrate.api.nvidia.com/v1'"

Example: GPT-4 as a judge (no grading_config overrides passed)

python pipeline/run_eval.py \
    --model_path /mnt/datadrive/models/Meta-Llama-3.1-8B-Instruct \
    --server_type vllm \
    --output_dir test-3.1 \
    --benchmarks math:0 \
    --num_gpus 2 \
    --num_nodes 1 \
    --prompt_folder llama3 \
    --model_version instruct \
    ++split_name=test \
    batch_size=512 \
    --num_jobs 1 \
    ++inference.tokens_to_generate=5120 \
    ++skip_filled=True \
    --extra_eval_args "++eval_config.grading_type=llm"
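
The two commands differ only in the string passed to --extra_eval_args. As a rough sketch of how that string is put together, the snippet below assembles the same overrides from a dictionary. The helper build_extra_eval_args is hypothetical and not part of NeMo-Skills; it only reproduces the key names and quoting used in the commands above.

```python
def build_extra_eval_args(grading_type, grading_config=None):
    """Hypothetical helper: join ++key=value overrides for --extra_eval_args."""
    overrides = [f"++eval_config.grading_type={grading_type}"]
    for key, value in (grading_config or {}).items():
        # Strings are single-quoted, as in the commands above;
        # other values (e.g. booleans) are printed as-is.
        rendered = f"'{value}'" if isinstance(value, str) else str(value)
        overrides.append(f"++eval_config.grading_config.{key}={rendered}")
    return " ".join(overrides)


# Llama 3.1 405B as a judge: judge model and endpoint are set explicitly.
llama_judge_args = build_extra_eval_args(
    "llm",
    {
        "use_batch_api": False,
        "judge_model": "meta/llama-3.1-405b-instruct",
        "base_url": "https://integrate.api.nvidia.com/v1",
    },
)

# GPT-4 as a judge: only the grading type is passed.
default_judge_args = build_extra_eval_args("llm")

print(llama_judge_args)
print(default_judge_args)
```

The resulting strings can be pasted inside the double quotes after --extra_eval_args.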