Closed wwngh1233 closed 3 weeks ago
Can you provide the execution command you used, the output of `accelerate env`, and the full stack trace?
I put all the tasks into one script:

```shell
accelerate launch main.py \
    --model $bath_path$Model \
    --tasks $python_tasks_easy,$math_tasks_greedy,$python_tasks_hard \
    --max_length_generation 512 \
    --temperature 1.0 \
    --do_sample False \
    --top_k 1 \
    --n_samples 1 \
    --batch_size 1 \
    --precision fp16 \
    --allow_code_execution \
    --save_generations \
    --save_generations_path results/$Model/$save_prefix.json
```
```shell
python_tasks_easy="humaneval,mbpp"
python_tasks_medium="ds1000-numpy-completion,ds1000-pandas-completion,ds1000-scipy-completion,ds1000-matplotlib-completion,ds1000-sklearn-completion,ds1000-pytorch-completion"
# also available: instruct-humaneval, instruct-humaneval-nocontext, ds1000-tensorflow-completion, ds1000-all-completion
math_tasks_greedy="pal-gsm8k-greedy,pal-gsmhard-greedy"
math_tasks_majority_voting="pal-gsm8k-majority_voting,pal-gsmhard-majority_voting"
python_tasks_hard="apps-introductory,apps-interview,apps-competition"
```
Can you make sure it runs properly for one task, and then add tasks incrementally to find what's causing the issue?
I think I am experiencing a similar issue.
When running the following command on a single node with multiple GPUs (8):
```shell
accelerate launch main.py \
    --model bigcode/santacoder \
    --task multiple-py,mbpp \
    --n_samples 1 \
    --batch_size 1 \
    --max_length_generation 50 \
    --temperature 0.2 \
    --trust_remote_code \
    --generation_only \
    --save_generations \
    --save_references
```
I get the following error:
```
Traceback (most recent call last):
  File "/home/bigcode-evaluation-harness/main.py", line 277, in <module>
    main()
  File "/home/bigcode-evaluation-harness/main.py", line 249, in main
    generations, references = evaluator.generate_text(task)
  File "/home/bigcode-evaluation-harness/lm_eval/evaluator.py", line 45, in generate_text
    generations = parallel_generations(
  File "/home/bigcode-evaluation-harness/lm_eval/generation.py", line 104, in parallel_generations
    generations = complete_code(
  File "/home/bigcode-evaluation-harness/lm_eval/utils.py", line 273, in complete_code
    code_gens[sample].append(
IndexError: list index out of range
```
I didn't see any difference with other values of `n_samples` or `batch_size`; `--max_length_generation 50` is just for speed in this example. Other task combinations give the same error, so running more than one task seems to be the trigger for me.
There is no error if I run only a single task.
It seems there's an issue with multiple processes accessing different tasks simultaneously. The `save_generations_path` also needs to be separate for each task.
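For intuition, here is a hypothetical minimal reproduction of this kind of indexing bug (the variable names `code_gens` and `sample` mirror the traceback, but the logic is an illustration, not the harness's actual code): if the per-task generation list is allocated per task while the sample counter keeps growing across tasks, the second task indexes past the end of its list.

```python
# Illustrative sketch, not the harness's real implementation: a global
# sample counter that is never reset between tasks overflows the
# per-task code_gens list as soon as a second task starts.
def collect_generations(tasks, n_samples_per_task):
    all_gens = []
    sample = 0  # global counter, NOT reset per task -> the bug
    for task in tasks:
        # list is sized for one task's samples only
        code_gens = [[] for _ in range(n_samples_per_task)]
        for _ in range(n_samples_per_task):
            try:
                code_gens[sample].append(f"{task}-gen")
            except IndexError:
                return all_gens, f"IndexError at task={task}, sample={sample}"
            sample += 1
        all_gens.append(code_gens)
    return all_gens, None

# With two tasks and n_samples=1 (as in the failing command above),
# the second task trips the IndexError.
gens, err = collect_generations(["multiple-py", "mbpp"], 1)
```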
While this gets fixed, I suggest evaluating only a single task per run and using a bash loop to go over multiple tasks instead of doing it inside the harness, since it was intended to run tasks sequentially anyway:
```shell
tasks=(multiple-py multiple-java mbpp)

for task in "${tasks[@]}"; do
    echo "Running task $task"
    accelerate launch main.py \
        --model bigcode/santacoder \
        --task $task \
        --n_samples 1 \
        --batch_size 1 \
        --max_length_generation 50 \
        --temperature 0.2 \
        --trust_remote_code \
        --generation_only \
        --save_generations_path generations_$task.json
done
```
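If you go the loop route, the per-task files can afterwards be combined with a small script. The `generations_<task>.json` naming follows the loop above; the file layout (a JSON list of generations) is an assumption for illustration.

```python
import json
import os
import tempfile

def merge_generation_files(paths):
    """Merge per-task generation files into one dict keyed by task name.

    Assumes files are named generations_<task>.json, matching the bash
    loop above; the JSON content layout is an assumption.
    """
    merged = {}
    for path in paths:
        name = os.path.basename(path)
        task = name[len("generations_"):-len(".json")]  # strip prefix/suffix
        with open(path) as f:
            merged[task] = json.load(f)
    return merged

# Demo with throwaway files that mimic the loop's output naming.
tmp = tempfile.mkdtemp()
for task, gens in [("multiple-py", ["print(1)"]), ("mbpp", ["print(2)"])]:
    with open(os.path.join(tmp, f"generations_{task}.json"), "w") as f:
        json.dump(gens, f)

paths = sorted(os.path.join(tmp, p) for p in os.listdir(tmp))
merged = merge_generation_files(paths)
```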
For reference, the failing frame in `lm_eval/utils.py` at line 388 in `complete_code`:

```
bigcode-evaluation-harness/lm_eval/utils.py:388 in complete_code

  385         if not INFILL_MODE:
  386             gen_code = gen_code[len(prefix):]
  387         if postprocess:
❱ 388             code_gens[sample].append(
  389                 task.postprocess_generation(gen_code, int(sample))
```