bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

main() crashes with --allow-code-execution=True #14

Closed: ocramz closed this issue 2 years ago

ocramz commented 2 years ago

The call to `.generate()` in `complete_code()` in `utils.py` seems to be misconfigured, since it produces the stack trace below.

Here I use `model='hf-internal-testing/tiny-random-gpt2'` (codeparrot fails in the same way) and `allow-code-execution=True`.

```
Traceback (most recent call last):
  File "~/bigcode-evaluation-harness/main.py", line 147, in <module>
    main()
  File "~/bigcode-evaluation-harness/main.py", line 132, in main
    results[task] = evaluator.evaluate(task)
  File "~/bigcode-evaluation-harness/lm_eval/evaluator.py", line 193, in evaluate
    generations, references = self.generate_text(task)
  File "~/bigcode-evaluation-harness/lm_eval/evaluator.py", line 70, in generate_text
    generations = parallel_generations(
  File "~/bigcode-evaluation-harness/lm_eval/generation.py", line 140, in parallel_generations
    generations = complete_code(
  File "~/bigcode-evaluation-harness/lm_eval/utils.py", line 177, in complete_code
    generated_tokens = accelerator.unwrap_model(model).generate(
  File "/Users/marco/mambaforge/envs/BigCode/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Users/marco/mambaforge/envs/BigCode/lib/python3.10/site-packages/transformers/generation_utils.py", line 1320, in generate
    return self.sample(
  File "/Users/marco/mambaforge/envs/BigCode/lib/python3.10/site-packages/transformers/generation_utils.py", line 1938, in sample
    outputs = self(
  File "/Users/marco/mambaforge/envs/BigCode/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/marco/mambaforge/envs/BigCode/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 1048, in forward
    transformer_outputs = self.transformer(
  File "/Users/marco/mambaforge/envs/BigCode/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/marco/mambaforge/envs/BigCode/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 835, in forward
    position_embeds = self.wpe(position_ids)
  File "/Users/marco/mambaforge/envs/BigCode/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/marco/mambaforge/envs/BigCode/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 158, in forward
    return F.embedding(
  File "/Users/marco/mambaforge/envs/BigCode/lib/python3.10/site-packages/torch/nn/functional.py", line 2199, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
```
loubnabnl commented 2 years ago

This can happen if you didn't change `max_length_generation`: by default it is 2048, but your model's context size is 512 and codeparrot's is 1024, hence the index-out-of-range error.

If that doesn't fix it, can you share your execution command? It works for me.
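
Concretely, the failure reduces to indexing past the end of GPT-2's learned position-embedding table `self.wpe`, which has exactly one row per context position. A minimal sketch of that mechanism (the 512-token context here is illustrative, not your model's actual config):

```python
import torch
import torch.nn as nn

# Stand-in for GPT-2's position-embedding table self.wpe, assuming a
# 512-token context window for illustration.
context_size = 512
wpe = nn.Embedding(num_embeddings=context_size, embedding_dim=64)

wpe(torch.tensor([context_size - 1]))  # last valid position: works
wpe(torch.tensor([context_size]))      # IndexError: index out of range in self
```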

ocramz commented 2 years ago

Thank you @loubnabnl, that is indeed the case.

I wonder if we could remove this particular footgun by making `max_length_generation` not a free parameter but rather a function of the specific model used?
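
Something like this, roughly (a sketch only, not harness code; it assumes the HF config exposes the context size as `n_positions` or `max_position_embeddings`):

```python
from transformers import AutoConfig

def default_max_length(model_name: str, fallback: int = 2048) -> int:
    """Derive max_length_generation from the model instead of a fixed default."""
    config = AutoConfig.from_pretrained(model_name)
    # GPT-2-style configs call the context size n_positions; many other
    # architectures use max_position_embeddings.
    for attr in ("n_positions", "max_position_embeddings"):
        size = getattr(config, attr, None)
        if isinstance(size, int):
            return size
    return fallback
```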

loubnabnl commented 2 years ago

Actually, you sometimes don't need the model's whole context size for this parameter, and a smaller value speeds up generation. For the HumanEval and MBPP benchmarks, for example, the prompts and their solutions are usually short, so `max_length_generation` doesn't need to be more than 512.

But we can reduce the default value to 1024 or 512.
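
Or we could keep it user-settable for speed but clamp it against the model's context size, so an oversized value degrades gracefully instead of crashing (a hypothetical sketch, not what the harness currently does):

```python
def resolve_max_length(requested: int, model_context_size: int) -> int:
    # Never ask generate() for more positions than the model can embed.
    return min(requested, model_context_size)

resolve_max_length(2048, 512)   # old default + small model -> 512, no crash
resolve_max_length(512, 1024)   # a deliberately small value is respected
```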