OpenGPTX / lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models.
MIT License

xquad, mlqa and mlsum tasks are not correctly implemented #75

Open KlaudiaTH opened 1 year ago

KlaudiaTH commented 1 year ago

@katrinklug

When running the tasks mlqa_en, mlsum_en and xquad_en, I get the following error message:

Traceback (most recent call last):
  File "./tasks/eval_harness/evaluate.py", line 446, in <module>
    main()
  File "./tasks/eval_harness/evaluate.py", line 429, in main
    results = evaluator.evaluate(adaptor, {task_name: task}, False, 0, None, bootstrap_iters=args.bootstrap_iters)
  File "/lm-evaluation-harness/lm_eval/utils.py", line 162, in _wrapper
    return fn(*args, **kwargs)
  File "/lm-evaluation-harness/lm_eval/evaluator.py", line 253, in evaluate
    resps = getattr(lm, reqtype)([req.args for req in reqs])
  File "/lm-evaluation-harness/lm_eval/base.py", line 343, in greedy_until
    re_ord = utils.Reorderer(requests, _collate)
  File "/lm-evaluation-harness/lm_eval/utils.py", line 125, in __init__
    arr = group(arr, lambda x: fn(x[1]))
  File "/lm-evaluation-harness/lm_eval/utils.py", line 59, in group
    res[fn(ob)].append(ob)
  File "/lm-evaluation-harness/lm_eval/utils.py", line 125, in <lambda>
    arr = group(arr, lambda x: fn(x[1]))
  File "/lm-evaluation-harness/lm_eval/base.py", line 340, in _collate
    toks = self.tok_encode(x[0])
  File "/lm-evaluation-harness/lm_eval/models/gpt2.py", line 122, in tok_encode
    return self.tokenizer.encode(string, add_special_tokens=False)
AttributeError: '_GPT2BPETokenizer' object has no attribute 'encode'

The evaluation was run on Taurus using a script from the evaluation repository: apptainer/juwels_german-evalds.sbatch
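The last frame of the traceback shows the harness calling tokenizer.encode(string, add_special_tokens=False) on an object that has no such method. One possible workaround is a thin wrapper that supplies encode/decode. The sketch below is only an illustration: it assumes the wrapped tokenizer offers tokenize(text) and detokenize(token_ids), which is not confirmed in this thread, and both EncodeShim and FakeMegatronTokenizer are hypothetical names invented for the example.

```python
class FakeMegatronTokenizer:
    """Stand-in for a tokenizer that only offers tokenize/detokenize (assumed interface)."""

    def tokenize(self, text):
        # Toy scheme for the example: one token id per character.
        return [ord(ch) for ch in text]

    def detokenize(self, token_ids):
        return "".join(chr(i) for i in token_ids)


class EncodeShim:
    """Hypothetical wrapper adding encode/decode on top of tokenize/detokenize."""

    def __init__(self, tokenizer):
        self._tokenizer = tokenizer

    def encode(self, string, add_special_tokens=False):
        # add_special_tokens is accepted only so the call in the traceback
        # succeeds; this sketch never adds special tokens.
        return self._tokenizer.tokenize(string)

    def decode(self, token_ids):
        return self._tokenizer.detokenize(token_ids)


shim = EncodeShim(FakeMegatronTokenizer())
token_ids = shim.encode("ab", add_special_tokens=False)
round_trip = shim.decode(token_ids)
```

Whether the adaptor should wrap the tokenizer like this or override tok_encode directly is a design choice this sketch does not settle.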

janEbert commented 1 year ago

All perplexity tasks fail, and the same goes for the German ones. The issue is that the EvalHarnessAdaptor in Megatron-DeepSpeed/tasks/eval_harness/evaluate.py does not implement the greedy_until method, which perplexity tasks require.
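To illustrate what a greedy_until method is expected to do, here is a rough, string-level sketch. It assumes each request is a (context, stop_sequences) pair and that the result is the generated continuation cut at the first stop sequence. The next_token_fn parameter and make_toy_model helper are hypothetical stand-ins for a real model forward pass; this is not the adaptor's actual code.

```python
def greedy_until(requests, next_token_fn, max_new_tokens=32):
    """Greedy-decode each (context, stop_sequences) request (toy sketch)."""
    results = []
    for context, until in requests:
        if isinstance(until, str):
            until = [until]
        generated = ""
        for _ in range(max_new_tokens):
            token = next_token_fn(context + generated)
            if token is None:  # toy model has nothing more to say
                break
            generated += token
            if any(stop in generated for stop in until):
                break
        # Keep only the text before the earliest stop sequence.
        for stop in until:
            generated = generated.split(stop)[0]
        results.append(generated)
    return results


def make_toy_model(prompt, answer):
    """Hypothetical 'model' that emits a fixed answer one character at a time."""
    def next_token(text):
        produced = len(text) - len(prompt)
        return answer[produced] if produced < len(answer) else None
    return next_token


prompt = "Q: Capital of France?\nA:"
outputs = greedy_until([(prompt, ["\n"])], make_toy_model(prompt, " Paris\nQ: next"))
```

A real implementation would work on token ids, batch the requests and cap the context length. The sketch only shows the request/response contract.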

katrinklug commented 1 year ago

This also happened for germanquad and squad2.

KlaudiaTH commented 1 year ago

New images ...
Taurus: /projects/p025/p_gptx/apptainer_images/obmd-lmeval-21.12_100423-py3.sif
Juwels: /p/scratch/opengptx-elm/shared/apptainer_images/obmd-lmeval-21.12_100423-py3.sif