EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

Issue with HF models not finding logits for log_softmax() during lm_eval.simple_evaluate #1623

Closed DrewGalbraith closed 7 months ago

DrewGalbraith commented 7 months ago

Problem Description

While running lm_eval.simple_evaluate(...), I'm getting the following error:

AttributeError: 'CausalLMOutputWithPast' object has no attribute 'log_softmax' (Full traceback below)

It gets as far as <TIME_STAMP> INFO [evaluator.py:314] Running loglikelihood requests and then breaks. The problem is that some models I run through break at F.log_softmax(...) because it receives the LM output object itself (i.e., a CausalLMOutputWithPast) rather than its logits attribute. This doesn't happen with all models, and I'm having trouble figuring out which kinds of models are unaffected! I have patched around it twice in the last few weeks by adding a try-except statement, once in lm_eval's huggingface.py and once in PyTorch's torch/nn/functional.py. The try-except has the basic form:

try:
    # input is already a plain tensor
    ret = input.log_softmax(dim)
except AttributeError:
    # input is a model output object (e.g., CausalLMOutputWithPast); unwrap its logits
    ret = input.logits.log_softmax(dim)

This seems like a shoddy solution, so I'm looking for something more permanent. The try-except, by the way, keeps it compatible with the models that don't hit this error.
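
A less invasive form of the same workaround would be to unwrap the output where the model is called, so the softmax only ever sees a tensor. A minimal sketch (the helper name is mine, not part of lm_eval or torch):

import torch

def call_model_for_logits(model, inps):
    # Hypothetical helper: always hand a plain tensor to downstream code.
    with torch.no_grad():
        out = model(inps)
    # Some call paths return a CausalLMOutputWithPast, others a raw tensor.
    return out.logits if hasattr(out, "logits") else out

# Downstream, log_softmax then only ever receives a tensor:
# multi_logits = torch.nn.functional.log_softmax(call_model_for_logits(model, batch), dim=-1)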

Reproducible example:


import lm_eval

device: str = 'cuda'  # 'cuda' or 'cpu'

lm_eval.tasks.initialize_tasks()  # register the built-in tasks
results = lm_eval.simple_evaluate(
    model="hf",
    model_args=f"pretrained={<PATH_TO_HF_REPO>},"
               f"tokenizer={<PATH_TO_HF_REPO>}",
    tasks=['mmlu'],
    num_fewshot=0,
    device=device)
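
In case it's useful, once simple_evaluate returns, the per-task numbers can be pulled out of the returned dict, something like this (a sketch; the exact keys depend on the lm_eval version, and simple_evaluate can return None on non-zero ranks in distributed runs):

if results is not None:
    for task, metrics in results["results"].items():
        print(task, metrics)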

Full traceback:

    run_eval(config)
  File "/home/<USER>/301R/repos/301r_retnet/slurm/user_slurm/../../src/run_eval.py", line 34, in run_eval
    results = lm_eval.simple_evaluate(
              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/<USER>/.conda/envs/<SOME_ENV>/lib/python3.11/site-packages/lm_eval/utils.py", line 415, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/<USER>/.conda/envs/<SOME_ENV>/lib/python3.11/site-packages/lm_eval/evaluator.py", line 150, in simple_evaluate
    results = evaluate(
              ^^^^^^^^^
  File "/home/<USER>/.conda/envs/<SOME_ENV>/lib/python3.11/site-packages/lm_eval/utils.py", line 415, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/<USER>/.conda/envs/<SOME_ENV>/lib/python3.11/site-packages/lm_eval/evaluator.py", line 325, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/<USER>/.conda/envs/<SOME_ENV>/lib/python3.11/site-packages/lm_eval/models/huggingface.py", line 774, in loglikelihood
    return self._loglikelihood_tokens(new_reqs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/<USER>/.conda/envs/<SOME_ENV>/lib/python3.11/site-packages/lm_eval/models/huggingface.py", line 987, in _loglikelihood_tokens
    multi_logits = F.log_softmax(
                   ^^^^^^^^^^^^^^
  File "/home/<USER>/.conda/envs/<SOME_ENV>/lib/python3.11/site-packages/torch/nn/functional.py", line 1945, in log_softmax
    ret = input.log_softmax(dim)
          ^^^^^^^^^^^^^^^^^
AttributeError: 'CausalLMOutputWithPast' object has no attribute 'log_softmax'

LSinev commented 7 months ago

For reproducibility, please also share the versions of lm_eval and transformers.
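
Something along these lines should print them (a minimal sketch, assuming the installed distribution names match the import names):

from importlib.metadata import version

for pkg in ("lm_eval", "transformers", "torch"):
    print(pkg, version(pkg))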

Please also check your pipeline against the latest clone of this repo. In the latest code this log_softmax call lives in a different place, https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/models/huggingface.py#L1045, and the logits attribute should always be used: https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/models/huggingface.py#L760

With the latest release, v0.4.2, logits are always used as well: https://github.com/EleutherAI/lm-evaluation-harness/blob/v0.4.2/lm_eval/models/huggingface.py#L748
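
In other words, in those versions the model call already unwraps the output before the softmax, roughly like this (a paraphrase of the linked lines, not a verbatim copy, written here as a free function):

import torch

def model_call(model, inps):
    # Return the logits tensor, not the CausalLMOutputWithPast wrapper,
    # so log_softmax downstream always receives a tensor.
    with torch.no_grad():
        return model(inps).logits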

haileyschoelkopf commented 7 months ago

Hi! In addition to what @LSinev said, sharing the exact HuggingFace model this does or does not error on would also be helpful.

To my knowledge, since at least v0.4.0 .logits should always be used for models.

DrewGalbraith commented 7 months ago

Thanks for getting back to me so fast! @LSinev, mamba list shows lm_eval==0.4.0, torch==2.1.2, and transformers==4.36.2. @haileyschoelkopf, the model is Llama-2-7b-chat-hf.

@LSinev Looking at the lm_eval v0.4.2 lines you referenced, I think the newer version will probably work. I was able to find those same lines in v0.4.0 in the repo online, but in our downloaded copy of 0.4.0, line 507 returns the raw model output without the .logits attribute. That's likely where the error is. Either our copy of the repo predates the commit that added .logits, or one of our team deleted the attribute for some reason. 🫣
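
For anyone else hitting this, a quick way to check what the copy you actually import does, without opening the file by hand (a sketch; HFLM and _model_call are the names used in recent lm_eval versions and are assumed here):

import inspect
import lm_eval.models.huggingface as hf

print(hf.__file__)  # which copy of huggingface.py is actually imported
print(".logits" in inspect.getsource(hf.HFLM._model_call))  # False would explain the AttributeError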

After adding the attribute to that line, I have confirmed that the benchmark now runs with this model, both for mmlu and, for good measure, hellaswag.

Thanks for the pointer!