yanchenmochen opened 2 months ago
The command I use is simple, as follows:
/usr/local/bin/lm-eval --model vllm --model_args pretrained=/mnt/self-define/songquanheng/model/opt-6.7b,tensor_parallel_size=1,gpu_memory_utilization=0.8 --tasks lambada_openai,arc_easy,piqa --device cuda:0
The code related to this error is:
@staticmethod
def _parse_logprobs(tokens: List, outputs, ctxlen: int) -> Tuple[float, bool]:
    """Process logprobs and tokens.

    :param tokens: list
        Input tokens (potentially left-truncated)
    :param outputs: RequestOutput
        Contains prompt_logprobs
    :param ctxlen: int
        Length of context (so we can slice them away and only keep the predictions)
    :return:
        continuation_logprobs: float
            Log probabilities of continuation tokens
        is_greedy: bool
            Whether argmax matches given continuation exactly
    """
    # The first entry of prompt_logprobs is None because the model has no previous tokens to condition on.
    continuation_logprobs_dicts = outputs.prompt_logprobs

    def coerce_logprob_to_num(logprob):
        # vLLM changed the return type of logprobs from float
        # to a Logprob object storing the float value + extra data
        # (https://github.com/vllm-project/vllm/pull/3065).
        # If we are dealing with vllm's Logprob object, return
        # the logprob value stored as an attribute. Otherwise,
        # return the object itself (which should be a float
        # for older versions of vLLM).
        return getattr(logprob, "logprob", logprob)

    continuation_logprobs_dicts = [
        {
            token: coerce_logprob_to_num(logprob)
            for token, logprob in logprob_dict.items()
        }
        if logprob_dict is not None
        else None
        for logprob_dict in continuation_logprobs_dicts
    ]

    # Calculate continuation_logprobs
    # assume ctxlen always >= 1
    continuation_logprobs = sum(
        logprob_dict.get(token)
        for token, logprob_dict in zip(
            tokens[ctxlen:], continuation_logprobs_dicts[ctxlen:]
        )
    )
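For context on where this fails: the TypeError reported below is raised at the list comprehension above, because outputs.prompt_logprobs itself is None, not just its first entry. A minimal defensive sketch, reusing the variable names from the snippet above (this is not the harness's actual code), that would surface the problem more clearly:

# Hypothetical guard, not part of lm-eval: fail with a descriptive error
# instead of a bare TypeError when vLLM does not return prompt logprobs.
continuation_logprobs_dicts = outputs.prompt_logprobs
if continuation_logprobs_dicts is None:
    raise ValueError(
        "vLLM returned prompt_logprobs=None for this request; "
        "loglikelihood scoring requires prompt logprobs to be populated."
    )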
So what is wrong with this execution, and what can I do to solve this problem?
Hi, is there any more to the error output than what you've shared?
Does your locally saved OPT model differ at all from the one downloaded directly from HF?
I ran the command you provided on my machine and did not replicate the error you were getting.
Yes, I saved the OPT model to a local directory. When I use the lm-eval command, "lm-eval --tasks list" does not produce valid output.
I also tried to install lm-evaluation-harness from source: after cloning the repository, I ran "pip install -e .", but it installed the package as "UNKNOWN 0.0.0", and at the same time the executables "lm-eval" and "lm_eval" were not generated. I do not know the reason.
When using the model value of "hf", it works well.
root@145206f3e691:/mnt/self-define/sunning/lmdeploy/vllm_test# lm_eval --model vllm --model_args pretrained=/mnt/self-define/songquanheng/model/opt-6.7b --tasks arc_easy --device cuda:0
INFO 08-06 02:41:19 llm_engine.py:103] Initializing an LLM engine (v0.4.2) with config: model='/mnt/self-define/songquanheng/model/opt-6.7b', speculative_config=None, tokenizer='/mnt/self-define/songquanheng/model/opt-6.7b', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=1234, served_model_name=/mnt/self-define/songquanheng/model/opt-6.7b)
INFO 08-06 02:41:19 selector.py:37] Using FlashAttention-2 backend.
INFO 08-06 02:41:28 model_runner.py:145] Loading model weights took 12.4036 GB
INFO 08-06 02:41:29 gpu_executor.py:83] # GPU blocks: 2816, # CPU blocks: 512
INFO 08-06 02:41:34 model_runner.py:824] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 08-06 02:41:34 model_runner.py:828] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
INFO 08-06 02:41:55 model_runner.py:894] Graph capturing finished in 21 secs.
100%|█████████████████████████████████████████████████████████████████████████████████████████████| 2376/2376 [00:01<00:00, 1403.64it/s]
Running loglikelihood requests: 0%| | 0/9501 [00:00<?, ?it/s][rank0]: Traceback (most recent call last):
[rank0]: File "/usr/local/bin/lm_eval", line 8, in <module>
[rank0]: sys.exit(cli_evaluate())
[rank0]: File "/usr/local/lib/python3.10/dist-packages/lm_eval/__main__.py", line 375, in cli_evaluate
[rank0]: results = evaluator.simple_evaluate(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/lm_eval/utils.py", line 395, in _wrapper
[rank0]: return fn(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/lm_eval/evaluator.py", line 277, in simple_evaluate
[rank0]: results = evaluate(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/lm_eval/utils.py", line 395, in _wrapper
[rank0]: return fn(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/lm_eval/evaluator.py", line 449, in evaluate
[rank0]: resps = getattr(lm, reqtype)(cloned_reqs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/lm_eval/api/model.py", line 371, in loglikelihood
[rank0]: return self._loglikelihood_tokens(new_reqs, disable_tqdm=disable_tqdm)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/lm_eval/models/vllm_causallms.py", line 448, in _loglikelihood_tokens
[rank0]: answer = self._parse_logprobs(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/lm_eval/models/vllm_causallms.py", line 493, in _parse_logprobs
[rank0]: continuation_logprobs_dicts = [
[rank0]: TypeError: 'NoneType' object is not iterable
Running loglikelihood requests: 0%| | 0/9501 [00:00<?, ?it/s]
root@145206f3e691:/mnt/self-define/sunning/lmdeploy/vllm_test#
This is the running output. I do not know why this happens.
@haileyschoelkopf This problem has blocked me for several days. Do you know what is wrong?
answer = self._parse_logprobs(
    tokens=inp,
    outputs=output,
    ctxlen=ctxlen,
)
def _parse_logprobs(tokens: List, outputs, ctxlen: int) -> Tuple[float, bool]:
    """Process logprobs and tokens.

    :param tokens: list
        Input tokens (potentially left-truncated)
    :param outputs: RequestOutput
        Contains prompt_logprobs
    :param ctxlen: int
        Length of context (so we can slice them away and only keep the predictions)
    :return:
        continuation_logprobs: float
            Log probabilities of continuation tokens
        is_greedy: bool
            Whether argmax matches given continuation exactly
    """
    # The first entry of prompt_logprobs is None because the model has no previous tokens to condition on.
    continuation_logprobs_dicts = outputs.prompt_logprobs
What does outputs.prompt_logprobs represent?
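For reference, RequestOutput.prompt_logprobs in vLLM holds the log probabilities of the prompt tokens themselves, and it is only populated when the request's SamplingParams sets prompt_logprobs; otherwise it stays None. When populated, it is a list with one entry per prompt token: the first entry is None (there is no context to condition on), and each later entry is a dict mapping token ids to their log probabilities. A minimal standalone sketch of what to expect (the model path is a placeholder; assumes vLLM 0.4.x):

from vllm import LLM, SamplingParams

# Hypothetical standalone check, independent of lm-eval: request prompt
# logprobs explicitly and inspect what vLLM returns.
llm = LLM(model="/path/to/opt-6.7b")  # placeholder local model path
params = SamplingParams(max_tokens=1, temperature=0, prompt_logprobs=1)
out = llm.generate(["The quick brown fox jumps over the lazy dog"], params)[0]

print(out.prompt_logprobs is None)     # should be False when prompt_logprobs=1 is set
if out.prompt_logprobs is not None:
    print(out.prompt_logprobs[0])      # None: the first token has no context
    print(out.prompt_logprobs[1])      # e.g. {token_id: Logprob(logprob=-3.2, ...)}

If this standalone call also returns None, the problem is on the vLLM side rather than in lm-evaluation-harness, since the harness's vLLM backend requests prompt logprobs for its loglikelihood requests.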
ns/ms-python.debugpy-2024.0.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher 32781 -- /usr/local/bin/lm-eval --model vllm --model_args pretrained=/mnt/self-define/zhangweixing/model/llama2-7b-hf,gpu_memory_utilization=0.8 --tasks arc_easy --device cuda:0
INFO 08-06 09:29:33 llm_engine.py:103] Initializing an LLM engine (v0.4.2) with config: model='/mnt/self-define/zhangweixing/model/llama2-7b-hf', speculative_config=None, tokenizer='/mnt/self-define/zhangweixing/model/llama2-7b-hf', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=4096, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=1234, served_model_name=/mnt/self-define/zhangweixing/model/llama2-7b-hf)
INFO 08-06 09:29:33 selector.py:37] Using FlashAttention-2 backend.
INFO 08-06 09:29:41 model_runner.py:145] Loading model weights took 12.5523 GB
INFO 08-06 09:29:42 gpu_executor.py:83] # GPU blocks: 2321, # CPU blocks: 512
INFO 08-06 09:29:44 model_runner.py:824] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 08-06 09:29:44 model_runner.py:828] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
INFO 08-06 09:29:51 model_runner.py:894] Graph capturing finished in 8 secs.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2376/2376 [00:07<00:00, 327.01it/s]
Running loglikelihood requests: 0%| | 0/9501 [00:00<?, ?it/s]
When running the loglikelihood requests, outputs.prompt_logprobs is None. I tried Llama 2 7B and ran into the same problem.
The vLLM version is 0.4.2. I also tried in another environment that is not a container, but encountered the same problem.
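Since the same failure shows up with both OPT-6.7B and Llama-2-7B, it may be worth ruling out an installation mismatch rather than a model problem. A small sanity check per environment (hypothetical, not part of the harness), which also catches the "UNKNOWN 0.0.0" editable install mentioned above:

# Hypothetical environment check: confirm which vllm and lm_eval builds are
# actually imported, since a broken editable install can shadow the intended one.
import importlib.metadata as md
import vllm
import lm_eval

print("vllm:", vllm.__version__, vllm.__file__)
try:
    print("lm_eval:", md.version("lm_eval"), lm_eval.__file__)
except md.PackageNotFoundError:
    print("lm_eval metadata not found; the editable install may be broken")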