NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

MMLU script raise TypeError: list indices must be integers or slices, not tuple #1822

Open DefTruth opened 1 week ago

DefTruth commented 1 week ago

System Info

8x NVIDIA L20 GPUs

Who can help?

@byshiue


Reproduction

cd /app/tensorrt_llm/examples
mkdir data; wget https://people.eecs.berkeley.edu/~hendrycks/data.tar -O data/mmlu.tar
tar -xf data/mmlu.tar -C data && mv data/data data/mmlu

mpirun --allow-run-as-root -n 8 python3 mmlu.py \
                --hf_model_dir $HF_MODELS/Qwen1.5-72B-Chat \
                --engine_dir $HF_MODELS/engine/Qwen1.5-72B-Chat/fp16/8-gpu/ \
                --data_dir "./data/mmlu" --test_trt_llm

mpirun --allow-run-as-root -n 8 python3 mmlu.py \
                --hf_model_dir $HF_MODELS/Qwen1.5-72B-Chat \
                --engine_dir $HF_MODELS/engine/Qwen1.5-72B-Chat/fp16/8-gpu/ \
                --data_dir "./data/mmlu" --test_hf

Expected behavior

no error

actual behavior

Traceback (most recent call last):
  File "/app/tensorrt_llm/examples/mmlu.py", line 427, in <module>
    main()
  File "/app/tensorrt_llm/examples/mmlu.py", line 402, in main
    cors, acc, probs = evaluate(args, subject, pipeline, dev_df, test_df)
  File "/app/tensorrt_llm/examples/mmlu.py", line 214, in evaluate
    pred = pipeline(prompt)
  File "/app/tensorrt_llm/examples/mmlu.py", line 299, in __call__
    output_ids = outputs[0, 0, input_lengths[0]:]
TypeError: list indices must be integers or slices, not tuple
  0%|          | 0/57 [00:00<?, ?it/s]

additional notes

[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024061100
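For context, the error itself is plain Python behavior: a torch.Tensor accepts tuple (multi-dimensional) indexing such as outputs[0, 0, n:], but a Python list does not, so the line at mmlu.py:299 only works when generate() returns a Tensor. A minimal illustration:

```python
import torch

# A Tensor accepts tuple (multi-dimensional) indexing, as mmlu.py expects:
outputs = torch.zeros(1, 1, 10)
print(outputs[0, 0, 3:].shape)  # torch.Size([7])

# A plain Python list does not, which reproduces the reported error:
try:
    [][0, 0, 3:]
except TypeError as err:
    print(err)  # list indices must be integers or slices, not tuple
```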

DefTruth commented 1 week ago

I found that right at the start some results come back as a list instead of a Tensor, which causes the indexing error:

<class 'list'>
input_lengths[0]: 343
<class 'list'>
TypeError: list indices must be integers or slices, not tuple
  0%|          | 0/57 [00:00<?, ?it/s]
  0%|          | 0/57 [00:00<?, ?it/s]
input_lengths[0]: 343
<class 'list'>
  0%|          | 0/57 [00:00<?, ?it/s]
input_lengths[0]: 343
<class 'list'>
# then the error is raised ....

# when the result is a Tensor, no error is raised
input_lengths[0]: 343
<class 'torch.Tensor'>
input_lengths[0]: 358
<class 'torch.Tensor'>
input_lengths[0]: 362
<class 'torch.Tensor'>
input_lengths[0]: 372
<class 'torch.Tensor'>
input_lengths[0]: 385
<class 'torch.Tensor'>
input_lengths[0]: 385
<class 'torch.Tensor'>
input_lengths[0]: 373
<class 'torch.Tensor'>

Sometimes trtllm returns an empty list:

input_lengths[0]: 577
<class 'list'>
[]

In the multi-GPU case, ModelRunnerCpp returns [] directly on ranks other than rank 0, which is what triggers this error:

# If we are in a multi-gpu scenario, only rank 0 continues
if not self.session.can_enqueue_requests():
    return []
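One possible workaround (a sketch only, not the repository's fix) is to guard the post-processing in mmlu.py so that ranks which receive the empty list simply skip decoding; the helper name below is hypothetical:

```python
import torch

def extract_output_ids(outputs, input_length):
    """Hypothetical guard for mmlu.py's __call__: ModelRunnerCpp.generate()
    returns [] on ranks other than 0 in a multi-GPU run, so only index into
    the result when a Tensor actually came back."""
    if not isinstance(outputs, torch.Tensor):
        return None  # non-rank-0 ranks (or an empty result) have nothing to decode
    return outputs[0, 0, input_length:]
```

The caller would then skip scoring when None comes back, leaving rank 0 to compute the MMLU accuracy as before.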
DefTruth commented 1 week ago

Running with ModelRunner instead does not raise this error, but it is much slower.
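For reference, a minimal sketch of driving generation through the pure-Python ModelRunner instead of ModelRunnerCpp; the engine path and generation arguments below are illustrative, not copied from mmlu.py:

```python
import torch
import tensorrt_llm
from tensorrt_llm.runtime import ModelRunner

# Sketch only: build the Python runner instead of ModelRunnerCpp.
runtime_rank = tensorrt_llm.mpi_rank()
runner = ModelRunner.from_dir(
    engine_dir="engine/Qwen1.5-72B-Chat/fp16/8-gpu",  # illustrative path
    rank=runtime_rank,
)

batch_input_ids = [torch.tensor([1, 2, 3], dtype=torch.int32)]  # stand-in for a tokenized prompt
outputs = runner.generate(
    batch_input_ids,
    max_new_tokens=2,  # only a short answer (the option letter) is needed for MMLU
    end_id=2,          # stand-in for tokenizer.eos_token_id
    pad_id=2,          # stand-in for tokenizer.pad_token_id
)
# As reported above, this path avoids the TypeError but is noticeably slower.
```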

DefTruth commented 6 days ago

In addition, when running the HF MMLU evaluation (--test_hf), there is another error:

Traceback (most recent call last):
  File "/app/tensorrt_llm/examples/mmlu.py", line 427, in <module>
    main()
  File "/app/tensorrt_llm/examples/mmlu.py", line 382, in main
    torch_dtype=DTYPE_STR_MAPPING[args.data_type],
AttributeError: 'Namespace' object has no attribute 'data_type'. Did you mean: 'hf_data_type'?

It needs to be changed to args.hf_data_type.
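For clarity, the fix only changes which parsed argument the dtype lookup uses; a self-contained sketch (the mapping contents are an assumption, mirroring what DTYPE_STR_MAPPING is expected to hold in examples/mmlu.py):

```python
import torch

# Assumed stand-in for DTYPE_STR_MAPPING in examples/mmlu.py.
DTYPE_STR_MAPPING = {
    "fp16": torch.float16,
    "bf16": torch.bfloat16,
    "fp32": torch.float32,
}

hf_data_type = "fp16"                          # stands in for args.hf_data_type
torch_dtype = DTYPE_STR_MAPPING[hf_data_type]  # was: DTYPE_STR_MAPPING[args.data_type]
print(torch_dtype)                             # torch.float16
```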

hijkzzz commented 6 days ago

This is a bug.

DefTruth commented 5 days ago

> This is a bug.

Any plan to fix it?