Open DefTruth opened 1 week ago
I found that early in the run some results come back as a list rather than a Tensor, so indexing into the result fails.
<class 'list'>
input_lengths[0]: 343
<class 'list'>
TypeError: list indices must be integers or slices, not tuple
0%| | 0/57 [00:00<?, ?it/s]
0%| | 0/57 [00:00<?, ?it/s]
input_lengths[0]: 343
<class 'list'>
0%| | 0/57 [00:00<?, ?it/s]
input_lengths[0]: 343
<class 'list'>
# Then the error is raised ...
# When the result is a Tensor, there is no error
input_lengths[0]: 343
<class 'torch.Tensor'>
input_lengths[0]: 358
<class 'torch.Tensor'>
input_lengths[0]: 362
<class 'torch.Tensor'>
input_lengths[0]: 372
<class 'torch.Tensor'>
input_lengths[0]: 385
<class 'torch.Tensor'>
input_lengths[0]: 385
<class 'torch.Tensor'>
input_lengths[0]: 373
<class 'torch.Tensor'>
Sometimes trtllm returns an empty list:
input_lengths[0]: 577
<class 'list'>
[]
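The failure mode in these logs can be reproduced without TensorRT-LLM. The sketch below is only an illustration: `slice_new_tokens` is a hypothetical helper, and the tuple-style index is an assumption about how the result is sliced, inferred from the error message rather than copied from mmlu.py.

```python
def slice_new_tokens(outputs, input_length):
    """Hypothetical helper: take the tokens after the prompt.

    The tuple-style index below is an assumption about how the result is
    sliced. It works on a 2-D tensor but not on a plain Python list.
    """
    return outputs[0, input_length:]


message = None
try:
    # An empty list stands in for what a non-zero rank gets back.
    slice_new_tokens([], 343)
except TypeError as exc:
    message = str(exc)

print(message)
```

With an empty list the index type is rejected before any bounds check, so the error is a TypeError and not an IndexError.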
In the multi-GPU case, on any rank other than rank 0, ModelRunnerCpp returns [] directly, which causes this error:
# If we are in a multi-gpu scenario, only rank 0 continues
if not self.session.can_enqueue_requests():
    return []
Switching to ModelRunner avoids this error, but it is much slower.
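A caller-side guard might be another way around it while keeping ModelRunnerCpp. This is a minimal sketch, not the actual fix: `extract_generated_ids` is a hypothetical helper, and it assumes the result is either an empty list (ranks other than 0) or something indexable as `outputs[0][input_length:]`.

```python
def extract_generated_ids(outputs, input_length):
    """Hypothetical guard: return generated ids, or None if this rank got nothing.

    Assumes `outputs` is either an empty list (what non-zero ranks receive,
    per the snippet above) or a 2-D sequence/tensor whose first row holds
    the prompt followed by the generated tokens.
    """
    if isinstance(outputs, list) and len(outputs) == 0:
        # No result on this rank, so there is nothing to decode or score.
        return None
    return outputs[0][input_length:]


# Nested lists stand in for a tensor here.
print(extract_generated_ids([], 343))
print(extract_generated_ids([[11, 12, 13, 14]], 2))
```

The caller would also need to skip scoring on ranks that get None, otherwise the accuracy counters would be updated with missing predictions.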
Also, when running HF MMLU (--test_hf), there is a new error:
Traceback (most recent call last):
  File "/app/tensorrt_llm/examples/mmlu.py", line 427, in <module>
    main()
  File "/app/tensorrt_llm/examples/mmlu.py", line 382, in main
    torch_dtype=DTYPE_STR_MAPPING[args.data_type],
AttributeError: 'Namespace' object has no attribute 'data_type'. Did you mean: 'hf_data_type'?
This needs to be changed to args.hf_data_type.
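The attribute mismatch can be shown with argparse alone. This is a sketch under assumptions: the parser below defines only `--hf_data_type`, as the error message suggests, and the `DTYPE_STR_MAPPING` here is a placeholder with string values, not the real mapping used by mmlu.py.

```python
import argparse

# Placeholder mapping: the real one is assumed to map names to torch dtypes.
DTYPE_STR_MAPPING = {"float16": "torch.float16", "bfloat16": "torch.bfloat16"}

parser = argparse.ArgumentParser()
# Only --hf_data_type is defined, so the namespace has no `data_type`.
parser.add_argument("--hf_data_type", type=str, default="float16")
args = parser.parse_args([])

has_old_name = hasattr(args, "data_type")         # the lookup that fails
resolved = DTYPE_STR_MAPPING[args.hf_data_type]   # the corrected lookup

print(has_old_name, resolved)
```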
This is a bug.
Any plan to fix it?
System Info
L20x8
Who can help?
@byshiue
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
Expected behavior
no error
actual behavior
additional notes
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024061100