NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

v0.6.1 model build error #562

Open sleepwalker2017 opened 9 months ago

sleepwalker2017 commented 9 months ago

GPU: 2*V100

convert command:

python build.py --model_dir /data/models/vicuna-13b-v1.5/vicuna-13b-v1.5/ \
                --dtype float16 \
                --use_gpt_attention_plugin float16 \
                --use_gemm_plugin float16 \
                --output_dir ./tmp/llama/13B/trt_engines/fp16/2-gpu/ \
                --max_batch_size 16 \
                --tp_size 2 \
                --world_size 2 --parallel_build \
                --use_inflight_batching \
                --remove_input_padding \
                --paged_kv_cache

build log and error:
[12/05/2023-07:06:48] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +12569, now: CPU 0, GPU 12569 (MiB)
[12/05/2023-07:06:48] [TRT-LLM] [I] Activation memory size: 10032.13 MiB
[12/05/2023-07:06:48] [TRT-LLM] [I] Weights memory size: 12575.04 MiB
[12/05/2023-07:06:48] [TRT-LLM] [I] Max KV Cache memory size: 16000.00 MiB
[12/05/2023-07:06:48] [TRT-LLM] [I] Estimated max memory usage on runtime: 38607.17 MiB
[12/05/2023-07:06:48] [TRT] [I] Loaded engine size: 12575 MiB
Traceback (most recent call last):
  File "/data/TRT-LLM-0.6/examples/llama/build.py", line 778, in <module>
    mp.spawn(build, nprocs=args.world_size, args=(args, ))
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 246, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 202, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 163, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 74, in _wrap
    fn(i, *args)
  File "/data/TRT-LLM-0.6/examples/llama/build.py", line 737, in build
    profiler.check_gpt_mem_usage(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/builder.py", line 48, in decorated
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/profiler.py", line 312, in check_gpt_mem_usage
    _, _, total_mem = device_memory_info(torch.cuda.current_device())
TypeError: cannot unpack non-iterable NoneType object

/usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
jaedeok-nvidia commented 9 months ago

Hi @sleepwalker2017, it's fixed in the main branch (PR #465), but v0.6.1 doesn't include it.

Please install pynvml>=11.5.0 and psutil in order to avoid the issue. Thanks,
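For context, a minimal sketch of the failure mode in the first traceback. The helper below is a hypothetical stand-in (not the real `tensorrt_llm.profiler.device_memory_info`, which queries NVML): when the NVML bindings are unavailable, the old helper could end up yielding `None`, and `check_gpt_mem_usage` unpacked the result unconditionally.

```python
# device_memory_info_sketch is a hypothetical stand-in for
# tensorrt_llm.profiler.device_memory_info. The real helper reads GPU memory
# via NVML; without pynvml installed it ended up producing None.
def device_memory_info_sketch(nvml_available):
    if not nvml_available:
        return None  # nothing to report without the NVML bindings
    return (12575, 19425, 32000)  # (used, free, total) in MiB, dummy numbers

# Unpacking the result without a None check reproduces the reported error:
try:
    _, _, total_mem = device_memory_info_sketch(nvml_available=False)
except TypeError as exc:
    print(exc)  # cannot unpack non-iterable NoneType object
```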

sleepwalker2017 commented 9 months ago

> Hi @sleepwalker2017, it's fixed in the main branch (PR #465) but v0.6.1 doesn't include it.
>
> Please install pynvml>=11.5.0 and psutil in order to avoid the issue. Thanks,

There is still a problem; I now hit this bug:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 74, in _wrap
    fn(i, *args)
  File "/data/weilong.yu/TRT-LLM-0.6/examples/llama/build.py", line 737, in build
    profiler.check_gpt_mem_usage(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/builder.py", line 48, in decorated
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/profiler.py", line 314, in check_gpt_mem_usage
    logger.warning(
TypeError: Logger.warning() takes 2 positional arguments but 3 were given
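The second traceback suggests a printf-style call into TensorRT-LLM's own logger, whose `warning()` appears to accept only a single message argument (unlike the stdlib `logging` API, which formats extra positional arguments into the message). A minimal sketch of the mismatch, with a hypothetical `Logger` stand-in:

```python
# This Logger class is a hypothetical stand-in for tensorrt_llm's logger,
# inferred from the error message: warning() takes one message argument only.
# The stdlib logging API, by contrast, accepts printf-style extra arguments.
class Logger:
    def warning(self, message):
        print(f"[W] {message}")

logger = Logger()

# Works: the message is fully formatted before the call.
logger.warning("Estimated max memory usage %.2f MiB" % 38607.17)

# Fails the same way as profiler.py: an extra positional argument.
try:
    logger.warning("Estimated max memory usage %.2f MiB", 38607.17)
except TypeError as exc:
    print(exc)
```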
HuChundong commented 9 months ago

Same error on the 0.6.1 release branch:

Traceback (most recent call last):
  File "D:\AI\TensorRT-LLM\examples\chatglm\build.py", line 775, in <module>
    run_build()
  File "D:\AI\TensorRT-LLM\examples\chatglm\build.py", line 767, in run_build
    build(0, args)
  File "D:\AI\TensorRT-LLM\examples\chatglm\build.py", line 723, in build
    check_gpt_mem_usage(
  File "C:\Users\hucd\.conda\envs\trllm\lib\site-packages\tensorrt_llm\builder.py", line 48, in decorated
    return f(*args, **kwargs)
  File "C:\Users\hucd\.conda\envs\trllm\lib\site-packages\tensorrt_llm\profiler.py", line 312, in check_gpt_mem_usage
    _, _, total_mem = device_memory_info(torch.cuda.current_device())
TypeError: cannot unpack non-iterable NoneType object

isRambler commented 8 months ago

So how do we solve this problem?