limes22 opened 6 months ago
I seem to have found the reason! It is because no GPU was found, so the number of workers was set to 0 here. Check whether your container or host sees the GPU by running `nvidia-smi`. In my case, the container was not connected to CUDA, which resulted in the error. I restarted the container and the problem was resolved.
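The failure mode is easy to reproduce in isolation: if the detected GPU count comes back as 0 and that value is passed straight to `ProcessPoolExecutor` as `max_workers`, the standard library raises exactly the error in the traceback. A minimal sketch (the worker-count plumbing here is a hypothetical stand-in for what `trtllm-build` does internally):

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import get_context


def build_pool(num_workers: int) -> ProcessPoolExecutor:
    # ProcessPoolExecutor validates max_workers and rejects values <= 0,
    # which is what happens when no GPU is found and the worker count is 0.
    return ProcessPoolExecutor(max_workers=num_workers,
                               mp_context=get_context("spawn"))


try:
    build_pool(0)  # simulates "no GPU found" -> workers == 0
except ValueError as exc:
    print(exc)  # -> max_workers must be greater than 0
```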
GitHub blocked my original account for some reason, so I am reposting it here. Hope it helps!
System Info
trtllm-build --checkpoint_dir ./tllm_checkpoint_1gpu_bf16 \
             --output_dir ./tmp/llama/7B/trt_engines/bf16/1-gpu \
             --gpt_attention_plugin bfloat16 \
             --gemm_plugin bfloat16
Running the command above produces the following error.
It may not be optimal to set max_num_tokens=max_batch_size*max_input_len when remove_input_padding is enabled, because the number of packed input tokens are very likely to be smaller, we strongly recommend to set max_num_tokens according to your workloads.
Traceback (most recent call last):
  File "/usr/local/bin/trtllm-build", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 514, in main
    parallel_build(source, build_config, args.output_dir, workers,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 441, in parallel_build
    with ProcessPoolExecutor(mp_context=get_context('spawn'),
  File "/usr/lib/python3.10/concurrent/futures/process.py", line 611, in __init__
    raise ValueError("max_workers must be greater than 0")
ValueError: max_workers must be greater than 0
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
trtllm-build --checkpoint_dir ./tllm_checkpoint_1gpu_bf16 \
             --output_dir ./tmp/llama/7B/trt_engines/bf16/1-gpu \
             --gpt_attention_plugin bfloat16 \
             --gemm_plugin bfloat16
Expected behavior
Build Complete
Actual behavior
[TensorRT-LLM] TensorRT-LLM version: 0.9.0.dev2024022000
/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:626: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
[02/26/2024-09:18:30] [TRT-LLM] [I] Set bert_attention_plugin to float16.
[02/26/2024-09:18:30] [TRT-LLM] [I] Set gpt_attention_plugin to bfloat16.
[02/26/2024-09:18:30] [TRT-LLM] [I] Set gemm_plugin to bfloat16.
[02/26/2024-09:18:30] [TRT-LLM] [I] Set lookup_plugin to None.
[02/26/2024-09:18:30] [TRT-LLM] [I] Set lora_plugin to None.
[02/26/2024-09:18:30] [TRT-LLM] [I] Set moe_plugin to float16.
[02/26/2024-09:18:30] [TRT-LLM] [I] Set context_fmha to True.
[02/26/2024-09:18:30] [TRT-LLM] [I] Set context_fmha_fp32_acc to False.
[02/26/2024-09:18:30] [TRT-LLM] [I] Set paged_kv_cache to True.
[02/26/2024-09:18:30] [TRT-LLM] [I] Set remove_input_padding to True.
[02/26/2024-09:18:30] [TRT-LLM] [I] Set use_custom_all_reduce to True.
[02/26/2024-09:18:30] [TRT-LLM] [I] Set multi_block_mode to False.
[02/26/2024-09:18:30] [TRT-LLM] [I] Set enable_xqa to True.
[02/26/2024-09:18:30] [TRT-LLM] [I] Set attention_qk_half_accumulation to False.
[02/26/2024-09:18:30] [TRT-LLM] [I] Set tokens_per_block to 128.
[02/26/2024-09:18:30] [TRT-LLM] [I] Set use_paged_context_fmha to False.
[02/26/2024-09:18:30] [TRT-LLM] [I] Set use_context_fmha_for_generation to False.
[02/26/2024-09:18:30] [TRT-LLM] [W] remove_input_padding is enabled, while max_num_tokens is not set, setting to max_batch_size*max_input_len. It may not be optimal to set max_num_tokens=max_batch_size*max_input_len when remove_input_padding is enabled, because the number of packed input tokens are very likely to be smaller, we strongly recommend to set max_num_tokens according to your workloads.
Traceback (most recent call last):
  File "/usr/local/bin/trtllm-build", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 514, in main
    parallel_build(source, build_config, args.output_dir, workers,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 441, in parallel_build
    with ProcessPoolExecutor(mp_context=get_context('spawn'),
  File "/usr/lib/python3.10/concurrent/futures/process.py", line 611, in __init__
    raise ValueError("max_workers must be greater than 0")
ValueError: max_workers must be greater than 0
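Since the root cause reported above is the container not seeing the GPU, a small pre-flight check can catch the problem before invoking `trtllm-build`. This helper is hypothetical (not part of TensorRT-LLM); it just confirms `nvidia-smi` exists and can reach the driver:

```python
import shutil
import subprocess


def cuda_visible() -> bool:
    """Return True if nvidia-smi is present and can talk to the driver."""
    if shutil.which("nvidia-smi") is None:
        return False
    # nvidia-smi exits with a non-zero status when it cannot reach the
    # GPU or driver, e.g. in a container started without GPU access.
    return subprocess.run(["nvidia-smi"],
                          capture_output=True).returncode == 0


if not cuda_visible():
    print("No GPU visible; restart the container with GPU access "
          "(e.g. docker run --gpus all ...)")
```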
Additional notes
If anyone has solved the problem, please comment.