NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Loaded model not correctly sent to the process (GPTNeoX build.py) #795

Open · ydm-amazon opened this issue 9 months ago

ydm-amazon commented 9 months ago

I am getting an error when using TensorRT-LLM/examples/gptneox/build.py to build the TensorRT engine:

line 314, in build_rank_engine
    assert hf_gpt is not None, f'Could not load weights from hf_gpt model as it is not loaded yet.'
AssertionError: Could not load weights from hf_gpt model as it is not loaded yet.

hf_gpt appears to be loaded correctly in parse_arguments. However, inside the worker process spawned by the build function, hf_gpt is None, so the loaded model is not making it into the worker.
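
This behavior is consistent with how Python multiprocessing isolates worker processes. Below is a minimal, self-contained sketch (not TensorRT-LLM code; all names are stand-ins chosen to mirror build.py) showing how an object loaded only in the parent process shows up as None in a spawned worker:

    # repro.py -- illustrative only; mirrors the suspected failure mode
    import multiprocessing as mp

    hf_gpt = None  # module-level handle, as if assigned later by parse_arguments

    def build_rank_engine(rank):
        # Under the 'spawn' start method the worker re-imports this module in
        # a fresh interpreter, so it sees the module-level None rather than
        # the object the parent assigned under the __main__ guard.
        assert hf_gpt is not None, 'Could not load weights from hf_gpt model as it is not loaded yet.'

    if __name__ == '__main__':
        hf_gpt = object()  # "loaded" only in the parent process
        ctx = mp.get_context('spawn')
        worker = ctx.Process(target=build_rank_engine, args=(0,))
        worker.start()
        worker.join()  # the worker exits with the AssertionError above

If build.py spawns its workers (or drops the model before pickling the arguments it sends them), the worker would see None exactly as in the traceback.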

I am using TensorRT-LLM 0.7.0, and this is the command I am using to build:

python3 build.py \
    --log_level verbose \
    --world_size 4 \
    --model_dir /tmp/input_model_dir/ \
    --dtype float16 \
    --max_input_len 1024 \
    --max_output_len 512 \
    --max_batch_size 32 \
    --max_beam_width 1 \
    --use_gpt_attention_plugin float16 \
    --use_gemm_plugin float16 \
    --use_layernorm_plugin float16 \
    --enable_context_fmha \
    --remove_input_padding \
    --output_dir /tmp/output_model_dir/ \
    --parallel_build
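
For anyone hitting the same assertion: a possible workaround, given as an untested sketch, is to reload the checkpoint inside build_rank_engine whenever the model arrives as None. This assumes the worker still receives args.model_dir and that the checkpoint can be loaded with transformers' AutoModelForCausalLM (the exact loader used in the gptneox example may differ):

    # Hypothetical guard near the top of build_rank_engine in
    # examples/gptneox/build.py -- reload the model if the parent's copy
    # did not survive the trip into the worker process.
    if hf_gpt is None and args.model_dir is not None:
        from transformers import AutoModelForCausalLM
        hf_gpt = AutoModelForCausalLM.from_pretrained(args.model_dir,
                                                      torch_dtype='auto')

Alternatively, dropping --parallel_build keeps the whole build in a single process, which should sidestep the path where the model is lost.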
ydm-amazon commented 8 months ago

It seems that the phi example's build.py (on the main branch) hits this error too.