I have successfully converted nemo-megatron-mt5-3B to FasterTransformer binary weight files with:
```
python3 FasterTransformer/examples/pytorch/t5/utils/nemo_t5_ckpt_convert.py -i nemo-megatron-mt5-3B/nemo_megatron_mt5_3b_bf16_tp2.nemo -o ./models/nemo-megatron-mt5-3B/ -m mt5-3B -i_g 2
```
However, when I start Triton server with `CUDA_VISIBLE_DEVICES="0,1" /opt/tritonserver/bin/tritonserver --model-store=fastertransformer_backend/all_models/nemo-megatron-mt5-3B/`, it fails to load the model because the weight file sizes do not match what the backend requests:
```
I0414 15:43:13.619001 934 libfastertransformer.cc:438] Before Loading Weights:
after allocation : free: 14.14 GB, total: 44.56 GB, used: 30.43 GB
[FT][WARNING] file ./models/nemo-megatron-mt5-3B/2-gpu//decoder.final_layer_norm.bias.bin only has 4096, but request 8192, loading model fails!
[FT][WARNING] file ./models/nemo-megatron-mt5-3B/2-gpu//shared.bias.bin only has 500224, but request 1000448, loading model fails!
[FT][WARNING] file ./models/nemo-megatron-mt5-3B/2-gpu//decoder.final_layer_norm.bias.bin only has 4096, but request 8192, loading model fails!
[FT][WARNING] file ./models/nemo-megatron-mt5-3B/2-gpu//shared.bias.bin only has 500224, but request 1000448, loading model fails!
I0414 15:43:21.362566 934 libfastertransformer.cc:448] After Loading Weights:
```
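One way to read these warnings, assuming the two numbers FasterTransformer prints are byte counts: in both files the requested size is exactly twice what is on disk. If the backend is requesting 4-byte (fp32) elements, each file would hold 2 bytes per element, which would fit a weight data-type mismatch (16-bit weights on disk, fp32 requested). This is only a hypothesis. The sketch below is a hypothetical diagnostic; `DTYPE_SIZES`, `bytes_per_element_on_disk` and `inspect_bin` are made-up names, not part of FasterTransformer, and the check assumes file and request describe the same number of elements.

```python
import os
from typing import Optional

# Element widths in bytes (assumption: only these dtypes are relevant here).
DTYPE_SIZES = {"fp32": 4, "fp16": 2, "bf16": 2}


def bytes_per_element_on_disk(actual_bytes: int, requested_bytes: int,
                              requested_dtype: str = "fp32") -> Optional[int]:
    """Infer how many bytes each element occupies in the weight file.

    Assumes the file and the request describe the same number of elements
    and differ only in element width. Returns None if the numbers do not
    fit that assumption.
    """
    width = DTYPE_SIZES[requested_dtype]
    if requested_bytes % width != 0:
        return None
    n_elements = requested_bytes // width
    if n_elements == 0 or actual_bytes % n_elements != 0:
        return None
    return actual_bytes // n_elements


def inspect_bin(path: str, requested_bytes: int,
                requested_dtype: str = "fp32") -> Optional[int]:
    """Same check, but reads the real size of a converted .bin file from disk."""
    return bytes_per_element_on_disk(os.path.getsize(path), requested_bytes,
                                     requested_dtype)


if __name__ == "__main__":
    # Sizes copied from the [FT][WARNING] lines above.
    for name, actual, requested in [
        ("decoder.final_layer_norm.bias.bin", 4096, 8192),
        ("shared.bias.bin", 500224, 1000448),
    ]:
        width = bytes_per_element_on_disk(actual, requested)
        n = requested // DTYPE_SIZES["fp32"]
        print(f"{name}: {n} elements requested as fp32, "
              f"file holds {width} bytes per element")
```

With the logged sizes this reports 2048 and 250112 requested fp32 elements, and 2 bytes per element in both files. If that reading is right, it would be worth checking that the data type used during conversion matches the data type the Triton model configuration asks for.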
@yaoyu-33 @JimmyZhang12 @dimapihtar @Davood-M Hi, could you please give some advice on this issue?
For reference, the nemo_megatron_mt5_3b_bf16_tp2.nemo model (https://huggingface.co/nvidia/nemo-megatron-mt5-3B) was trained with `--tensor_model_parallel_size=2`.