NVIDIA / NeMo-Framework-Launcher

Provides end-to-end model development pipelines for LLMs and multimodal models that can be launched on-prem or in the cloud.
Apache License 2.0

Converted nemo-megatron-mt5-3B to a FasterTransformer binary successfully, but tritonserver fails with a shape mismatch when loading the model. #21

Closed songkq closed 1 year ago

songkq commented 1 year ago

@yaoyu-33 @JimmyZhang12 @dimapihtar @Davood-M Hi, could you please give some advice for this issue?

The nemo_megatron_mt5_3b_bf16_tp2.nemo model (https://huggingface.co/nvidia/nemo-megatron-mt5-3B) was trained with --tensor_model_parallel_size=2.

I converted nemo-megatron-mt5-3B to a FasterTransformer binary successfully with:

```shell
python3 FasterTransformer/examples/pytorch/t5/utils/nemo_t5_ckpt_convert.py \
    -i nemo-megatron-mt5-3B/nemo_megatron_mt5_3b_bf16_tp2.nemo \
    -o ./models/nemo-megatron-mt5-3B/ \
    -m mt5-3B \
    -i_g 2
```

However, when I ran tritonserver with

```shell
CUDA_VISIBLE_DEVICES="0,1" /opt/tritonserver/bin/tritonserver \
    --model-store=fastertransformer_backend/all_models/nemo-megatron-mt5-3B/
```

it failed to load the model with a shape mismatch:

```
I0414 15:43:13.619001 934 libfastertransformer.cc:438] Before Loading Weights:
after allocation    : free: 14.14 GB, total: 44.56 GB, used: 30.43 GB
[FT][WARNING] file ./models/nemo-megatron-mt5-3B/2-gpu//decoder.final_layer_norm.bias.bin only has 4096, but request 8192, loading model fails!
[FT][WARNING] file ./models/nemo-megatron-mt5-3B/2-gpu//shared.bias.bin only has 500224, but request 1000448, loading model fails!
[FT][WARNING] file ./models/nemo-megatron-mt5-3B/2-gpu//decoder.final_layer_norm.bias.bin only has 4096, but request 8192, loading model fails!
[FT][WARNING] file ./models/nemo-megatron-mt5-3B/2-gpu//shared.bias.bin only has 500224, but request 1000448, loading model fails!
I0414 15:43:21.362566 934 libfastertransformer.cc:448] After Loading Weights:
```
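Editor's note: a pattern worth noticing in these warnings is that each requested size is exactly twice the size found on disk (8192 vs 4096, 1000448 vs 500224). A minimal diagnostic sketch, assuming both numbers are byte counts and that a clean 2x ratio indicates a 16-bit checkpoint (the .nemo filename says bf16) being loaded with an fp32 server config. The `diagnose` helper is hypothetical, not part of FasterTransformer:

```python
# Hypothetical helper to interpret FasterTransformer's
# "file ... only has X, but request Y" warnings.
# Assumption: X and Y are byte counts for the same tensor, so a ratio of
# exactly 2 suggests fp16/bf16 weights on disk vs. an fp32 config.

def diagnose(on_disk: int, requested: int) -> str:
    """Return a best-guess explanation for a weight-file size mismatch."""
    if on_disk == requested:
        return "sizes match"
    ratio = requested / on_disk
    if ratio == 2.0:
        return ("requested size is exactly 2x the file size: the .bin was "
                "likely written as fp16/bf16 but the config requests fp32")
    return f"unexplained mismatch (ratio {ratio:g})"

# The two warnings from the log above:
print(diagnose(4096, 8192))        # decoder.final_layer_norm.bias.bin
print(diagnose(500224, 1000448))   # shared.bias.bin
```

Both warnings in the log fit the 2x pattern, which is why a precision mismatch is the first thing to check.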
songkq commented 1 year ago

Hi, here are my config.pbtxt and config.ini. config.zip

songkq commented 1 year ago

Solved the problem by following https://github.com/NVIDIA/FasterTransformer/issues/561.
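Editor's note: the linked issue's contents are not reproduced here, but given that every on-disk weight file is exactly half the requested size and the checkpoint was exported in bf16, a plausible resolution is making the serving data type match the 16-bit weights. A sketch of the relevant `config.pbtxt` fragment, with the parameter name taken from fastertransformer_backend's example configs (verify against your own config and the linked issue):

```
parameters {
  key: "data_type"
  value: { string_value: "bf16" }
}
```

The same precision must also be consistent with the `weight_data_type` recorded in the converter's `config.ini`, otherwise the loader computes tensor sizes with the wrong element width.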