NVIDIA / FasterTransformer

Transformer related optimization, including BERT, GPT
Apache License 2.0

nemo_ckpt_convert.py prompt table conversion when TP is not 1 #370

Open grzywada opened 1 year ago

grzywada commented 1 year ago

Branch/Tag/Commit

main

Docker Image Version

nvcr.io/ea-bignlp/bignlp-training:22.08.01-py3

GPU name

V100

CUDA Driver

470.141.03

Reproduced Steps

In nemo_ckpt_convert.py, line 644 sets model_weights_ckpt = "model_weights.ckpt" and line 651 then uses os.path.join(args.prompt_saved_dir, model_weights_ckpt). This assumes that prompt tuning was carried out with TP = 1.

If TP > 1, the folder structure is different. A folder is created for each rank (e.g. mp_rank_00 and mp_rank_01 in the case of TP = 2), and the ckpt files are not in the folder referenced above but in the mp_rank_xx folders (see the layout sketched below). The files are identical, so just mp_rank_00 could be used.

I have not tested PP > 1, or PP > 1 combined with TP > 1, but I would imagine the same applies there.
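
For reference, a rough illustration of the unpacked prompt-tuning checkpoint layout in the two cases (directory names other than model_weights.ckpt and mp_rank_xx are illustrative):

```
# TP = 1: checkpoint sits directly in the unpacked folder
prompt_saved_dir/
└── model_weights.ckpt

# TP = 2: one mp_rank_xx folder per tensor-parallel rank,
# each holding an identical copy of the prompt table
prompt_saved_dir/
├── mp_rank_00/
│   └── model_weights.ckpt
└── mp_rank_01/
    └── model_weights.ckpt
```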
byshiue commented 1 year ago

The folder issue should be handled in unpack_nemo_ckpt by NeMo.

grzywada commented 1 year ago

The above is not related to unpack_nemo_ckpt, which works just fine. What I was trying to say is that unpack_nemo_ckpt produces a different folder structure depending on how the prompt_table was created. If the job used tensor parallelism of 1, "model_weights.ckpt" is in the folder the code points at. If tensor parallelism is greater than 1, "model_weights.ckpt" is not where the code looks; instead there are folders named mp_rank_00 through mp_rank_xx, each with a separate "model_weights.ckpt", all containing the same prompt_table.

You could add a couple of lines of conditional logic to check the TP setting and pick the right path (see the sketch after this comment).

I think the same will apply if there is PP > 1
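
A minimal sketch of what that conditional could look like, assuming the unpacked prompt checkpoint directory is the args.prompt_saved_dir used in nemo_ckpt_convert.py and that the mp_rank_xx sub-folders only appear when TP > 1 (the helper name is hypothetical, not part of the script):

```python
import os


def find_prompt_weights_ckpt(prompt_saved_dir, model_weights_ckpt="model_weights.ckpt"):
    """Locate the prompt-table checkpoint for both TP = 1 and TP > 1 layouts.

    Sketch only: with TP > 1 the unpacked .nemo archive contains mp_rank_00,
    mp_rank_01, ... sub-folders, each holding an identical prompt table, so
    reading mp_rank_00 is sufficient.
    """
    flat_path = os.path.join(prompt_saved_dir, model_weights_ckpt)
    if os.path.isfile(flat_path):
        # TP = 1: checkpoint sits directly in the unpacked folder.
        return flat_path

    # TP > 1: pick the first mp_rank_xx folder; the prompt table is replicated.
    rank_dirs = sorted(
        d for d in os.listdir(prompt_saved_dir) if d.startswith("mp_rank_")
    )
    if rank_dirs:
        return os.path.join(prompt_saved_dir, rank_dirs[0], model_weights_ckpt)

    raise FileNotFoundError(
        f"Could not locate {model_weights_ckpt} under {prompt_saved_dir}"
    )
```

Line 651 could then call something like this helper instead of joining args.prompt_saved_dir and model_weights_ckpt directly. A PP > 1 layout would presumably need a similar check, but I have not verified what folder names NeMo uses in that case.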