Open grzywada opened 1 year ago
The folder issue should be handled in unpack_nemo_ckpt by NeMo.
The above is not related to unpack_nemo_ckpt, which is working just fine. What I was trying to say is that unpack_nemo_ckpt creates a different folder structure depending on the Tensor Parallelism (TP) of the job that created the prompt_table. With TP of 1, "model_weights.ckpt" is in the folder the code is pointing at. With TP greater than 1, "model_weights.ckpt" is not where the code is looking; instead there are folders called mp_rank_00 to mp_rank_xx, each with a separate "model_weights.ckpt", but all containing the same prompt_table.
You need a couple of lines of conditional logic to check TP and:
I think the same will apply if Pipeline Parallelism (PP) is greater than 1.
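To illustrate the kind of conditional logic I mean, here is a minimal sketch. It is not the actual code from the repo: the function name `find_prompt_table_ckpt` and the `unpacked_dir` argument are hypothetical, and it checks the folder layout on disk instead of reading TP from a config. It assumes only what is described above, namely that the file is either directly in the unpacked folder or inside mp_rank_XX subfolders that all hold the same prompt_table. I have not checked what the folders are called when PP > 1, so that case is not covered here.

```python
from pathlib import Path

CKPT_NAME = "model_weights.ckpt"


def find_prompt_table_ckpt(unpacked_dir):
    """Return the path to model_weights.ckpt inside an unpacked checkpoint folder.

    Hypothetical helper, for illustration only.
    """
    root = Path(unpacked_dir)

    # TP == 1 layout: the file sits directly in the unpacked folder.
    direct = root / CKPT_NAME
    if direct.is_file():
        return direct

    # TP > 1 layout: one file per mp_rank_XX folder. Per the report above,
    # every rank holds the same prompt_table, so the first one is enough.
    for rank_dir in sorted(root.glob("mp_rank_*")):
        candidate = rank_dir / CKPT_NAME
        if candidate.is_file():
            return candidate

    raise FileNotFoundError(
        f"{CKPT_NAME} not found in {root} or in any mp_rank_* subfolder"
    )
```

With something like this, the loading code could call `find_prompt_table_ckpt(...)` on the unpacked folder instead of assuming a fixed location for the file.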
Branch/Tag/Commit
main
Docker Image Version
nvcr.io/ea-bignlp/bignlp-training:22.08.01-py3
GPU name
V100
CUDA Driver
470.141.03
Reproduced Steps