This PR adds MLP checkpoint conversion from HF to Megatron format for the LLaVA Pretrain phase. A TP=2 example of the full checkpoint conversion is shown below:
CLIP
cd FlagScale/megatron && python examples/multimodal/clip_converter.py --download-root <clip_hf_dir> --output <clip_megatron_dir> --tensor-parallel-size 2 --use-te-layernorm-linear
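To sanity-check the CLIP shards before combining, you can load each `state_dict_tp_<rank>.pt` and inspect a few tensor shapes; tensor-parallel weights should show 1/TP of their original size along the split dimension. A minimal sketch (whether the converter nests tensors under a `model` key is an assumption, which the `.get()` below hedges):

```python
import torch

# Hypothetical sanity check: both TP shards should hold the same keys,
# with tensor-parallel weights reduced to 1/tp_size along the split dim.
for rank in range(2):
    sd = torch.load(f"<clip_megatron_dir>/state_dict_tp_{rank}.pt", map_location="cpu")
    sd = sd.get("model", sd)  # unwrap if the converter nests under "model"
    for key, tensor in list(sd.items())[:5]:
        print(rank, key, tuple(tensor.shape))
```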
Vicuna
cd FlagScale/megatron && python tools/checkpoint/convert.py --model-type GPT --loader llama_mistral --saver mcore --target-tensor-parallel-size 2 --checkpoint-type hf --load-dir <vicuna_hf_dir> --save-dir <vicuna_megatron_dir> --tokenizer-model <vicuna_hf_dir>/tokenizer.model --bf16 --model-size llama2-7Bf --true-vocab-size 32256 --megatron-path <flagscale_dir>
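The saver writes the usual Megatron layout, one `mp_rank_XX` directory per tensor-parallel rank under `iter_0000001`; a quick existence check of the files the combine step below reads from:

```python
import os

# Check that the converted Vicuna checkpoint has one shard per TP rank,
# at the paths the combine step below expects.
base = "<vicuna_megatron_dir>/iter_0000001"
for rank in range(2):
    path = os.path.join(base, f"mp_rank_{rank:02d}", "model_optim_rng.pt")
    print(path, "exists:", os.path.exists(path))
```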
MLP
cd FlagScale/megatron && python examples/multimodal/mlp_converter.py --input <mlp_hf_dir>/mm_projector.bin --output <mlp_megatron_dir> --tensor-parallel-size 2
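For reference, here is a minimal sketch of the split such a converter has to perform, assuming LLaVA's two-layer `mm_projector` layout and hypothetical output key names (the real mlp_converter.py may name and nest things differently): fc1 is column-parallel, so its weight and bias are chunked along the output dimension, while fc2 is row-parallel, so its weight is chunked along the input dimension and its bias is replicated.

```python
import torch

# Hypothetical sketch of the MLP split, not the actual mlp_converter.py code.
# Source key names follow LLaVA's nn.Sequential mm_projector ("0" = fc1, "2" = fc2);
# the target key names and the {"model": ...} nesting are assumptions.
tp_size = 2
sd = torch.load("mm_projector.bin", map_location="cpu")

for rank in range(tp_size):
    shard = {
        # fc1 is column-parallel: chunk weight and bias along the output dim.
        "fc1.weight": torch.chunk(sd["model.mm_projector.0.weight"], tp_size, dim=0)[rank],
        "fc1.bias": torch.chunk(sd["model.mm_projector.0.bias"], tp_size, dim=0)[rank],
        # fc2 is row-parallel: chunk weight along the input dim, replicate bias.
        "fc2.weight": torch.chunk(sd["model.mm_projector.2.weight"], tp_size, dim=1)[rank],
        "fc2.bias": sd["model.mm_projector.2.bias"],
    }
    torch.save({"model": shard}, f"state_dict_tp_{rank}.pt")
```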
And combine them:
cd FlagScale/megatron && PYTHONPATH=FlagScale/megatron:FlagScale/ python examples/multimodal/combine_state_dicts.py --input <vicuna_megatron_dir>/iter_0000001/mp_rank_00/model_optim_rng.pt <clip_megatron_dir>/state_dict_tp_0.pt <mlp_megatron_dir>/state_dict_tp_0.pt <vicuna_megatron_dir>/iter_0000001/mp_rank_01/model_optim_rng.pt <clip_megatron_dir>/state_dict_tp_1.pt <mlp_megatron_dir>/state_dict_tp_1.pt --prefixes language_model vision_model vision_projection language_model vision_model vision_projection --output <output_dir>/iter_0000001/mp_rank_00/model_optim_rng.pt <output_dir>/iter_0000001/mp_rank_01/model_optim_rng.pt && cd <output_dir> && echo "1" > latest_checkpointed_iteration.txt
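Conceptually, the combine step namespaces each per-rank state dict under its prefix and merges them into one checkpoint per `mp_rank`. A minimal sketch for rank 0 (the `model` nesting is an assumption about the checkpoint layout; the real combine_state_dicts.py also validates that inputs, prefixes, and outputs line up):

```python
import torch

# Hypothetical sketch of the combine step for one TP rank.
inputs = [
    ("language_model", "<vicuna_megatron_dir>/iter_0000001/mp_rank_00/model_optim_rng.pt"),
    ("vision_model", "<clip_megatron_dir>/state_dict_tp_0.pt"),
    ("vision_projection", "<mlp_megatron_dir>/state_dict_tp_0.pt"),
]

combined = {"model": {}}
for prefix, path in inputs:
    sd = torch.load(path, map_location="cpu")
    sd = sd.get("model", sd)  # unwrap checkpoints that nest under "model"
    for key, tensor in sd.items():
        combined["model"][f"{prefix}.{key}"] = tensor

torch.save(combined, "<output_dir>/iter_0000001/mp_rank_00/model_optim_rng.pt")
```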
This PR also updates the configs of the three models.