This PR adds MLP checkpoint conversion from HF to Megatron format for the LLaVA Pretrain phase. A TP=2 example of the full checkpoint conversion is shown below:
CLIP
cd FlagScale/megatron && python examples/multimodal/clip_converter.py --download-root <clip_hf_dir> --output <clip_megatron_dir> --tensor-parallel-size 2 --use-te-layernorm-linear
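To sanity-check the CLIP shards before combining, you can load each `state_dict_tp_<rank>.pt` and inspect a few tensor shapes; tensor-parallel weights should show 1/TP of their original size along the split dimension. A minimal sketch (whether the converter nests tensors under a `model` key is an assumption, which the `.get()` below hedges):

```python
import torch

# Hypothetical sanity check: both TP shards should hold the same keys,
# with tensor-parallel weights reduced to 1/tp_size along the split dim.
for rank in range(2):
    sd = torch.load(f"<clip_megatron_dir>/state_dict_tp_{rank}.pt", map_location="cpu")
    sd = sd.get("model", sd)  # unwrap if the converter nests under "model"
    for key, tensor in list(sd.items())[:5]:
        print(rank, key, tuple(tensor.shape))
```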
Vicuna
cd FlagScale/megatron && python tools/checkpoint/convert.py --model-type GPT --loader llama_mistral --saver mcore --target-tensor-parallel-size 2 --checkpoint-type hf --load-dir <vicuna_hf_dir> --save-dir <vicuna_megatron_dir> --tokenizer-model <vicuna_hf_dir>/tokenizer.model --bf16 --model-size llama2-7Bf --true-vocab-size 32256 --megatron-path <flagscale_dir>
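The saver writes the usual Megatron layout, one `mp_rank_XX` directory per tensor-parallel rank under `iter_0000001`; a quick existence check of the files the combine step below reads from:

```python
import os

# Check that the converted Vicuna checkpoint has one shard per TP rank,
# at the paths the combine step below expects.
base = "<vicuna_megatron_dir>/iter_0000001"
for rank in range(2):
    path = os.path.join(base, f"mp_rank_{rank:02d}", "model_optim_rng.pt")
    print(path, "exists:", os.path.exists(path))
```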
MLP
cd FlagScale/megatron && python examples/multimodal/mlp_converter.py --input <mlp_hf_dir>/mm_projector.bin --output <mlp_megatron_dir> --tensor-parallel-size 2
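For reference, here is a minimal sketch of the split such a converter has to perform, assuming LLaVA's two-layer `mm_projector` layout and hypothetical output key names (the real mlp_converter.py may name and nest things differently): fc1 is column-parallel, so its weight and bias are chunked along the output dimension, while fc2 is row-parallel, so its weight is chunked along the input dimension and its bias is replicated.

```python
import torch

# Hypothetical sketch of the MLP split, not the actual mlp_converter.py code.
# Source key names follow LLaVA's nn.Sequential mm_projector ("0" = fc1, "2" = fc2);
# the target key names and the {"model": ...} nesting are assumptions.
tp_size = 2
sd = torch.load("mm_projector.bin", map_location="cpu")

for rank in range(tp_size):
    shard = {
        # fc1 is column-parallel: chunk weight and bias along the output dim.
        "fc1.weight": torch.chunk(sd["model.mm_projector.0.weight"], tp_size, dim=0)[rank],
        "fc1.bias": torch.chunk(sd["model.mm_projector.0.bias"], tp_size, dim=0)[rank],
        # fc2 is row-parallel: chunk weight along the input dim, replicate bias.
        "fc2.weight": torch.chunk(sd["model.mm_projector.2.weight"], tp_size, dim=1)[rank],
        "fc2.bias": sd["model.mm_projector.2.bias"],
    }
    torch.save({"model": shard}, f"state_dict_tp_{rank}.pt")
```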
And combine them:
cd FlagScale/megatron && PYTHONPATH=FlagScale/megatron:FlagScale/ python examples/multimodal/combine_state_dicts.py --input <vicuna_megatron_dir>/iter_0000001/mp_rank_00/model_optim_rng.pt <clip_megatron_dir>/state_dict_tp_0.pt <mlp_megatron_dir>/state_dict_tp_0.pt <vicuna_megatron_dir>/iter_0000001/mp_rank_01/model_optim_rng.pt <clip_megatron_dir>/state_dict_tp_1.pt <mlp_megatron_dir>/state_dict_tp_1.pt --prefixes language_model vision_model vision_projection language_model vision_model vision_projection --output <output_dir>/iter_0000001/mp_rank_00/model_optim_rng.pt <output_dir>/iter_0000001/mp_rank_01/model_optim_rng.pt && cd <output_dir> && echo "1" > latest_checkpointed_iteration.txt
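Conceptually, the combine step namespaces each per-rank state dict under its prefix and merges them into one checkpoint per `mp_rank`. A minimal sketch for rank 0 (the `model` nesting is an assumption about the checkpoint layout; the real combine_state_dicts.py also validates that inputs, prefixes, and outputs line up):

```python
import torch

# Hypothetical sketch of the combine step for one TP rank.
inputs = [
    ("language_model", "<vicuna_megatron_dir>/iter_0000001/mp_rank_00/model_optim_rng.pt"),
    ("vision_model", "<clip_megatron_dir>/state_dict_tp_0.pt"),
    ("vision_projection", "<mlp_megatron_dir>/state_dict_tp_0.pt"),
]

combined = {"model": {}}
for prefix, path in inputs:
    sd = torch.load(path, map_location="cpu")
    sd = sd.get("model", sd)  # unwrap checkpoints that nest under "model"
    for key, tensor in sd.items():
        combined["model"][f"{prefix}.{key}"] = tensor

torch.save(combined, "<output_dir>/iter_0000001/mp_rank_00/model_optim_rng.pt")
```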
This PR also updates the configs of the three models.