Lightning-AI / lightning-thunder

Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs.

Support NeMo NeVA Model #343

Open athitten opened 5 months ago

athitten commented 5 months ago

🚀 Feature

NeMo's NeVA (LLaVA) is a multimodal language model.

Initial examine: Found 49 distinct operations, of which 39 (79.6%) are supported
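For reference, a minimal sketch of how a report like this can be produced with thunder's examine utility; the toy module and input below are stand-ins, not the actual NeVA training step:

```python
import torch
from thunder.examine import examine

# Toy stand-in model and input; the numbers above came from running examine
# against the NeVA pretraining step, which is not reproduced here.
model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.GELU())
x = torch.randn(4, 8)

# examine traces one call of the module and reports which of the encountered
# torch operations thunder can currently handle, e.g.
# "Found N distinct operations, of which M (..%) are supported"
examine(model, x)
```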

Work items

Running the model

Required data

First download the freely available data and place it in a data directory.

NeMo installation

Dependencies
python3 -m pip install --no-deps \
  huggingface-hub==0.23.2
NeMo branch

To keep the whole thunder team on the same NeMo revisions, and to avoid a long list of "modify this file to call thunder.jit()" instructions, we temporarily maintain our own branch for thunder. You can grab it by cloning https://github.com/tfogal/NeMo.git; make sure you have checked out the tfogal/thunder-nemo branch.

To install NeMo, run python3 -m pip install -e . from the root of the checked-out directory.
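For context, the branch's change amounts to wrapping the relevant NeMo module with thunder.jit, presumably gated by the NEMO_THUNDER_NEVA environment variable. A minimal, hypothetical sketch of that wiring (the module below is a placeholder, not the actual NeVA code):

```python
import torch
import thunder

# Placeholder module standing in for the NeMo model; the real branch applies
# this to the NeVA module when NEMO_THUNDER_NEVA=thunder is set.
model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU())

# thunder.jit returns a compiled callable with the same interface as the module.
jitted_model = thunder.jit(model)
out = jitted_model(torch.randn(2, 16))
```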

Running the network

rm -fr foo-neva-train; mkdir -p foo-neva-train
HYDRA_FULL_ERROR=1 \
THUNDER_ANNOTATE_TRACES=1 \
NEMO_THUNDER_NEVA=thunder \
python3 ./examples/multimodal/multimodal_llm/neva/neva_pretrain.py \
    trainer.precision=bf16-mixed \
    model.megatron_amp_O2=True \
    model.mcore_gpt=False \
    trainer.num_nodes=1 \
    trainer.devices=1 \
    trainer.val_check_interval=10 \
    trainer.limit_val_batches=5 \
    trainer.log_every_n_steps=1 \
    ++exp_manager.max_time_per_run=00:00:03:00 \
    trainer.max_steps=20 \
    model.micro_batch_size=2 \
    model.global_batch_size=4 \
    model.tensor_model_parallel_size=1 \
    model.pipeline_model_parallel_size=1 \
    exp_manager.create_checkpoint_callback=False \
    model.data.data_path=./data/multimodal/tiny-neva/dummy.json \
    model.data.image_folder=./data/multimodal/tiny-neva/images \
    model.tokenizer.library=sentencepiece \
    model.tokenizer.model=./data/multimodal/tiny-neva/tokenizer_add_special.model \
    model.num_layers=2 \
    model.hidden_size=5120 \
    model.ffn_hidden_size=13824 \
    model.num_attention_heads=40 \
    model.normalization=rmsnorm \
    model.data.num_workers=0 \
    model.data.conv_template=llama_2 \
    model.mm_cfg.vision_encoder.from_pretrained=openai/clip-vit-large-patch14 \
    model.mm_cfg.llm.from_pretrained=null \
    model.use_flash_attention=false \
    exp_manager.exp_dir=./foo-neva-train

Note that the latest version of the tfogal/thunder-nemo branch allows running with dynamo+thunder by setting NEMO_THUNDER_NEVA=dynamo.
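For reference, a rough sketch of what the dynamo+thunder path looks like outside of NeMo, assuming the ThunderCompiler backend available in recent thunder versions; the toy module is again a placeholder:

```python
import torch
from thunder.dynamo import ThunderCompiler  # assumes a recent thunder version

# torch.compile partitions the graph via dynamo and hands the subgraphs that
# thunder supports to this backend.
backend = ThunderCompiler()

model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU())
compiled = torch.compile(model, backend=backend)
out = compiled(torch.randn(2, 16))
```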

cc @apaz-cli @tfogal

IvanYashchuk commented 2 months ago

Can you share the script for the examine call?

tfogal commented 2 months ago

Can you share the script for the examine call?

@athitten when you have a minute

athitten commented 1 month ago

Adding the updated command to use megatron_amp_O2=True and model.mcore_gpt=True (NeMo models will default to the Megatron implementations, hence this setting). With megatron_amp_O2=True, precision=bf16 should already do mixed-precision training with the main copy of the weights in FP32, but just to be safe we also specify precision=bf16-mixed.

python3 ./examples/multimodal/multimodal_llm/neva/neva_pretrain.py \
    trainer.precision=bf16-mixed \
    model.megatron_amp_O2=True \
    model.mcore_gpt=True \
    trainer.num_nodes=1 \
    trainer.devices=1 \
    trainer.val_check_interval=10 \
    trainer.limit_val_batches=5 \
    trainer.log_every_n_steps=1 \
    ++exp_manager.max_time_per_run=00:00:03:00 \
    trainer.max_steps=20 \
    model.micro_batch_size=2 \
    model.global_batch_size=4 \
    model.tensor_model_parallel_size=1 \
    model.pipeline_model_parallel_size=1 \
    exp_manager.create_checkpoint_callback=False \
    model.data.data_path=./data/multimodal/tiny-neva/dummy.json \
    model.data.image_folder=./data/multimodal/tiny-neva/images \
    model.tokenizer.library=sentencepiece \
    model.tokenizer.model=./data/multimodal/tiny-neva/tokenizer_add_special.model \
    model.num_layers=2 \
    model.hidden_size=5120 \
    model.ffn_hidden_size=13824 \
    model.num_attention_heads=40 \
    model.normalization=rmsnorm \
    model.data.num_workers=0 \
    model.data.conv_template=llama_2 \
    model.mm_cfg.vision_encoder.from_pretrained=openai/clip-vit-large-patch14 \
    model.mm_cfg.llm.from_pretrained=null \
    model.use_flash_attention=false \
    exp_manager.exp_dir=./foo-neva-train
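As an aside, a generic PyTorch illustration of the bf16-mixed idea described above (this is not NeMo's or Megatron's implementation, just the concept of bf16 compute with FP32 main weights):

```python
import torch

# FP32 "main" weights; the optimizer update happens in full precision.
model = torch.nn.Linear(8, 8)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

x = torch.randn(4, 8)

# The forward pass runs in bfloat16 under autocast while parameters stay FP32.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    loss = model(x).float().pow(2).mean()

loss.backward()   # gradients land on the FP32 parameters
opt.step()        # FP32 weight update
```
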
athitten commented 1 month ago

This might be helpful: the full config with default values for all parameters can be found here. Only the parameters we specify in the run command are overridden with the specified values; all others default to the values in the config.

tfogal commented 1 month ago

Adding the updated command

Thanks, @athitten! I have edited the original issue to mostly reflect the updated command. Unfortunately #753 blocks setting model.mcore_gpt=True, so for now it stays False... but let's prioritize fixing that!

athitten commented 1 month ago

Yes, it's important to prioritize getting thunder working with mcore_gpt=True, as it will be the default for NeMo models once we deprecate the legacy path.