Lightning-AI / lightning-thunder

Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs.
Apache License 2.0

NeVA bug regarding `flash_attn_with_kvcache` #1004


k223kim commented 3 weeks ago


šŸ› Bug

To Reproduce

I was trying to run NeVA by following this issue. However, when running the script, I hit the following error:

Error executing job with overrides: ['trainer.precision=bf16-mixed', 'model.megatron_amp_O2=True', 'model.mcore_gpt=False', 'trainer.num_nodes=1', 'trainer.devices=1', 'trainer.val_check_interval=10', 'trainer.limit_val_batches=5', 'trainer.log_every_n_steps=1', '++exp_manager.max_time_per_run=00:00:03:00', 'trainer.max_steps=20', 'model.micro_batch_size=2', 'model.global_batch_size=4', 'model.tensor_model_parallel_size=1', 'model.pipeline_model_parallel_size=1', 'exp_manager.create_checkpoint_callback=False', 'model.data.data_path=./data/multimodal/tiny-neva/dummy.json', 'model.data.image_folder=./data/multimodal/tiny-neva/images', 'model.tokenizer.library=sentencepiece', 'model.tokenizer.model=./data/multimodal/tiny-neva/tokenizer_add_special.model', 'model.num_layers=2', 'model.hidden_size=5120', 'model.ffn_hidden_size=13824', 'model.num_attention_heads=40', 'model.normalization=rmsnorm', 'model.data.num_workers=0', 'model.data.conv_template=llama_2', 'model.mm_cfg.vision_encoder.from_pretrained=openai/clip-vit-large-patch14', 'model.mm_cfg.llm.from_pretrained=null', 'model.use_flash_attention=false', 'exp_manager.exp_dir=./foo-neva-train']
[rank0]: Traceback (most recent call last):
[rank0]:   File "/teamspace/studios/this_studio/NeMo/./examples/multimodal/multimodal_llm/neva/neva_pretrain.py", line 119, in <module>
[rank0]:     main()
[rank0]:   File "/teamspace/studios/this_studio/NeMo/nemo/core/config/hydra_runner.py", line 129, in wrapper
[rank0]:     _run_hydra(
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
[rank0]:     _run_app(
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app
[rank0]:     run_and_report(
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
[rank0]:     raise ex
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
[rank0]:     return func()
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
[rank0]:     lambda: hydra.run(
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run
[rank0]:     _ = ret.return_value
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
[rank0]:     raise self._return_value
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job
[rank0]:     ret.return_value = task_function(task_cfg)
[rank0]:   File "/teamspace/studios/this_studio/NeMo/./examples/multimodal/multimodal_llm/neva/neva_pretrain.py", line 110, in main
[rank0]:     trainer.fit(model)
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 543, in fit
[rank0]:     call._call_and_handle_interrupt(
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 43, in _call_and_handle_interrupt
[rank0]:     return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 105, in launch
[rank0]:     return function(*args, **kwargs)
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 579, in _fit_impl
[rank0]:     self._run(model, ckpt_path=ckpt_path)
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 986, in _run
[rank0]:     results = self._run_stage()
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1028, in _run_stage
[rank0]:     self._run_sanity_check()
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1057, in _run_sanity_check
[rank0]:     val_loop.run()
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/pytorch_lightning/loops/utilities.py", line 182, in _decorator
[rank0]:     return loop_run(self, *args, **kwargs)
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 135, in run
[rank0]:     self._evaluation_step(batch, batch_idx, dataloader_idx, dataloader_iter)
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 396, in _evaluation_step
[rank0]:     output = call._call_strategy_hook(trainer, hook_name, *step_args)
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 311, in _call_strategy_hook
[rank0]:     output = fn(*args, **kwargs)
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 411, in validation_step
[rank0]:     return self.lightning_module.validation_step(*args, **kwargs)
[rank0]:   File "/teamspace/studios/this_studio/NeMo/nemo/collections/multimodal/models/multimodal_llm/neva/neva_model.py", line 897, in validation_step
[rank0]:     return MegatronGPTModel.validation_step(self, dataloader_iter)
[rank0]:   File "/teamspace/studios/this_studio/NeMo/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py", line 1370, in validation_step
[rank0]:     loss = self.fwd_bwd_step(dataloader_iter, True, first_val_step)
[rank0]:   File "/teamspace/studios/this_studio/NeMo/nemo/collections/multimodal/models/multimodal_llm/neva/neva_model.py", line 665, in fwd_bwd_step
[rank0]:     return MegatronGPTModel.fwd_bwd_step(self, dataloader_iter, forward_only, first_val_step)
[rank0]:   File "/teamspace/studios/this_studio/NeMo/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py", line 684, in fwd_bwd_step
[rank0]:     losses_reduced_per_micro_batch = fwd_bwd_function(
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/megatron/core/pipeline_parallel/schedules.py", line 381, in forward_backward_no_pipelining
[rank0]:     output_tensor, num_tokens = forward_step(
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/megatron/core/pipeline_parallel/schedules.py", line 206, in forward_step
[rank0]:     output_tensor, loss_func = forward_step_func(data_iterator, model)
[rank0]:   File "/teamspace/studios/this_studio/NeMo/nemo/collections/multimodal/models/multimodal_llm/neva/neva_model.py", line 832, in fwd_output_and_loss_func
[rank0]:     output_tensor = model(**forward_args)
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/teamspace/studios/this_studio/lightning-thunder/thunder/core/module.py", line 64, in forward
[rank0]:     res = self._forward_fn(*args, **kwargs)
[rank0]:   File "/teamspace/studios/this_studio/lightning-thunder/thunder/__init__.py", line 744, in fn_
[rank0]:     cache_entry, inps, pro_to_epi = get_computation_and_inputs(*args, **kwargs)
[rank0]:   File "/teamspace/studios/this_studio/lightning-thunder/thunder/core/langctxs.py", line 136, in _fn
[rank0]:     result = fn(*args, **kwargs)
[rank0]:   File "/teamspace/studios/this_studio/lightning-thunder/thunder/__init__.py", line 229, in cache_info_wrapper
[rank0]:     res = fn(*args, **kwargs)
[rank0]:   File "/teamspace/studios/this_studio/lightning-thunder/thunder/__init__.py", line 536, in get_computation_and_inputs
[rank0]:     jit_results: TraceResults = interpreter(
[rank0]:   File "/teamspace/studios/this_studio/lightning-thunder/thunder/__init__.py", line 217, in _general_frontend
[rank0]:     return thunder_general_jit(fn, args, kwargs, sharp_edges=sharp_edges, record_history=record_history)
[rank0]:   File "/teamspace/studios/this_studio/lightning-thunder/thunder/core/jit_ext.py", line 1794, in thunder_general_jit
[rank0]:     result = jfn(*args, **kwargs)
[rank0]:   File "/teamspace/studios/this_studio/lightning-thunder/thunder/core/interpreter.py", line 7122, in fn_
[rank0]:     raise e
[rank0]:   File "/teamspace/studios/this_studio/lightning-thunder/thunder/core/interpreter.py", line 7090, in fn_2
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/teamspace/studios/this_studio/lightning-thunder/thunder/core/interpreter.py", line 6407, in _impl
[rank0]:     return fn.__func__(fn.__self__, *args, **kwargs)
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/teamspace/studios/this_studio/lightning-thunder/thunder/core/interpreter.py", line 6407, in _impl
[rank0]:     return fn.__func__(fn.__self__, *args, **kwargs)
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/teamspace/studios/this_studio/lightning-thunder/thunder/core/interpreter.py", line 6407, in _impl
[rank0]:     return fn.__func__(fn.__self__, *args, **kwargs)
[rank0]:   File "/teamspace/studios/this_studio/NeMo/nemo/collections/nlp/modules/common/megatron/module.py", line 292, in forward
[rank0]:     outputs = self.module(*inputs, **kwargs)
[rank0]:   File "/teamspace/studios/this_studio/lightning-thunder/thunder/core/interpreter.py", line 6407, in _impl
[rank0]:     return fn.__func__(fn.__self__, *args, **kwargs)
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/teamspace/studios/this_studio/lightning-thunder/thunder/core/interpreter.py", line 6407, in _impl
[rank0]:     return fn.__func__(fn.__self__, *args, **kwargs)
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/teamspace/studios/this_studio/lightning-thunder/thunder/core/interpreter.py", line 6407, in _impl
[rank0]:     return fn.__func__(fn.__self__, *args, **kwargs)
[rank0]:   File "/teamspace/studios/this_studio/NeMo/nemo/collections/multimodal/models/multimodal_llm/neva/neva_model.py", line 470, in forward
[rank0]:     result = GPTModel.forward(self, *args, **kwargs)
[rank0]:   File "/teamspace/studios/this_studio/NeMo/nemo/collections/nlp/models/language_modeling/megatron/gpt_model.py", line 286, in forward
[rank0]:     lm_output = self.language_model(
[rank0]:   File "/teamspace/studios/this_studio/lightning-thunder/thunder/core/interpreter.py", line 6407, in _impl
[rank0]:     return fn.__func__(fn.__self__, *args, **kwargs)
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/teamspace/studios/this_studio/lightning-thunder/thunder/core/interpreter.py", line 6407, in _impl
[rank0]:     return fn.__func__(fn.__self__, *args, **kwargs)
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/teamspace/studios/this_studio/lightning-thunder/thunder/core/interpreter.py", line 6407, in _impl
[rank0]:     return fn.__func__(fn.__self__, *args, **kwargs)
[rank0]:   File "/teamspace/studios/this_studio/NeMo/nemo/collections/nlp/modules/common/megatron/language_model.py", line 824, in forward
[rank0]:     encoder_output = self.encoder(
[rank0]:   File "/teamspace/studios/this_studio/lightning-thunder/thunder/core/interpreter.py", line 6407, in _impl
[rank0]:     return fn.__func__(fn.__self__, *args, **kwargs)
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/teamspace/studios/this_studio/lightning-thunder/thunder/core/interpreter.py", line 6407, in _impl
[rank0]:     return fn.__func__(fn.__self__, *args, **kwargs)
[rank0]:   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/teamspace/studios/this_studio/lightning-thunder/thunder/core/interpreter.py", line 6407, in _impl
[rank0]:     return fn.__func__(fn.__self__, *args, **kwargs)
[rank0]:   File "/teamspace/studios/this_studio/NeMo/nemo/collections/nlp/modules/common/megatron/transformer.py", line 1632, in forward
[rank0]:     hidden_states = layer(
[rank0]: NameError: name 'flash_attn_with_kvcache' is not defined
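
For what it's worth, this failure pattern looks like a guarded optional import: if the flash-attn package is not installed (or fails to import), the name `flash_attn_with_kvcache` never gets bound at module level, and any later reference to it raises `NameError` once Thunder's interpreter traces into the layer. A minimal sketch of that pattern (illustrative only; the names and structure are my assumptions, not NeMo's exact code):

# Hypothetical reduction of the guarded-import pattern; not NeMo's actual code.
try:
    from flash_attn import flash_attn_with_kvcache  # optional dependency
    HAVE_FLASH_ATTN = True
except ImportError:
    HAVE_FLASH_ATTN = False  # note: the name stays unbound on this path

def forward(use_flash_attention: bool):
    if use_flash_attention:
        # If flash-attn is missing, this line raises
        # NameError: name 'flash_attn_with_kvcache' is not defined
        return flash_attn_with_kvcache
    return "non-flash fallback"

Note that the repro below passes `model.use_flash_attention=false`, so the interesting part is why the non-flash path still ends up resolving the name under Thunder.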

Code sample

Ran the following:

rm -fr foo-neva-train; mkdir -p foo-neva-train
HYDRA_FULL_ERROR=1 \
THUNDER_ANNOTATE_TRACES=1 \
NEMO_THUNDER_NEVA=thunder \
python3 ./examples/multimodal/multimodal_llm/neva/neva_pretrain.py \
    trainer.precision=bf16-mixed \
    model.megatron_amp_O2=True \
    model.mcore_gpt=False \
    trainer.num_nodes=1 \
    trainer.devices=1 \
    trainer.val_check_interval=10 \
    trainer.limit_val_batches=5 \
    trainer.log_every_n_steps=1 \
    ++exp_manager.max_time_per_run=00:00:03:00 \
    trainer.max_steps=20 \
    model.micro_batch_size=2 \
    model.global_batch_size=4 \
    model.tensor_model_parallel_size=1 \
    model.pipeline_model_parallel_size=1 \
    exp_manager.create_checkpoint_callback=False \
    model.data.data_path=./data/multimodal/tiny-neva/dummy.json \
    model.data.image_folder=./data/multimodal/tiny-neva/images \
    model.tokenizer.library=sentencepiece \
    model.tokenizer.model=./data/multimodal/tiny-neva/tokenizer_add_special.model \
    model.num_layers=2 \
    model.hidden_size=5120 \
    model.ffn_hidden_size=13824 \
    model.num_attention_heads=40 \
    model.normalization=rmsnorm \
    model.data.num_workers=0 \
    model.data.conv_template=llama_2 \
    model.mm_cfg.vision_encoder.from_pretrained=openai/clip-vit-large-patch14 \
    model.mm_cfg.llm.from_pretrained=null \
    model.use_flash_attention=false \
    exp_manager.exp_dir=./foo-neva-train


cc @kshitij12345 @tfogal

riccardofelluga commented 3 weeks ago

Hey, I gave it a go but couldn't reproduce it. How did you install the required packages?

k223kim commented 3 weeks ago

@riccardofelluga Hi! Thanks for trying this out. I followed the instructions provided by Tom Fogal, which are the following:

python3 -m pip install --no-deps \
  huggingface-hub==0.23.2

git clone https://github.com/tfogal/NeMo.git
cd NeMo
python3 -m pip install -e .
cd ..
pip install git+https://github.com/NVIDIA/Megatron-LM.git@e33c8f78a35765d5aa37475a144da60e8a2349d1
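
As a sanity check for comparing environments, something like this prints the versions that matter here (an illustrative snippet; the distribution names are my assumption, adjust if your installs differ):

import importlib.metadata

for pkg in ("torch", "nemo_toolkit", "megatron-core", "lightning-thunder", "flash-attn"):
    try:
        print(pkg, importlib.metadata.version(pkg))
    except importlib.metadata.PackageNotFoundError:
        print(pkg, "not installed")

If flash-attn prints "not installed", that would line up with the NameError above.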

Can you share what error you are getting?

riccardofelluga commented 3 weeks ago

I get the one we filed an issue for earlier, #753. Can I ask where you found the instructions to pin Megatron-LM to that specific commit?

kshitij12345 commented 3 weeks ago

Can I ask where you found the instructions to pin Megatron-LM to that specific commit?

I had recommended that Megatron-LM commit offline (on Slack), as I had previously needed this particular commit to reproduce the NeVA-related bugs. I got the commit from @tfogal by asking about his environment a month or two ago. Without pinning it, I was hitting `RuntimeError: Advanced indexing currently only supports zero or one-dimensional integer tensors, but found a tensor with dtype int64 and 2 dimensions` (see comment).
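
For reference, that advanced-indexing limitation is easy to trigger in isolation. A minimal sketch (illustrative only, not the Megatron-LM code path):

import torch
import thunder

def f(x, idx):
    # Advanced indexing with a 2-D integer tensor, which Thunder rejected
    # with the RuntimeError quoted above.
    return x[idx]

jf = thunder.jit(f)
x = torch.randn(4, 4)
idx = torch.tensor([[0, 1], [2, 3]])  # dtype int64, 2 dimensions
jf(x, idx)  # expected: RuntimeError about advanced indexing support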