Lightning-AI / lightning-thunder

Make PyTorch models up to 40% faster! Thunder is a source to source compiler for PyTorch. It enables using different hardware executors at once; across one or thousands of GPUs.
Apache License 2.0

maximum recursion depth in NeVA #674

Status: Closed (tfogal closed 3 months ago)

tfogal commented 3 months ago

🐛 Bug

[rank0]:   File "/home/tfogal/dev/thunder/thunder/core/interpreter.py", line 6061, in _impl
[rank0]:     return fn.__func__(fn.__self__, *args, **kwargs)
[rank0]:   File "/home/tfogal/dev/pytorch/torch/nn/modules/module.py", line 1575, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/tfogal/dev/thunder/thunder/core/interpreter.py", line 6061, in _impl
[rank0]:     return fn.__func__(fn.__self__, *args, **kwargs)
[rank0]:   File "/home/tfogal/dev/nemo/nemo/collections/multimodal/models/multimodal_llm/neva/neva_model.py", line 155, in forward
[rank0]:     return self.replace_media_embeddings(input_ids, words_embeddings, media)
[rank0]:   File "/home/tfogal/dev/nemo/nemo/collections/multimodal/models/multimodal_llm/neva/neva_model.py", line 195, in replace_media_embeddings
[rank0]:     media_features = self.encode_vision_x(media)  # b T F S(eq) H(idden)
[rank0]:   File "/home/tfogal/dev/nemo/nemo/collections/multimodal/models/multimodal_llm/neva/neva_model.py", line 176, in encode_vision_x
[rank0]:     vision_x = self.vision_encoder(vision_x, output_hidden_states=True)
[rank0]: thunder.core.interpreter.InterpreterError: Encountered exception RecursionError: maximum recursion depth exceeded in comparison

Full log of the run which includes the full traceback.

To Reproduce

First apply Kshiteej's patch from #601.

Then install the tfogal/thunder-nemo branch of https://github.com/tfogal/NeMo. Then run:

HYDRA_FULL_ERROR=1 \
THUNDER_ANNOTATE_TRACES=1 \
NEMO_THUNDER_NEVA=1 \
python3 ./examples/multimodal/multimodal_llm/neva/neva_pretrain.py trainer.precision=16 model.megatron_amp_O2=False trainer.num_nodes=1 trainer.devices=1 trainer.val_check_interval=10 trainer.limit_val_batches=5 trainer.log_every_n_steps=1 ++exp_manager.max_time_per_run=00:00:03:00 trainer.max_steps=20 model.micro_batch_size=2 model.global_batch_size=4 model.tensor_model_parallel_size=1 model.pipeline_model_parallel_size=1 exp_manager.create_checkpoint_callback=False model.data.data_path=./data/multimodal/tiny-neva/dummy.json model.data.image_folder=./data/multimodal/tiny-neva/images model.tokenizer.library=sentencepiece model.tokenizer.model=./data/multimodal/tiny-neva/tokenizer_add_special.model model.num_layers=2 model.hidden_size=5120 model.ffn_hidden_size=13824 model.num_attention_heads=40 model.normalization=rmsnorm model.data.num_workers=0 model.data.conv_template=llama_2 model.mm_cfg.vision_encoder.from_pretrained=openai/clip-vit-large-patch14 model.mm_cfg.llm.from_pretrained=null model.use_flash_attention=false exp_manager.exp_dir=./foo-neva-train

Note you'll need a few files for the referenced ./data directory; you can ping me privately for now, and I'll work on pulling them out from behind the curtain.

Environment

$ nvidia-smi | grep -i cuda
| NVIDIA-SMI 555.42.02              Driver Version: 555.42.02      CUDA Version: 12.5     |
$ python3 -m pip freeze | egrep -i "(nvfuser)|(lightning)|(thunder)|(nemo)"
-e git+ssh://git@github.com/tfogal/lightning.git@8df5db52ead1804f9021bb07caa2d4a7a6ab03a1#egg=lightning
lightning-cloud==0.5.69
-e git+ssh://git@github.com/Lightning-AI/lightning-thunder.git@8c5905fd1a93145e690791a7c7a3c3e10b16b32b#egg=lightning_thunder
lightning-utilities==0.11.2
-e git+ssh://git@github.com/NVIDIA/NeMo.git@c86449e1a93049d2283ebcea8ee4546f2ea241de#egg=nemo_toolkit
# Editable Git install with no remote (nvfuser==0.2.6+git9c5f006)
-e /opt/pytorch/nvfuser
pytorch-lightning==2.3.0

cc @tfogal

kshitij12345 commented 3 months ago

Here is a smaller repro (it also doesn't require the patch from #601):

import torch
import thunder
from transformers.utils.generic import ModelOutput

def fn(x):
    mo = ModelOutput(foo=x)
    return mo["foo"]

# Sanity - eager works
print(fn(torch.randn(3,)))

# Recursion error
thunder.jit(fn)(torch.randn(3,))

crcrpar commented 3 months ago

Since ModelOutput inherits from OrderedDict (https://github.com/huggingface/transformers/blob/1c68f2cafb4ca54562f74b66d1085b68dd6682f5/src/transformers/utils/generic.py#L310), I tried OrderedDict instead on a whim, and thunder.jit worked. Just FYI.

t-vi commented 3 months ago

Thank you @kshitij12345 for the minimal repro and @crcrpar for the additional info.

What we need is to support the wrapper tracking here, so that we know mo["foo"] is the same object as the x we put in there. This is a bit tricky due to the multiple inheritance, but so we should check

If anyone wants to take a stab, don't hesitate to chat me up. If not, I will try to look and write up what I'm doing. (Additionally, the infinite recursion is not great; likely something funny is going on.)

t-vi commented 3 months ago

Thinking about this more: Maybe, just maybe, all this works right and the infinite recursion is a problem with the MRO in the lookaside for __getitem__. We do implement things like iterating over items through __getitem__ and should be careful to call OrderedDict.__getitem__(self, ...) from the ordered iteration rather than self.__getitem__.
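To illustrate the suspicion, here is a plain-Python sketch (no Thunder or transformers involved). It is not Thunder's actual lookaside code. The class names NaiveDict and FixedDict are hypothetical, and the sketch assumes a ModelOutput-like subclass whose __getitem__ is itself built on items(). If the emulated item iteration dispatches through self.__getitem__, the two methods call each other forever; calling OrderedDict.__getitem__ explicitly breaks the cycle.

```python
from collections import OrderedDict


class NaiveDict(OrderedDict):
    """Hypothetical stand-in: items() is emulated via self[k]."""

    def items(self):
        # Dispatches through the MRO, so it reaches the overridden __getitem__ below.
        return [(k, self[k]) for k in OrderedDict.keys(self)]

    def __getitem__(self, key):
        # Assumption: the subclass looks keys up through items().
        return dict(self.items())[key]


class FixedDict(OrderedDict):
    """Same subclass, but items() bypasses the override."""

    def items(self):
        # Calls the base-class lookup directly, so __getitem__ is not re-entered.
        return [(k, OrderedDict.__getitem__(self, k)) for k in OrderedDict.keys(self)]

    def __getitem__(self, key):
        return dict(self.items())[key]


def lookup(cls):
    """Return the looked-up value, or the string 'RecursionError' if the lookup blows up."""
    try:
        return cls(foo=1)["foo"]
    except RecursionError:
        return "RecursionError"


print("naive:", lookup(NaiveDict))
print("fixed:", lookup(FixedDict))
```

With the naive variant, __getitem__ and items() re-enter each other until the interpreter hits its recursion limit, which would match the error in the report if the lookaside behaves this way.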

t-vi commented 3 months ago

I have a fix, based on Kshiteej's repro and Masaki's additional analysis. Awesome work, @kshitij12345 and @crcrpar!