Lightning-AI / lightning-thunder

Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs.
Apache License 2.0

NVFuser error adding thunder.jit to UNet model of NeMo Stable Diffusion #525

Open athitten opened 4 months ago

athitten commented 4 months ago

🐛 Bug

Applying thunder.jit to the conv operation in the UNet model of NeMo Stable Diffusion gives the following error:

Unsupported iterable object type for define_vector! Index:0
Exception raised from define_vector_fn at /opt/pytorch/nvfuser/csrc/python_frontend/python_bindings.cpp:74 (most recent call first):
frame #0: nvfuser::nvfCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xf3 (0x7fe8e1ad75cf in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
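
The model is large, so for orientation here is a minimal, hypothetical sketch of the usage pattern that hits this path (a small stand-in conv module, not the actual patched UNet): thunder.jit wraps the module, and the failure surfaces when the backward trace is handed to the nvFuser executor.

import torch
import thunder

# Hypothetical stand-in for the patched NeMo UNet: thunder.jit compiles the
# module's forward and backward traces; nvFuser executes the fused regions.
conv = torch.nn.Conv2d(4, 4, kernel_size=3, padding=1).cuda()
jitted_conv = thunder.jit(conv)

x = torch.randn(1, 4, 32, 32, device="cuda", requires_grad=True)
out = jitted_conv(x)
out.sum().backward()  # the reported error appears during the backward pass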

To Reproduce

Steps to reproduce the behavior:

  1. Pull the appropriate NeMo docker image.
  2. Apply the git patch unet.patch in the NeMo repo.
  3. Run Stable Diffusion with the command:
python examples/multimodal/text_to_image/stable_diffusion/sd_train.py trainer.precision=16 trainer.num_nodes=1 trainer.devices=1 ++exp_manager.max_time_per_run=00:00:03:00 trainer.max_steps=20 model.micro_batch_size=1 model.global_batch_size=1 model.data.synthetic_data=True exp_manager.exp_dir=/workspace/TestData/multimodal/stable_diffusion_train model.inductor=False model.cond_stage_config._target_=nemo.collections.multimodal.modules.stable_diffusion.encoders.modules.FrozenCLIPEmbedder ++model.cond_stage_config.version=openai/clip-vit-large-patch14 ++model.cond_stage_config.max_length=77 ~model.cond_stage_config.restore_from_path ~model.cond_stage_config.freeze ~model.cond_stage_config.layer model.unet_config.from_pretrained=null model.first_stage_config.from_pretrained=null model.unet_config.use_flash_attention=False model.unet_config.attention_resolutions=\[1\] model.unet_config.channel_mult=\[1\]

Full stack trace:

Exception has occurred: RuntimeError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
Unsupported iterable object type for define_vector! Index:0
  File "/workspace/software/lightning-thunder/thunder/executors/nvfuserex_impl.py", line 1194, in reshape
    return fd.ops.reshape(nv_a, shape)
  File "/workspace/software/lightning-thunder/thunder/executors/nvfuserex_impl.py", line 263, in translate_bound_symbol
    nvresults = translator(*bsym.args, **bsym.kwargs, fd=fd, lc_to_nv_map=lc_to_nv_map)
  File "/workspace/software/lightning-thunder/thunder/executors/nvfuserex_impl.py", line 273, in create_fd
    translate_bound_symbol(bsym)
  File "/workspace/software/lightning-thunder/thunder/executors/nvfuserex_impl.py", line 511, in get_fd
    return create_fd(bsyms, input_descriptors, sorted_unique_inputs, sorted_unique_outputs)
  File "/workspace/software/lightning-thunder/thunder/executors/nvfuserex_impl.py", line 401, in __call__
    fd = self.get_fd(to_descriptors(args))
  File "/usr/local/lib/python3.10/dist-packages/thunder.backward_fn_3", line 45, in backward_fn
  File "/usr/local/lib/python3.10/dist-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/software/lightning-thunder/thunder/executors/torch_autograd.py", line 95, in backward
    grads = ctx.compiled_backward([saved_tensors_list, ctx.saved_other], args)
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 592, in wrapper
    outputs = fn(ctx, *args)
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 302, in apply
    return user_fn(self, *args)
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/graph.py", line 767, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 267, in backward
    _engine_run_backward(
  File "/usr/local/lib/python3.10/dist-packages/megatron/core/pipeline_parallel/schedules.py", line 274, in backward_step
    torch.autograd.backward(output_tensor[0], grad_tensors=output_tensor_grad[0])
  File "/usr/local/lib/python3.10/dist-packages/megatron/core/pipeline_parallel/schedules.py", line 387, in forward_backward_no_pipelining
    backward_step(input_tensor, output_tensor, output_tensor_grad, model_type, config)
  File "/usr/local/lib/python3.10/dist-packages/nemo/collections/multimodal/models/text_to_image/stable_diffusion/ldm/ddpm.py", line 1736, in fwd_bwd_step
    losses_reduced_per_micro_batch = fwd_bwd_function(
  File "/usr/local/lib/python3.10/dist-packages/nemo/collections/multimodal/models/text_to_image/stable_diffusion/ldm/ddpm.py", line 1797, in training_step
    loss_mean, loss_dict = self.fwd_bwd_step(dataloader_iter, batch_idx, False)
  File "/usr/local/lib/python3.10/dist-packages/nemo/utils/model_utils.py", line 381, in wrap_training_step
    output_dict = wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/overrides/base.py", line 90, in forward
    output = self._forward_module.training_step(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1436, in _run_ddp_forward
    return self.module(*inputs, **kwargs)  # type: ignore[index]
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1618, in forward
    else self._run_ddp_forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/ddp.py", line 330, in training_step
    return self.model(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/call.py", line 293, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/optimization/automatic.py", line 315, in _training_step
    training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/optimization/automatic.py", line 128, in closure
    step_output = self._step_fn()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/optimization/automatic.py", line 142, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 103, in _wrap_closure
    closure_result = closure()
  File "/usr/local/lib/python3.10/dist-packages/apex/optimizers/fused_adam.py", line 140, in step
    loss = closure()
  File "/usr/local/lib/python3.10/dist-packages/torch/optim/optimizer.py", line 438, in wrapper
    out = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/optim/lr_scheduler.py", line 96, in wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/nemo/collections/common/callbacks/ema.py", line 250, in step
    loss = self.optimizer.step(closure)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 116, in optimizer_step
    return optimizer.step(closure=closure, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/strategy.py", line 231, in optimizer_step
    return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/ddp.py", line 257, in optimizer_step
    optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/core/optimizer.py", line 161, in step
    step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/core/module.py", line 1270, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/usr/local/lib/python3.10/dist-packages/nemo/collections/nlp/models/language_modeling/megatron_base_model.py", line 1249, in optimizer_step
    super().optimizer_step(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/call.py", line 145, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/optimization/automatic.py", line 266, in _optimizer_step
    call._call_lightning_module_hook(
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/optimization/automatic.py", line 188, in run
    self._optimizer_step(kwargs.get("batch_idx", 0), closure)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/training_epoch_loop.py", line 219, in advance
    batch_output = self.automatic_optimization.run(trainer.optimizers[0], kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/training_epoch_loop.py", line 133, in run
    self.advance(data_fetcher)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/fit_loop.py", line 355, in advance
    self.epoch_loop.run(self._data_fetcher)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/fit_loop.py", line 202, in run
    self.advance()
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 1023, in _run_stage
    self.fit_loop.run()
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 980, in _run
    results = self._run_stage()
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 571, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
    return function(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/call.py", line 42, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 532, in fit
    call._call_and_handle_interrupt(
  File "/workspace/software/NeMo/examples/multimodal/text_to_image/stable_diffusion/sd_train.py", line 80, in main
    trainer.fit(model)
  File "/usr/local/lib/python3.10/dist-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "/usr/local/lib/python3.10/dist-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 458, in <lambda>
    lambda: hydra.run(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/usr/local/lib/python3.10/dist-packages/nemo/core/config/hydra_runner.py", line 129, in wrapper
    _run_hydra(
  File "/workspace/software/NeMo/examples/multimodal/text_to_image/stable_diffusion/sd_train.py", line 84, in <module>
    main()
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,
RuntimeError: Unsupported iterable object type for define_vector! Index:0
Exception raised from define_vector_fn at /opt/pytorch/nvfuser/csrc/python_frontend/python_bindings.cpp:74 (most recent call first):
frame #0: nvfuser::nvfCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xf3 (0x7f7fb08d75cf in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #1: <unknown function> + 0x203850 (0x7f7fb09b8850 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #2: <unknown function> + 0x20396b (0x7f7fb09b896b in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x292462 (0x7f7fb0a47462 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0x28a840 (0x7f7fb0a3f840 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #5: <unknown function> + 0x15a10e (0x55cc9c0e110e in /usr/bin/python)
frame #6: _PyObject_MakeTpCall + 0x25b (0x55cc9c0d7a7b in /usr/bin/python)
frame #7: <unknown function> + 0x168acb (0x55cc9c0efacb in /usr/bin/python)
frame #8: _PyEval_EvalFrameDefault + 0x614a (0x55cc9c0cfcfa in /usr/bin/python)
frame #9: _PyFunction_Vectorcall + 0x7c (0x55cc9c0e19fc in /usr/bin/python)
frame #10: PyObject_Call + 0x122 (0x55cc9c0f0492 in /usr/bin/python)
frame #11: _PyEval_EvalFrameDefault + 0x2a27 (0x55cc9c0cc5d7 in /usr/bin/python)
frame #12: _PyFunction_Vectorcall + 0x7c (0x55cc9c0e19fc in /usr/bin/python)
frame #13: _PyEval_EvalFrameDefault + 0x6bd (0x55cc9c0ca26d in /usr/bin/python)
frame #14: _PyFunction_Vectorcall + 0x7c (0x55cc9c0e19fc in /usr/bin/python)
frame #15: _PyEval_EvalFrameDefault + 0x6bd (0x55cc9c0ca26d in /usr/bin/python)
frame #16: _PyFunction_Vectorcall + 0x7c (0x55cc9c0e19fc in /usr/bin/python)
frame #17: <unknown function> + 0x291734 (0x55cc9c218734 in /usr/bin/python)
frame #18: _PyObject_MakeTpCall + 0x25b (0x55cc9c0d7a7b in /usr/bin/python)
frame #19: _PyEval_EvalFrameDefault + 0x6a79 (0x55cc9c0d0629 in /usr/bin/python)
frame #20: _PyObject_FastCallDictTstate + 0xc4 (0x55cc9c0d6c14 in /usr/bin/python)
frame #21: _PyObject_Call_Prepend + 0xc1 (0x55cc9c0ec8d1 in /usr/bin/python)
frame #22: <unknown function> + 0x280700 (0x55cc9c207700 in /usr/bin/python)
frame #23: _PyObject_MakeTpCall + 0x25b (0x55cc9c0d7a7b in /usr/bin/python)
frame #24: _PyEval_EvalFrameDefault + 0x64e6 (0x55cc9c0d0096 in /usr/bin/python)
frame #25: _PyFunction_Vectorcall + 0x7c (0x55cc9c0e19fc in /usr/bin/python)
frame #26: _PyEval_EvalFrameDefault + 0x2a27 (0x55cc9c0cc5d7 in /usr/bin/python)
frame #27: _PyFunction_Vectorcall + 0x7c (0x55cc9c0e19fc in /usr/bin/python)
frame #28: _PyEval_EvalFrameDefault + 0x2a27 (0x55cc9c0cc5d7 in /usr/bin/python)
frame #29: _PyFunction_Vectorcall + 0x7c (0x55cc9c0e19fc in /usr/bin/python)
frame #30: _PyEval_EvalFrameDefault + 0x2a27 (0x55cc9c0cc5d7 in /usr/bin/python)
frame #31: _PyFunction_Vectorcall + 0x7c (0x55cc9c0e19fc in /usr/bin/python)
frame #32: _PyEval_EvalFrameDefault + 0x614a (0x55cc9c0cfcfa in /usr/bin/python)
frame #33: _PyFunction_Vectorcall + 0x7c (0x55cc9c0e19fc in /usr/bin/python)
frame #34: _PyEval_EvalFrameDefault + 0x2a27 (0x55cc9c0cc5d7 in /usr/bin/python)
frame #35: _PyFunction_Vectorcall + 0x7c (0x55cc9c0e19fc in /usr/bin/python)
frame #36: _PyEval_EvalFrameDefault + 0x2a27 (0x55cc9c0cc5d7 in /usr/bin/python)
frame #37: <unknown function> + 0x16893e (0x55cc9c0ef93e in /usr/bin/python)
frame #38: torch::autograd::PyNode::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x95 (0x7f8185fbc245 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so)
frame #39: <unknown function> + 0x4b1825b (0x7f817e4a525b in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so)
frame #40: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptr<torch::autograd::ReadyQueue> const&) + 0xd36 (0x7f817e49f4f6 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so)
frame #41: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&) + 0x58e (0x7f817e4a072e in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so)
frame #42: torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x2a9 (0x7f817e498ed9 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so)
frame #43: torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x75 (0x7f8185fb6dd5 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so)
frame #44: <unknown function> + 0xdc253 (0x7f81a9a2b253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #45: <unknown function> + 0x94ac3 (0x7f81a9c11ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #46: clone + 0x44 (0x7f81a9ca2a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)

cc @tfogal

xwang233 commented 4 months ago

The image was from a somewhat old environment, pjnl-20240417, which has

athitten commented 4 months ago

@xwang233 to build NeMo on top of PJNL, I used the pjnl container (gitlab-master.nvidia.com:5005/dl/pytorch/update-scripts:pjnl-latest) from last Friday. Is the error fixed in the newest version of the pjnl container? I can try with that if that's the case. Also, just to make sure: the pjnl container name stays the same and it gets updated with the latest versions of the software/packages each time, right?

xwang233 commented 4 months ago

I don't know how to build or import NeMo, so I can't verify that.

The image pjnl-latest is always the latest build. You can use pjnl-YYYYMMDD to pin a dated image in your build.

mruberry commented 4 months ago

Triage review: @athitten, can you help us understand whether this happens on more recent versions of nvFuser?

@kevinstephano maybe we should update the nvFuser error message to print the Python type, so that we have a chance of reproducing this?

kevinstephano commented 4 months ago

This is potentially a little stranger. The error suggests that the reshape shape is composed of something other than Python integers or nvFuser Scalars, which would mean the FusionDefinition was malformed for reshape.
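
To make the constraint concrete, here is a hypothetical Python-side illustration (not nvFuser's actual code): define_vector iterates over the shape and rejects any element that is not a plain Python int or an nvFuser Scalar, reporting the index of the first offender.

import torch

# Hypothetical mirror of the define_vector element check.
def check_shape_elements(shape):
    for i, s in enumerate(shape):
        if not isinstance(s, int):  # the real check also accepts nvFuser Scalars
            raise RuntimeError(
                f"Unsupported iterable object type for define_vector! Index:{i}"
            )

check_shape_elements([2, 3, 4])  # fine: plain Python ints
try:
    check_shape_elements([torch.tensor(2), 3])  # element 0 is a Tensor
except RuntimeError as e:
    print(e)  # -> Unsupported iterable object type for define_vector! Index:0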

kevinstephano commented 4 months ago

I need to add something like the following to report the type in pybind11:

#include <pybind11/pybind11.h>
#include <iostream>
#include <string>

namespace py = pybind11;

// Prints the Python type of `obj`, e.g. "<class 'torch.Tensor'>".
void check_type(py::handle obj) {
    py::handle type = py::type::handle_of(obj);
    if (!type.is_none()) {
        std::string type_name = static_cast<std::string>(py::str(type));
        std::cout << "Object type: " << type_name << std::endl;
    } else {
        std::cout << "Error: Failed to get object type" << std::endl;
    }
}

athitten commented 4 months ago

I tried with the latest nvFuser and was not able to reproduce this error. We can perhaps close this issue; I will open a new one if something similar comes up in the future.