NVIDIA / modulus

Open-source deep-learning framework for building, training, and fine-tuning deep learning models using state-of-the-art Physics-ML methods
https://developer.nvidia.com/modulus
Apache License 2.0

🐛[BUG]: Running `corrdiff/generate.py` raises a shape exception #542

Open gideonite opened 1 month ago

gideonite commented 1 month ago

Version

latest

On which installation method(s) does this occur?

No response

Describe the issue

See log output below

Minimum reproducible example

No response

Relevant log output

Error executing job with overrides: ['dataset.data_path=/data/gideond/corrdiff_inference_package/dataset/2023-01-24-cwb-4years_5times.zarr', 'res_ckpt_filename=/data/gideond/corrdiff_inference_package/checkpoints/diffusion.mdlus', 'reg_ckpt_filename=/data/gideond/corrdiff_inference_package/checkpoints/regression.mdlus', 'seed_batch_size=5', 'use_torch_compile=false']
Traceback (most recent call last):
  File "/net/nfs.cirrascale/climate/gideond/home/projects/modulus/examples/generative/corrdiff/generate.py", line 310, in main
    generate_and_save(
  File "/net/nfs.cirrascale/climate/gideond/home/projects/modulus/examples/generative/corrdiff/generate.py", line 396, in generate_and_save
    image_out = generate_fn(image_lr)
                ^^^^^^^^^^^^^^^^^^^^^
  File "/net/nfs.cirrascale/climate/gideond/home/projects/modulus/examples/generative/corrdiff/generate.py", line 232, in generate_fn
    image_reg = generate(
                ^^^^^^^^^
  File "/net/nfs.cirrascale/climate/gideond/home/projects/modulus/examples/generative/corrdiff/generate.py", line 541, in generate
    images = sampler_fn(
             ^^^^^^^^^^^
  File "/net/nfs.cirrascale/climate/gideond/home/projects/modulus/examples/generative/corrdiff/generate.py", line 609, in unet_regression
    x_next = net(x_hat[0:1], x_lr, t_hat, class_labels).to(torch.float64)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gideond/.conda/envs/corrdiff/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gideond/.conda/envs/corrdiff/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gideond/.conda/envs/corrdiff/lib/python3.12/site-packages/modulus/models/diffusion/unet.py", line 152, in forward
    F_x = self.model(
          ^^^^^^^^^^^
  File "/home/gideond/.conda/envs/corrdiff/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gideond/.conda/envs/corrdiff/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gideond/.conda/envs/corrdiff/lib/python3.12/site-packages/nvtx/nvtx.py", line 116, in inner
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/gideond/.conda/envs/corrdiff/lib/python3.12/site-packages/modulus/models/diffusion/song_unet.py", line 347, in forward
    x = block(x, emb) if isinstance(block, UNetBlock) else block(x)
                                                           ^^^^^^^^
  File "/home/gideond/.conda/envs/corrdiff/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gideond/.conda/envs/corrdiff/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gideond/.conda/envs/corrdiff/lib/python3.12/site-packages/modulus/models/diffusion/layers.py", line 224, in forward
    x = torch.nn.functional.conv2d(x, w, padding=w_pad)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Given groups=1, weight of size [128, 20, 3, 3], expected input[1, 16, 448, 448] to have 20 channels, but got 16 channels instead

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

Environment details

No response

windsoryin commented 3 weeks ago

I also get the same error when using the CorrDiff Inference Package.

windsoryin commented 2 weeks ago

This is caused by incorrect arguments in config_generate.yaml: the input channels [0, 1, 2, 3, 4, 9, 10, 11, 12, 17, 18, 19] do not match the 20 channels used by the pre-trained models. In addition, they overlap with the output channels [0, 17, 18, 19].
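For anyone who wants to sanity-check their own config before running generate.py, here is a minimal sketch of the two checks described above. The helper name `check_channel_config` and its parameters are hypothetical and not part of Modulus. The expected count of 20 is taken from the comment above at face value. Note that the traceback reports 16 channels reaching the convolution while the list has 12 entries, so the model presumably adds channels before that layer, and the right number for your checkpoint may differ.

```python
def check_channel_config(in_channels, out_channels, expected_in_count):
    """Return a list of human-readable problems found in a channel config.

    Hypothetical helper for illustration only; not a Modulus API.
    """
    problems = []

    # Check 1: the number of configured input channels matches what the
    # pre-trained checkpoint expects (assumed value, supplied by the caller).
    if len(in_channels) != expected_in_count:
        problems.append(
            f"{len(in_channels)} input channels configured, "
            f"but the checkpoint expects {expected_in_count}"
        )

    # Check 2: no channel index is used as both an input and an output.
    overlap = sorted(set(in_channels) & set(out_channels))
    if overlap:
        problems.append(f"input and output channels overlap: {overlap}")

    return problems


# Channel lists quoted in the comment above.
in_channels = [0, 1, 2, 3, 4, 9, 10, 11, 12, 17, 18, 19]
out_channels = [0, 17, 18, 19]

problems = check_channel_config(in_channels, out_channels, expected_in_count=20)
for problem in problems:
    print(problem)
```

With the lists quoted above, both checks report a problem: the count is 12 rather than 20, and indices 0, 17, 18, 19 appear in both lists.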