NVIDIA / modulus

Open-source deep-learning framework for building, training, and fine-tuning deep learning models using state-of-the-art Physics-ML methods
https://developer.nvidia.com/modulus
Apache License 2.0

🐛[BUG]: CorrDiff incompatible with `SongUNetPosEmbd` - constructor missing `checkpoint_level` argument #534

Closed stathius closed 5 days ago

stathius commented 1 month ago

Version

0.7.0a

On which installation method(s) does this occur?

Docker, Source

Describe the issue

Analysis: The error comes from `modulus.models.diffusion.song_unet.SongUNetPosEmbd` not accepting the keyword argument `checkpoint_level`.

Proposed fix: Add checkpoint_level to the constructor of SongUNetPosEmbd and pass it to its parent SongUNet.
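
A minimal sketch of the proposed fix, using simplified stand-in classes rather than the real Modulus models. The other constructor parameters (`img_resolution`, `in_channels`, `out_channels`, `n_grid_channels`) and the default `checkpoint_level=0` are assumptions made for illustration only; the point is just that the subclass accepts `checkpoint_level` and forwards it to its parent.

```python
class SongUNet:
    """Simplified stand-in for the parent class (not the real Modulus model)."""

    def __init__(self, img_resolution, in_channels, out_channels, checkpoint_level=0):
        self.img_resolution = img_resolution
        self.in_channels = in_channels
        self.out_channels = out_channels
        # Assumed default of 0; the real default may differ.
        self.checkpoint_level = checkpoint_level


class SongUNetPosEmbd(SongUNet):
    """Stand-in subclass that now accepts and forwards checkpoint_level."""

    def __init__(
        self,
        img_resolution,
        in_channels,
        out_channels,
        n_grid_channels=4,   # hypothetical parameter, for illustration only
        checkpoint_level=0,  # newly accepted keyword argument
    ):
        # Forward checkpoint_level to the parent instead of dropping it.
        super().__init__(
            img_resolution,
            in_channels + n_grid_channels,
            out_channels,
            checkpoint_level=checkpoint_level,
        )
        self.n_grid_channels = n_grid_channels


net = SongUNetPosEmbd(
    img_resolution=448, in_channels=12, out_channels=4, checkpoint_level=2
)
print(net.checkpoint_level)
```

With this change, a caller that passes `checkpoint_level` to either class no longer fails on the subclass constructor.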

Minimum reproducible example

python3 train.py --config-name=config_train_diffusion.yaml
config: 
     arch: ddpmpp-cwb
     precond: edmv1
     task: diffusion

Relevant log output

Traceback (most recent call last):
  File "/code/modulus/examples/generative/corrdiff/train.py", line 344, in main
    training_loop.training_loop(
  File "/code/modulus/examples/generative/corrdiff/training/training_loop.py", line 166, in training_loop   <--------------
    net = construct_class_by_name(**merged_args)  # subclass of torch.nn.Module
  File "/code/modulus/modulus/utils/generative/utils.py", line 306, in construct_class_by_name
    return call_func_by_name(*args, func_name=class_name, **kwargs)
  File "/code/modulus/modulus/utils/generative/utils.py", line 296, in call_func_by_name
    return func_obj(*args, **kwargs)
  File "/code/modulus/modulus/models/diffusion/preconditioning.py", line 969, in __init__ <--------------
    model = model_class(
  File "/code/modulus/modulus/models/module.py", line 65, in __new__
    bound_args = sig.bind_partial(
  File "/usr/lib/python3.10/inspect.py", line 3193, in bind_partial
    return self._bind(args, kwargs, partial=True)
  File "/usr/lib/python3.10/inspect.py", line 3175, in _bind
    raise TypeError(
TypeError: got an unexpected keyword argument 'checkpoint_level'
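
The last frames of the traceback can be reproduced in isolation with the standard library. The sketch below assumes that `modulus/models/module.py` validates constructor arguments with `inspect.Signature.bind_partial`, as the traceback suggests; the two classes and the `try_bind` helper are hypothetical stand-ins, not Modulus code.

```python
import inspect


class WithoutCheckpointLevel:
    """Hypothetical constructor that lacks the checkpoint_level parameter."""

    def __init__(self, img_resolution, in_channels):
        pass


class WithCheckpointLevel:
    """Hypothetical constructor that accepts checkpoint_level."""

    def __init__(self, img_resolution, in_channels, checkpoint_level=0):
        pass


def try_bind(cls, **kwargs):
    """Bind kwargs against the constructor signature without instantiating."""
    sig = inspect.signature(cls)  # signature of __init__, minus `self`
    try:
        sig.bind_partial(**kwargs)
        return "ok"
    except TypeError as exc:
        return f"TypeError: {exc}"


kwargs = {"img_resolution": 448, "in_channels": 12, "checkpoint_level": 1}
print(try_bind(WithoutCheckpointLevel, **kwargs))
print(try_bind(WithCheckpointLevel, **kwargs))
```

The first call fails at binding time, before any model code runs, which is why the traceback ends inside `inspect.py` rather than in the model constructor.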

Environment details

python version: 3.10
modulus commit: `c07fa25321c48a1d71efca12b67d056adbca8bd4`

yairchn commented 1 month ago

hi @stathius could you comment what checkpoint you used here?

stathius commented 1 month ago

> hi @stathius could you comment what checkpoint you used here?

Hi @yairchn you mean for the regression U-Net? It was 053327 but I am not sure it matters in this case.

daviddpruitt commented 5 days ago

This is fixed with https://github.com/NVIDIA/modulus/pull/550, closing the issue.

stathius commented 5 days ago

#550 was not merged.

Best, Stathi