Lightning-AI / pytorch-lightning

Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

Constructor arguments in init_args get instantiated while parsing arguments of LightningModule #19574

Closed tommycwh closed 8 months ago

tommycwh commented 8 months ago

Bug description

I have a model whose layers can be specified as arguments of the model's constructor. For example, one can set a different normalization layer on the boring model below with a call like model = BoringNN(norm_layer=nn.InstanceNorm1d):

BoringNN:

import torch.nn as nn

class BoringNN(nn.Module):
    def __init__(
        self,
        norm_layer: nn.Module = nn.BatchNorm1d,
        activation_layer: nn.Module = nn.ReLU,
    ):
        super().__init__()
        self.layer = nn.Linear(3, 32)
        self.norm = norm_layer(32)
        self.activation = activation_layer()

    def forward(self, x):
        x = self.layer(x)
        x = self.norm(x)
        x = self.activation(x)
        return x

What version are you seeing the problem on?

v2.2

How to reproduce the bug

Here, I am trying to configure this model with Lightning CLI as below:

BoringModel:

import torch.nn as nn
import lightning as L

class BoringModel(L.LightningModule):
    def __init__(
        self,
        model: nn.Module,
    ):
        super().__init__()
        self.save_hyperparameters()

        self.model = model
        ... 

config.yaml

model:
  class_path: boring_model.BoringModel
  init_args:
    model:
      class_path: boring_model.BoringNN
      init_args:
        norm_layer:
          class_path: torch.nn.BatchNorm2d
        activation_layer:
          class_path: torch.nn.ReLU
...

Error messages and logs

But then it gives this error:

usage: main.py [-h] [-c CONFIG] [--print_config[=flags]] {fit,validate,test,predict} ...
error: Parser key "model":
  Problem with given class_path 'boring_model.BoringModel':
    Parser key "model":
      Problem with given class_path 'boring_model.BoringNN':
        Parser key "norm_layer":
          Problem with given class_path 'torch.nn.BatchNorm2d':
            Validation failed: Key "num_features" is required but not included in config object or its value is None.

Environment

Current environment

- Lightning Component: Trainer, LightningModule, LightningCLI
- PyTorch Lightning Version: 2.2.0.post0
- PyTorch Version: 2.0.0+cu118
- Python version: 3.10.13
- OS (e.g., Linux): Linux
- jsonargparse: 4.27.5
- omegaconf: 2.3.0

More info

As far as I understand, the CLI is trying to instantiate a norm_layer object while parsing the arguments, but the constructor torch.nn.BatchNorm2d requires an argument, e.g., norm = torch.nn.BatchNorm2d(32). This argument is not provided in the config file, so instantiation fails.
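
Just to illustrate the point outside the CLI (a minimal sketch, not part of my actual setup):

import torch.nn as nn

nn.BatchNorm2d(32)  # works: num_features is provided
nn.BatchNorm2d()    # raises TypeError: missing required positional argument 'num_features'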

However, I do not want to pass an object of the specified type to BoringNN, I just want to pass the constructor so that the object will be created inside BoringNN.

I think the current behavior is also reasonable, since some users may want to pass actual objects as arguments. Therefore, I want to ask whether there is a way to distinguish these two kinds of init_args in the config file: (1) objects to be instantiated and (2) object constructors (as discussed above).

Side note: To avoid issues with logging arguments like nn.BatchNorm2d, I run with save_config_callback=None set in the main.py.

# main.py
from lightning.pytorch.cli import LightningCLI
import lightning as L

def cli_main():
    # note: don't call fit!!
    cli = LightningCLI(L.LightningModule, L.LightningDataModule, 
                       subclass_mode_model=True, subclass_mode_data=True,
                       auto_configure_optimizers=False,
                       save_config_callback=None
                       )

if __name__ == "__main__":
    cli_main()
    # note: it is good practice to implement the CLI in a function and call it in the main if block

cc @carmocca @mauvilsa

awaelchli commented 8 months ago

Hi @tommycwh, this annotation looks wrong:

norm_layer: nn.Module = nn.BatchNorm1d,

The annotation and the default contradict each other. Do you want norm_layer to be the type of a module, or the instance of a module?

Something like this would make sense:

norm_layer: nn.Module = nn.BatchNorm1d(),

Or something like this:

norm_layer: Type[nn.Module] = nn.BatchNorm1d,

But not the two mixed together. This is very likely why the CLI is not able to validate / instantiate your argument.

mauvilsa commented 8 months ago

Yes, the annotation is wrong. A type of nn.Module means that an instance of that type should be provided. If you want to instantiate it inside, then the type should be Type[nn.Module], meaning a subtype of Module, or, a bit more generally, Callable[..., nn.Module], meaning a callable that returns an instance of a subtype of Module.
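
A rough sketch of the three annotations side by side (the names here are only illustrative):

from typing import Callable, Type
import torch.nn as nn

class Example(nn.Module):
    def __init__(
        self,
        norm_instance: nn.Module = nn.BatchNorm1d(32),           # an already-built instance is expected
        norm_class: Type[nn.Module] = nn.BatchNorm1d,             # a subclass of Module is expected
        norm_factory: Callable[..., nn.Module] = nn.BatchNorm1d,  # any callable returning a Module
    ):
        super().__init__()
        self.a = norm_instance     # used as-is
        self.b = norm_class(32)    # instantiated inside
        self.c = norm_factory(32)  # instantiated inside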

tommycwh commented 8 months ago

Thank you very much for your explanation. I want norm_layer to be a type, so I am going with Type[nn.Module].

Would you mind if I ask an additional question? Since some "layers" can be configured with arguments, like act_layer = nn.LeakyReLU(negative_slope=0.01), I am wondering whether those arguments can be set in a config file too. I tried to do it like below, but it did not work, and I guess it is because act_layer is not an instance.

act_layer: torch.nn.LeakyReLU
  init_args:
    negative_slope: 0.02 

# OR

act_layer: torch.nn.LeakyReLU
    negative_slope: 0.02 

Is there a way to set these arguments in a config file?

mauvilsa commented 8 months ago

To make the init_args configurable you must use Callable[..., nn.Module] as the type. Neither of the config files you wrote is correct; in fact, they are not even valid YAML. There should be both class_path and init_args.

What you want is analogous to multiple-optimizers-and-schedulers, except that the callable type doesn't accept any arguments. So have a look there in the docs to learn how to do it.

Also note that nn.LeakyReLU(negative_slope=0.01) would be wrong as the default, because that is an instance and it should be a callable that returns an instance. If you want a default with a modified parameter, use a lambda function, as also explained in the docs.
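
As a minimal sketch of what is meant (assuming the Callable type from above; names are illustrative):

from typing import Callable
import torch.nn as nn

class BoringNN(nn.Module):
    def __init__(
        self,
        # the default is a lambda that builds the instance with a modified parameter
        activation_layer: Callable[..., nn.Module] = lambda: nn.LeakyReLU(negative_slope=0.01),
    ):
        super().__init__()
        self.activation = activation_layer()  # instantiated inside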

tommycwh commented 8 months ago

I am sorry, but there is still something I cannot figure out with the LeakyReLU-with-default-arguments example. I tried to translate the optimizer example to LeakyReLU as below.

In a python script:

# optimizer
opt_type = lambda p: torch.optim.SGD(p, lr=0.01)
opt_object = opt_type(p)

# leakyrelu
act_type = lambda *_: nn.LeakyReLU(negative_slope=0.2)
act_object = act_type()

I can create new objects with opt_type and act_type in the same way.

However, when I tried to apply this to LightningCLI, with the config below, I got an error.

In a nn.Module/L.LightningModule

# optimizer
OptimizerCallable = Callable[[Iterable], Optimizer] # from src/lightning/pytorch/cli.py
...
optimizer: OptimizerCallable = torch.optim.Adam, # arg of __init__
# or 
optimizer: OptimizerCallable = lambda p: torch.optim.SGD(p, lr=0.01),

# leakyrelu
ModuleCallable = Callable[..., nn.Module]
...
activation_layer: ModuleCallable = lambda *_: nn.LeakyReLU, # arg of __init__
# or
activation_layer: ModuleCallable = lambda *_ : nn.LeakyReLU(negative_slope=0.01),

In a config file:

model:
  class_path: boring_model.BoringModel
  init_args:
    model:
      class_path: boring_model.BoringNN
      init_args:
        activation_layer: 
          class_path: torch.nn.LeakyReLU
          init_args: 
            negative_slope: 0.2

    optimizer: 
      class_path: torch.optim.Adam
      init_args:
        lr: 0.01

The optimizer part works OK, but the LeakyReLU part does not and gives the error below.

Error:

error: Parser key "model":
  Problem with given class_path 'boring_model.BoringModel':
    Parser key "model":
      Problem with given class_path 'boring_model.BoringNN':
        Parser key "activation_layer":
          Type typing.Callable[..., torch.nn.modules.module.Module] expects a function or a callable class: Validation failed: No action for key "negative_slope" to check its value.. Got value: Namespace(class_path='torch.nn.LeakyReLU', init_args=Namespace(inplace=False, negative_slope=0.2))

May I ask what I am misunderstanding here?

mauvilsa commented 8 months ago

@tommycwh sorry, my mistake. To make the init_args configurable you must use Callable[[], nn.Module] as the type, that is, a callable that receives zero arguments and returns an instance of Module. This is in contrast to optimizers, where the type is a callable that receives exactly one positional parameter. Giving ... instead of [] would mean a callable that receives a variable number of parameters; I am not sure how it behaves with ..., since the feature was not designed for this. Why zero arguments, you might ask? Because this is dependency injection: parameters are not supposed to be given inside the __init__, i.e. the dependency comes from outside.

Likewise the default should be a lambda without parameters, like lambda: nn.LeakyReLU(negative_slope=0.2).

A complete working example is:

from typing import Callable

from lightning.pytorch.cli import LightningCLI, OptimizerCallable
from lightning.pytorch.demos.boring_classes import BoringModel

from torch.nn import Module, LeakyReLU
from torch.optim import SGD

ModuleCallable = Callable[[], Module]

class BoringNN(Module):
    def __init__(
        self,
        activation_layer: ModuleCallable = lambda: LeakyReLU(negative_slope=0.01),
    ):
        pass

class BoringSystem(BoringModel):
    def __init__(
        self,
        model: ModuleCallable = BoringNN,
        optimizer: OptimizerCallable = lambda p: SGD(p, lr=0.01),
    ):
        pass

LightningCLI(BoringSystem, subclass_mode_model=True, auto_configure_optimizers=False)

If you run cli.py fit --model=BoringSystem --print_config the output is:

...
model:
  class_path: __main__.BoringSystem
  init_args:
    model: __main__.BoringNN
    optimizer:
      class_path: torch.optim.SGD
      init_args:
        lr: 0.01
        momentum: 0.0
        dampening: 0.0
        weight_decay: 0.0
        nesterov: false
        maximize: false
        foreach: null
        differentiable: false

For an input config such as:

model:
  class_path: __main__.BoringSystem
  init_args:
    model:
      class_path: __main__.BoringNN
      init_args:
        activation_layer:
          class_path: torch.nn.LeakyReLU
          init_args:
            negative_slope: 0.2
    optimizer:
      class_path: torch.optim.Adam
      init_args:
        lr: 0.01

the output of cli.py fit --config=config.yaml --print_config is:

...
model:
  class_path: __main__.BoringSystem
  init_args:
    model:
      class_path: __main__.BoringNN
      init_args:
        activation_layer:
          class_path: torch.nn.LeakyReLU
          init_args:
            negative_slope: 0.2
            inplace: false
    optimizer:
      class_path: torch.optim.Adam
      init_args:
        lr: 0.01
        betas:
        - 0.9
        - 0.999
        eps: 1.0e-08
        weight_decay: 0.0
        amsgrad: false
        foreach: null
        maximize: false
        capturable: false
        differentiable: false
        fused: null
tommycwh commented 8 months ago

It seems that activation_layer from activation_layer: ModuleCallable = lambda: LeakyReLU(negative_slope=0.01) is an instance of LeakyReLU already?

With slight modifications to your example,

class BoringNN(Module):
    def __init__(
        self,
        activation_layer: ModuleCallable = lambda: LeakyReLU(negative_slope=0.01),
    ):
        act = activation_layer() # <--- added this line
        pass

class BoringSystem(BoringModel):
    def __init__(
        self,
        model: ModuleCallable = BoringNN,
        optimizer: OptimizerCallable = lambda p: SGD(p, lr=0.01),
    ):
        super().__init__() # <--- added this line
        pass

I got this error:

Traceback (most recent call last):
  File "/userhome/35/whchan/workspace/pl-dev/debug/recursive_init/mauvilsa/main.py", line 28, in <module>
    LightningCLI(BoringSystem, subclass_mode_model=True, auto_configure_optimizers=False)
  File "/userhome/35/whchan/anaconda3/envs/pl-dev/lib/python3.10/site-packages/lightning/pytorch/cli.py", line 385, in __init__
    self.instantiate_classes()
  File "/userhome/35/whchan/anaconda3/envs/pl-dev/lib/python3.10/site-packages/lightning/pytorch/cli.py", line 535, in instantiate_classes
    self.config_init = self.parser.instantiate_classes(self.config)
  File "/userhome/35/whchan/anaconda3/envs/pl-dev/lib/python3.10/site-packages/jsonargparse/_deprecated.py", line 141, in patched_instantiate_classes
    cfg = self._unpatched_instantiate_classes(cfg, **kwargs)
  File "/userhome/35/whchan/anaconda3/envs/pl-dev/lib/python3.10/site-packages/jsonargparse/_core.py", line 1181, in instantiate_classes
    cfg[subcommand] = subparser.instantiate_classes(cfg[subcommand], instantiate_groups=instantiate_groups)
  File "/userhome/35/whchan/anaconda3/envs/pl-dev/lib/python3.10/site-packages/jsonargparse/_deprecated.py", line 141, in patched_instantiate_classes
    cfg = self._unpatched_instantiate_classes(cfg, **kwargs)
  File "/userhome/35/whchan/anaconda3/envs/pl-dev/lib/python3.10/site-packages/jsonargparse/_core.py", line 1172, in instantiate_classes
    parent[key] = component.instantiate_classes(value)
  File "/userhome/35/whchan/anaconda3/envs/pl-dev/lib/python3.10/site-packages/jsonargparse/_typehints.py", line 532, in instantiate_classes
    value[num] = adapt_typehints(
  File "/userhome/35/whchan/anaconda3/envs/pl-dev/lib/python3.10/site-packages/jsonargparse/_typehints.py", line 898, in adapt_typehints
    val = adapt_class_type(val, serialize, instantiate_classes, sub_add_kwargs, prev_val=prev_val)
  File "/userhome/35/whchan/anaconda3/envs/pl-dev/lib/python3.10/site-packages/jsonargparse/_typehints.py", line 1102, in adapt_class_type
    init_args = parser.instantiate_classes(init_args)
  File "/userhome/35/whchan/anaconda3/envs/pl-dev/lib/python3.10/site-packages/jsonargparse/_deprecated.py", line 141, in patched_instantiate_classes
    cfg = self._unpatched_instantiate_classes(cfg, **kwargs)
  File "/userhome/35/whchan/anaconda3/envs/pl-dev/lib/python3.10/site-packages/jsonargparse/_core.py", line 1172, in instantiate_classes
    parent[key] = component.instantiate_classes(value)
  File "/userhome/35/whchan/anaconda3/envs/pl-dev/lib/python3.10/site-packages/jsonargparse/_typehints.py", line 532, in instantiate_classes
    value[num] = adapt_typehints(
  File "/userhome/35/whchan/anaconda3/envs/pl-dev/lib/python3.10/site-packages/jsonargparse/_typehints.py", line 845, in adapt_typehints
    val = adapt_class_type(val, False, instantiate_classes, sub_add_kwargs, skip_args=num_partial_args)
  File "/userhome/35/whchan/anaconda3/envs/pl-dev/lib/python3.10/site-packages/jsonargparse/_typehints.py", line 1116, in adapt_class_type
    return instantiator_fn(val_class, **{**init_args, **dict_kwargs})
  File "/userhome/35/whchan/anaconda3/envs/pl-dev/lib/python3.10/site-packages/jsonargparse/_common.py", line 128, in default_class_instantiator
    return class_type(*args, **kwargs)
  File "/userhome/35/whchan/workspace/pl-dev/debug/recursive_init/mauvilsa/main.py", line 16, in __init__
    act = activation_layer()
  File "/userhome/35/whchan/anaconda3/envs/pl-dev/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
TypeError: LeakyReLU.forward() missing 1 required positional argument: 'input'

I think this means that activation_layer is a LeakyReLU instance, so it requires a tensor as input, like x = activation_layer(x) for some tensor x.

But why does it seem different from the optimizer example? With

optimizer: OptimizerCallable = lambda p: torch.optim.SGD(p, lr=0.01), # from __init__ arguments
opt_object = optimizer(p)  

it seems that an optimizer object is created in the second line, when optimizer is called with the parameters p as input. However, if I try to create a LeakyReLU object with act_object = activation_layer() and no input, it does not work, because activation_layer is already the object rather than a callable that creates the object.

In this LeakyReLU example, one may simply reuse the activation_layer object in multiple layers of a network, since LeakyReLU does not store any state. However, I can think of other nn.Modules, like BatchNorm, that require creating new instances when used multiple times, due to the learnable parameters in the Module.

mauvilsa commented 8 months ago

It seems that activation_layer from activation_layer: ModuleCallable = lambda: LeakyReLU(negative_slope=0.01) is an instance of LeakyReLU already?

You are right, that is a bug. It is already an instance, but it should only become an instance after being called, just as you did. I need to look into it.

In this LeakyReLU example, one may simply reuse the activation_layer object in multiple layers of a network, since LeakyReLU does not store any state. However, I can think of other nn.Modules, like BatchNorm, that require creating new instances when used multiple times, due to the learnable parameters in the Module.

Another benefit of getting a callable that creates an instance is that as many instances as needed can be created. That is part of this feature. I will fix the bug as soon as possible.
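
A small sketch of that multiple-instances benefit (illustrative only, outside the CLI):

import torch.nn as nn

# a factory callable, not an instance
norm_factory = lambda channels: nn.BatchNorm1d(channels)

# each call creates an independent module with its own learnable parameters
norm1 = norm_factory(32)
norm2 = norm_factory(32)
assert norm1 is not norm2
assert norm1.weight is not norm2.weight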

mauvilsa commented 7 months ago

@tommycwh a fix for the callable with zero arguments is now in https://github.com/omni-us/jsonargparse/pull/483. Please try it out from that branch.

tommycwh commented 7 months ago

@mauvilsa Thank you very much for your update. It seems to be working now. For reference, I am including my final code below, which I think summarizes the several situations we discussed above.

boring_model.py

from typing import Callable

import torch
import torch.nn as nn
import lightning as L
from lightning.pytorch.cli import OptimizerCallable, LRSchedulerCallable

# for callable with no argument
ActivationCallable = Callable[[], nn.Module] 
# for callable with one positional argument
NormCallable = Callable[..., nn.Module] 

class BoringNN(nn.Module):
    def __init__(
        self,
        # lambda with no arguments, building a layer with a non-default value for an optional argument
        activation_layer: ActivationCallable = lambda: nn.LeakyReLU(negative_slope=0.01),
        # lambda with one positional argument
        norm_layer: NormCallable = lambda c: nn.BatchNorm1d(c),
    ):
        super().__init__()

        self.layer = nn.Linear(3, 32)
        self.norm = norm_layer(32)  # create the norm layer with the number of channels
        self.activation = activation_layer()  # create the activation layer with no arguments

    def forward(self, x):
        x = self.layer(x)
        x = self.norm(x)
        x = self.activation(x)
        return x

class BoringModel(L.LightningModule):
    def __init__(
        self,
        # an actual nn.Module object
        model: nn.Module,
        # a callable that creates an optimizer object
        optimizer: OptimizerCallable = lambda p: torch.optim.SGD(p, lr=0.01),
        # a callable that creates a learning-rate scheduler object
        scheduler: LRSchedulerCallable = torch.optim.lr_scheduler.ConstantLR,
    ):
        super().__init__()
        self.save_hyperparameters()

        self.model = model  # an actual nn.Module object
        self.optimizer = optimizer  # a callable that creates an optimizer object
        self.scheduler = scheduler  # a callable that creates a scheduler object

    def configure_optimizers(self):
        optimizer = self.optimizer(self.parameters())
        scheduler = self.scheduler(optimizer)
        return {"optimizer": optimizer, "lr_scheduler": scheduler}

    def forward(self, x):
        pass

    def training_step(self, batch, batch_idx):
        pass

    def validation_step(self, batch, batch_idx):
        pass

    def test_step(self, batch, batch_idx):
        pass

boring_model.yaml

model:
  class_path: boring_model.BoringModel
  init_args:
    model:
      class_path: boring_model.BoringNN
      init_args:
        activation_layer: 
          class_path: torch.nn.LeakyReLU
          init_args: 
            negative_slope: 0.2 # different from default in boring_model.py
        norm_layer: 
          class_path: torch.nn.InstanceNorm1d # different from default in boring_model.py
          init_args:
            eps: 5e-05
    optimizer: 
      class_path: torch.optim.Adam
      init_args:
        lr: 0.01
    scheduler:
      class_path: torch.optim.lr_scheduler.ConstantLR

data:
  class_path: lightning.pytorch.demos.boring_classes.BoringDataModule

trainer:
  accelerator: auto

My command:

python3 main.py fit \
--config boring_model.yaml \
--trainer.accelerator cpu \
--trainer.devices 1
mauvilsa commented 7 months ago

One small change I would suggest is to be more explicit about NormCallable receiving one positional parameter:

NormCallable = Callable[[int], nn.Module]
tommycwh commented 7 months ago

Yes, it is clearer this way. Thanks for the suggestion.