Lightning-Universe / lightning-transformers

Flexible components pairing 🤗 Transformers with ⚡ PyTorch Lightning
https://lightning-transformers.readthedocs.io
Apache License 2.0

Optimizers without `weight_decay` produce errors #231

Closed: pietrolesci closed this issue 2 years ago

pietrolesci commented 2 years ago

Hi there,

I am trying to switch the optimizer in the following example from the README

python train.py task=nlp/summarization dataset=nlp/summarization/xsum trainer.gpus=1

to

python train.py task=nlp/summarization dataset=nlp/summarization/xsum trainer.gpus=1 optimizer=adamax

and I get an error saying that the weight_decay configuration key is not present. I think the source of the error is this line:

https://github.com/PyTorchLightning/lightning-transformers/blob/aa8f48addc9d16733cfb7572cf19cba17cad29a6/lightning_transformers/core/instantiator.py#L81
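
For illustration, here is a minimal sketch of the failure mode, assuming Hydra composes the optimizer config in struct mode; the config contents below are made up and stand in for an optimizer entry that has no weight_decay key:

    from omegaconf import OmegaConf

    # Hypothetical minimal optimizer config (contents made up), composed the way
    # Hydra does it, i.e. with struct mode enabled, and missing a weight_decay entry
    cfg = OmegaConf.create({"_target_": "torch.optim.Adamax", "lr": 2e-3})
    OmegaConf.set_struct(cfg, True)

    print("weight_decay" in cfg)  # False
    try:
        _ = cfg.weight_decay      # under struct mode this raises instead of returning None
    except Exception as err:
        print(type(err).__name__)  # ConfigAttributeError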

As a fix, I did the following locally:

    def optimizer(self, model: torch.nn.Module, cfg: DictConfig) -> torch.optim.Optimizer:
        if "weight_decay" in cfg:
            # Follow the usual Transformers convention: biases and LayerNorm weights
            # are excluded from weight decay.
            no_decay = ["bias", "LayerNorm.weight"]
            grouped_parameters = [
                {
                    "params": [
                        p
                        for n, p in model.named_parameters()
                        if not any(nd in n for nd in no_decay) and p.requires_grad
                    ],
                    "weight_decay": cfg.weight_decay,
                },
                {
                    "params": [
                        p
                        for n, p in model.named_parameters()
                        if any(nd in n for nd in no_decay) and p.requires_grad
                    ],
                    "weight_decay": 0.0,
                },
            ]
            return self.instantiate(cfg, grouped_parameters)

        # Configs without a weight_decay key (e.g. adamax here) fall back to
        # passing the trainable parameters directly.
        return self.instantiate(cfg, filter(lambda p: p.requires_grad, model.parameters()))
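
For completeness, a minimal standalone sketch (toy module with hypothetical names, chosen to mirror the Hugging Face convention so the no_decay filter has something to match) showing that the same grouping pattern works with torch.optim.Adamax:

    import torch

    class Toy(torch.nn.Module):
        # Hypothetical toy module; the LayerNorm attribute name mimics Hugging Face models
        def __init__(self):
            super().__init__()
            self.linear = torch.nn.Linear(4, 4)
            self.LayerNorm = torch.nn.LayerNorm(4)

    model = Toy()
    no_decay = ["bias", "LayerNorm.weight"]
    grouped_parameters = [
        {
            # decayed group: linear.weight
            "params": [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)],
            "weight_decay": 0.01,
        },
        {
            # non-decayed group: biases and LayerNorm.weight
            "params": [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)],
            "weight_decay": 0.0,
        },
    ]

    # Per-group weight_decay overrides work with any torch optimizer, Adamax included.
    optimizer = torch.optim.Adamax(grouped_parameters, lr=2e-3)
    print([len(g["params"]) for g in optimizer.param_groups])  # [1, 3]
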
stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.