Lightning-Universe / lightning-transformers

Flexible components pairing 🤗 Transformers with ⚡ PyTorch Lightning
https://lightning-transformers.readthedocs.io
Apache License 2.0

Optimizers without `weight_decay` produce errors #231

Closed: pietrolesci closed this issue 2 years ago

pietrolesci commented 2 years ago

Hi there,

I am trying to switch the optimizer in the following example from the README

python train.py task=nlp/summarization dataset=nlp/summarization/xsum trainer.gpus=1

to

python train.py task=nlp/summarization dataset=nlp/summarization/xsum trainer.gpus=1 optimizer=adamax

and I get an error saying that the weight_decay configuration key is not present. I think the source of the error is this line:

https://github.com/PyTorchLightning/lightning-transformers/blob/aa8f48addc9d16733cfb7572cf19cba17cad29a6/lightning_transformers/core/instantiator.py#L81
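
For illustration, here is a minimal sketch of the failure mode, assuming Hydra composes the optimizer config in struct mode; the config contents below are made up and stand in for an optimizer entry that has no weight_decay key:

    from omegaconf import OmegaConf

    # Hypothetical minimal optimizer config (contents made up), composed the way
    # Hydra does it, i.e. with struct mode enabled, and missing a weight_decay entry
    cfg = OmegaConf.create({"_target_": "torch.optim.Adamax", "lr": 2e-3})
    OmegaConf.set_struct(cfg, True)

    print("weight_decay" in cfg)  # False
    try:
        _ = cfg.weight_decay      # under struct mode this raises instead of returning None
    except Exception as err:
        print(type(err).__name__)  # ConfigAttributeError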

As a fix, I did the following locally:

    def optimizer(self, model: torch.nn.Module, cfg: DictConfig) -> torch.optim.Optimizer:
        if "weight_decay" in cfg:
            # Follow the usual Transformers convention: biases and LayerNorm weights
            # are excluded from weight decay.
            no_decay = ["bias", "LayerNorm.weight"]
            grouped_parameters = [
                {
                    "params": [
                        p
                        for n, p in model.named_parameters()
                        if not any(nd in n for nd in no_decay) and p.requires_grad
                    ],
                    "weight_decay": cfg.weight_decay,
                },
                {
                    "params": [
                        p
                        for n, p in model.named_parameters()
                        if any(nd in n for nd in no_decay) and p.requires_grad
                    ],
                    "weight_decay": 0.0,
                },
            ]
            return self.instantiate(cfg, grouped_parameters)

        # Configs without a weight_decay key (e.g. adamax here) fall back to
        # passing the trainable parameters directly.
        return self.instantiate(cfg, filter(lambda p: p.requires_grad, model.parameters()))
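
For completeness, a minimal standalone sketch (toy module with hypothetical names, chosen to mirror the Hugging Face convention so the no_decay filter has something to match) showing that the same grouping pattern works with torch.optim.Adamax:

    import torch

    class Toy(torch.nn.Module):
        # Hypothetical toy module; the LayerNorm attribute name mimics Hugging Face models
        def __init__(self):
            super().__init__()
            self.linear = torch.nn.Linear(4, 4)
            self.LayerNorm = torch.nn.LayerNorm(4)

    model = Toy()
    no_decay = ["bias", "LayerNorm.weight"]
    grouped_parameters = [
        {
            # decayed group: linear.weight
            "params": [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)],
            "weight_decay": 0.01,
        },
        {
            # non-decayed group: biases and LayerNorm.weight
            "params": [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)],
            "weight_decay": 0.0,
        },
    ]

    # Per-group weight_decay overrides work with any torch optimizer, Adamax included.
    optimizer = torch.optim.Adamax(grouped_parameters, lr=2e-3)
    print([len(g["params"]) for g in optimizer.param_groups])  # [1, 3]
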
stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.