davidtvs / pytorch-lr-finder

A learning rate range test implementation in PyTorch
MIT License
921 stars 120 forks

'dict' object has no attribute 'param_groups' #81

Closed manza-ari closed 9 months ago

manza-ari commented 2 years ago

I am facing the following error, any suggestions?

    py3.8.egg/torch_lr_finder/lr_finder.py", line 361, in _check_for_scheduler
    AttributeError: 'dict' object has no attribute 'param_groups'

The code is simple:

        lr_finder = LRFinder(models, optimizers, criterion, device="cuda")
        lr_finder.range_test(train_loader, end_lr=100, num_iter=100, step_mode='exp')
        lr_finder.plot(log_lr=False) # to inspect the loss-learning rate graph
        lr_finder.reset()
NaleRaphael commented 2 years ago

Hi, @manza-ari

Can you provide the settings of your optimizers?

It looks like you are using multiple optimizers, but I need more information to confirm that and figure out the possible cause of the issue.
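For context, the traceback ends inside `_check_for_scheduler`, which accesses `optimizer.param_groups`; a plain `dict` that merely holds optimizers has no such attribute. A minimal sketch of the mismatch (illustrative code only, not taken from your script):

    import torch

    model = torch.nn.Linear(2, 2)
    optimizers = {'backbone': torch.optim.SGD(model.parameters(), lr=0.1)}

    # Each optimizer inside the dict exposes `param_groups`...
    print(optimizers['backbone'].param_groups[0]['lr'])  # 0.1

    # ...but the dict itself does not, which is what the traceback points at:
    # optimizers.param_groups  # AttributeError: 'dict' object has no attribute 'param_groups'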

If possible, providing a minimal and complete code snippet to reproduce this issue would be better, thanks!

manza-ari commented 2 years ago

Yeah, you are right. There are different models, each with a different network, and the different networks use different optimizers. I am sorry, it is difficult for me to share a minimal example. Can you please check the following link?

https://github.com/razvancaramalau/Sequential-GCN-for-Active-Learning/blob/master/main.py

I am trying the lloss method first, which uses the following optimizers:

    if method == 'lloss':
        optim_module = optim.SGD(models['module'].parameters(), lr=LR,
                                 momentum=MOMENTUM, weight_decay=WDECAY)
        sched_module = lr_scheduler.MultiStepLR(optim_module, milestones=MILESTONES)
        optimizers = {'backbone': optim_backbone, 'module': optim_module}
        schedulers = {'backbone': sched_backbone, 'module': sched_module}

NaleRaphael commented 2 years ago

Thanks for the information!

Since the implementation of that repository is a bit complex and is not written like a usual training pipeline, it's not easy to run LRFinder without modification. So I directly modified the file main.py in a forked repo; you can check out the diff here: https://github.com/NaleRaphael/Sequential-GCN-for-Active-Learning/commit/4b55065c#diff-b10564ab7d2c520cdd0243874879fb0a782862c3c902ab535faabe57d5a505e1

In that main.py, I assume you also want to run LRFinder.range_test() before running the actual train() function for the GCN. The code related to invoking LRFinder starts after the comment # ----- Code related to LRFinder ----.

The main idea of this modification is to create wrappers for the models, the loss function, and the optimizers. This should keep you from having to make too many changes to the code you are interested in (the GCN part). You can also check out the comments in that file for further details.
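To give a rough idea (the actual wrappers live in the commit linked above and may differ in detail), the optimizer wrapper essentially exposes the dict of optimizers through the single-optimizer interface that LRFinder and the PyTorch LR schedulers expect, along these lines:

    import torch

    class OptimizerWrapper(torch.optim.Optimizer):
        """Illustrative sketch only; see the linked commit for the real wrapper.

        Presents a dict of optimizers (e.g. {'backbone': ..., 'module': ...})
        through the single-optimizer interface that LRFinder expects.
        """

        def __init__(self, optimizers):
            self.optimizers = dict(optimizers)
            # Reuse the wrapped optimizers' own param_group dicts, so the
            # learning rates set during the range test reach every optimizer.
            self.defaults = {}
            self.state = {}
            self.param_groups = [
                group for opt in self.optimizers.values() for group in opt.param_groups
            ]

        def state_dict(self):
            return {name: opt.state_dict() for name, opt in self.optimizers.items()}

        def load_state_dict(self, state_dict):
            for name, opt in self.optimizers.items():
                opt.load_state_dict(state_dict[name])

        def zero_grad(self):
            for opt in self.optimizers.values():
                opt.zero_grad()

        def step(self, closure=None):
            for opt in self.optimizers.values():
                opt.step()

The model and loss wrappers follow the same idea: they forward the calls LRFinder makes (forward pass, loss computation) to whichever underlying objects the selected method needs.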

You can clone that repo and use the same command $ python main.py -m lloss -d cifar10 -c 5 to check whether the result is what you want. If you have any further questions, please feel free to let us know!

manza-ari commented 2 years ago

Thank you so much for trying this with the code. After I tried your shared code, it gives me the following error: RuntimeError: Optimizer already has a scheduler attached to it. I am a little confused about the section between the comments Loss, criterion and scheduler (re)initialization and ----- Code related to LRFinder ----.

I copied it like this:

        criterion = nn.CrossEntropyLoss(reduction='none')
        optim_backbone = optim.SGD(models['backbone'].parameters(), lr=LR,
            momentum=MOMENTUM, weight_decay=WDECAY)

        sched_backbone = lr_scheduler.MultiStepLR(optim_backbone, milestones=MILESTONES)
        optimizers = {'backbone': optim_backbone}
        schedulers = {'backbone': sched_backbone}
        if method == 'lloss':
            optim_module   = optim.SGD(models['module'].parameters(), lr=LR, 
                momentum=MOMENTUM, weight_decay=WDECAY)
            sched_module   = lr_scheduler.MultiStepLR(optim_module, milestones=MILESTONES)
            optimizers = {'backbone': optim_backbone, 'module': optim_module}

            schedulers = {'backbone': sched_backbone, 'module': sched_module} 

        #------------------------------------------------------------------------------------------------
        # ----- Code related to LRFinder ----
        model_wrapper = ModelWrapper(models, method)
        loss_wrapper = LossWrapper(criterion, models, method)
        optimizer_wrapper = OptimizerWrapper(optimizers)

        # Manually create an axis and pass it into `LRFinder.plot()` to avoid popping window
        # of figure blocking the procedure.
        fig, ax = plt.subplots()

        lr_finder = LRFinder(model_wrapper, optimizer_wrapper, loss_wrapper, device='cuda')
        lr_finder.range_test(train_loader, end_lr=1, num_iter=100)
        ax, suggested_lr = lr_finder.plot(ax=ax, skip_start=0, skip_end=0, suggest_lr=True)

        # Uncomment this to save the result figure of range test to file
        # fig.savefig('lr_loss_history.png')

        # Remember to reset model and optimizer to original state
        lr_finder.reset()

        # Set suggested LR
        for name in optimizers:
            optimizers[name].param_groups[0]['lr'] = suggested_lr

        print('----- Updated optimizers -----')
        print(optimizers)

        # ^^^^^ Code related to LRFinder ^^^^^

        # Attach LR scheduler

        sched_backbone = lr_scheduler.MultiStepLR(optim_backbone, milestones=MILESTONES)
        schedulers = {'backbone': sched_backbone}

        if method == 'lloss':
            sched_module   = lr_scheduler.MultiStepLR(optim_module, milestones=MILESTONES)
            schedulers = {'backbone': sched_backbone, 'module': sched_module}
        #------------------------------------------------------------------------------------------------            
        # Training and testing
NaleRaphael commented 2 years ago

Oh, the error message RuntimeError: Optimizer already has a scheduler attached to it is raised because LRFinder needs to be run before you attach the actual scheduler you want to use.

This limitation results from how learning rate schedulers are implemented in PyTorch. You can check out our previous discussion regarding this topic here.

So, to get LRFinder to work with that GCN training script, you have to move any scheduler-related code to after the section that runs LRFinder. That's why I split this part of the code into two sections (one for setting up the optimizers, one for setting up the LR schedulers).
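In other words, the required ordering is roughly the following (a sketch with a single optimizer for brevity; model, criterion, train_loader and the LR/MOMENTUM/WDECAY/MILESTONES constants stand in for the ones defined in the training script):

    from torch import optim
    from torch.optim import lr_scheduler
    from torch_lr_finder import LRFinder

    # 1. Create the optimizer only; do NOT create its MultiStepLR yet.
    optimizer = optim.SGD(model.parameters(), lr=LR,
                          momentum=MOMENTUM, weight_decay=WDECAY)

    # 2. Run the range test while no scheduler is attached to the optimizer,
    #    then restore the original model/optimizer state.
    lr_finder = LRFinder(model, optimizer, criterion, device='cuda')
    lr_finder.range_test(train_loader, end_lr=1, num_iter=100)
    lr_finder.reset()

    # 3. Only now attach the scheduler used for the actual training.
    scheduler = lr_scheduler.MultiStepLR(optimizer, milestones=MILESTONES)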

davidtvs commented 9 months ago

Closing due to inactivity