davidtvs / pytorch-lr-finder

A learning rate range test implementation in PyTorch
MIT License

ModuleDict.update should be called with an iterable of key/value pairs, but got ResNet #87

Open manza-ari opened 2 years ago

manza-ari commented 2 years ago

My code is below; I am facing the following error. I also want to ask one more thing: does LRFinder not work with DataParallel? I have commented out the scheduler so that LRFinder could work properly.

```python
# Main

if __name__ == '__main__':

method = args.method_type
methods = ['Random']
datasets = ['cifar10', 'cifar100', 'fashionmnist','svhn']
assert method in methods, 'No method %s! Try options %s'%(method, methods)
assert args.dataset in datasets, 'No dataset %s! Try options %s'%(args.dataset, datasets)

results = open('results_'+str(args.method_type)+"_"+args.dataset +'_main'+str(args.cycles)+str(args.total)+'.txt','w')
print("Dataset: %s"%args.dataset)
print("Method type:%s"%method)

if args.total:
    TRIALS = 1
    CYCLES = 1
else:
    CYCLES = args.cycles

for trial in range(TRIALS):

    # Load training and testing dataset
    data_train, data_unlabeled, data_test, adden, NO_CLASSES, no_train = load_dataset(args.dataset)
    # Don't predefine budget size. Configure it in the config.py: ADDENDUM = adden
    NUM_TRAIN = no_train
    indices = list(range(NUM_TRAIN))
    random.shuffle(indices)

    if args.total:
        labeled_set= indices
    else:
        labeled_set = indices[:ADDENDUM]
        unlabeled_set = [x for x in indices if x not in labeled_set]

    train_loader = DataLoader(data_train, batch_size=BATCH, 
                                sampler=SubsetRandomSampler(labeled_set), 
                                pin_memory=True, drop_last=True)
    test_loader  = DataLoader(data_test, batch_size=BATCH)
    dataloaders  = {'train': train_loader, 'test': test_loader}

    for cycle in range(CYCLES):

        # Randomly sample 10000 unlabeled data points
        if not args.total:
            random.shuffle(unlabeled_set)
            subset = unlabeled_set[:SUBSET]

        # Model - create new instance for every cycle so that it resets
        with torch.cuda.device(CUDA_VISIBLE_DEVICES):
            args.dataset == "cifar100"
            resnet18    = resnet.resnet18(num_classes=NO_CLASSES).cuda()

        args.dataset == "cifar100"
        models = resnet18 

        torch.backends.cudnn.benchmark = True
        #models = torch.nn.DataParallel(models, device_ids=[0])

        # Loss, criterion and scheduler (re)initialization
        criterion      = nn.CrossEntropyLoss(reduction='none')
        optim_backbone = optim.SGD(models.parameters(), lr=LR, weight_decay=WDECAY) #, momentum=MOMENTUM

        #sched_backbone = lr_scheduler.MultiStepLR(optim_backbone, milestones=MILESTONES)  

        optimizers = optim_backbone
        #schedulers = {'backbone': sched_backbone}

        # Training and testing
        model_wrapper = ModelWrapper(models, method)
        loss_wrapper = LossWrapper(criterion, models, method)
        optimizer_wrapper = OptimizerWrapper(optimizers)

        # Manually create an axis and pass it into `LRFinder.plot()` to avoid popping window
        # of figure blocking the procedure.
        fig, ax = plt.subplots()

        lr_finder = LRFinder(model_wrapper, optimizer_wrapper, loss_wrapper, device='cuda')
        lr_finder.range_test(train_loader, end_lr=1, num_iter=100)
        ax, suggested_lr = lr_finder.plot(ax=ax, skip_start=0, skip_end=0, suggest_lr=True)

        lr_finder.reset() # to reset the model and optimizer to their initial state 

        for name in optimizers:
            optimizers[name].param_groups[0]['lr'] = suggested_lr

        print('----- Updated optimizers -----')
        print(optimizers)

        criterion = nn.CrossEntropyLoss(reduction='none') 

        # LR Finder

        train(models, method, criterion, optimizers, dataloaders, args.no_of_epochs, EPOCHL)  #schedulers,

        acc = test(models, EPOCH, method, dataloaders, mode='test')
        print('Trial {}/{} || Cycle {}/{} || Label set size {}: Test acc {}'.format(trial+1, TRIALS, cycle+1, CYCLES, len(labeled_set), acc))
        np.array([method, trial+1, TRIALS, cycle+1, CYCLES, len(labeled_set), acc]).tofile(results, sep=" ")
        results.write("\n")

        if cycle == (CYCLES-1):
            # Reached final training cycle
            print("Finished.")
            break
        # Get the indices of the unlabeled samples to train on next cycle
        arg = query_samples(models, method, data_unlabeled, subset, labeled_set, cycle, args)

        # Update the labeled dataset and the unlabeled dataset, respectively
        labeled_set += list(torch.tensor(subset)[arg][-ADDENDUM:].numpy())
        listd = list(torch.tensor(subset)[arg][:-ADDENDUM].numpy()) 
        unlabeled_set = listd + unlabeled_set[SUBSET:]
        print(len(labeled_set), min(labeled_set), max(labeled_set))
        # Create a new dataloader for the updated labeled dataset
        dataloaders['train'] = DataLoader(data_train, batch_size=BATCH, 
                                        sampler=SubsetRandomSampler(labeled_set), 
                                        pin_memory=True)
    results.close()

```

NaleRaphael commented 1 year ago

Hi @manza-ari , sorry that I missed the notification of this issue.

Regarding the error "ModuleDict.update should be called with an iterable of key/value pairs, but got ResNet", it does not seem to be an error raised by LRFinder. If possible, could you post the complete traceback of the error message? That would help us find the actual location where this error is raised.

As for DataParallel, I just ran a Colab notebook (torch 1.12.1) with a model wrapped by DataParallel on a single GPU, and it currently works fine. If you can provide further details about this problem, maybe we can figure out the cause.
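
For reference, a minimal sketch of that kind of setup (the torchvision `resnet18` and `train_loader` here are placeholders, not your code):

```python
import torch.nn as nn
import torch.optim as optim
from torchvision.models import resnet18
from torch_lr_finder import LRFinder

# Wrap the model in DataParallel before handing it to LRFinder (single GPU here).
model = nn.DataParallel(resnet18(num_classes=10), device_ids=[0]).cuda()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-4, weight_decay=5e-4)

lr_finder = LRFinder(model, optimizer, criterion, device='cuda')
lr_finder.range_test(train_loader, end_lr=1, num_iter=100)  # `train_loader`: your labeled DataLoader
lr_finder.reset()
```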

By the way, I suggest running the model on a few samples without LRFinder first. If that works well, you can continue with applying LRFinder to the model. This helps you clarify whether it's a problem with LRFinder or something else, and it also helps you find the actual problem faster.
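
For example, a rough smoke test along these lines (reusing the wrapper names from your snippet; adjust to your actual pipeline):

```python
# Smoke test: push a few batches through the wrappers without LRFinder
# to confirm that the forward pass, loss, and optimizer step all work.
model_wrapper.train()
for i, (inputs, labels) in enumerate(train_loader):
    optimizer_wrapper.zero_grad()
    scores = model_wrapper(inputs.cuda())
    loss = loss_wrapper(scores, labels.cuda())
    loss.backward()
    optimizer_wrapper.step()
    if i >= 2:  # a few iterations are enough here
        break
```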

Thanks!

manza-ari commented 1 year ago

Thank you so much for your kindness. Apart from this question, I had asked you earlier how to use LR-Finder for my project, and you edited this for me:

https://github.com/NaleRaphael/Sequential-GCN-for-Active-Learning/commit/4b55065c#diff-b10564ab7d2c520cdd0243874879fb0a782862c3c902ab535faabe57d5a505e1

The difference is that I removed the dictionaries I was using earlier and simplified the program to a single method, so there is no longer a "method" variable in the program.

So this error is raised in the LRFinder part.

The code is posted below:

manza-ari commented 1 year ago

```python
class ModelWrapper(nn.Module):
    def __init__(self, models):
        super().__init__()
        self.models = nn.ModuleDict(models)

    def forward(self, inputs):
        scores, _, features = self.models(inputs)
        return scores

class LossWrapper(nn.Module):
    def __init__(self, loss_func, models):
        super().__init__()
        self.loss_func = loss_func
        self.models = models

    def forward(self, inputs, labels):
        # unpack
        scores = inputs

        target_loss = criterion(scores, labels)

        m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)
        loss            = m_backbone_loss
        return loss

class OptimizerWrapper(optim.Optimizer):
    def __init__(self, optimizer_dict):
        optim_args = []
        for _optim in optimizer_dict.values():
            optim_args.append(_optim.param_groups[0])
        defaults = {k: v for k, v in optim_args[0].items() if k != 'params'}

        super().__init__(optim_args, defaults)

        self.optimizer_dict = optimizer_dict

    def step(self, *args, **kwargs):
        for v in self.optimizer_dict.values():
            v.step(*args, **kwargs)

    def zero_grad(self, *args, **kwargs):
        for v in self.optimizer_dict.values():
            v.zero_grad(*args, **kwargs)

if __name__ == '__main__':

method = args.method_type
methods = ['Random']
datasets = ['cifar10', 'cifar100', 'fashionmnist','svhn']
assert method in methods, 'No method %s! Try options %s'%(method, methods)
assert args.dataset in datasets, 'No dataset %s! Try options %s'%(args.dataset, datasets)

results = open('results_'+str(args.method_type)+"_"+args.dataset +'_main'+str(args.cycles)+str(args.total)+'.txt','w')
print("Dataset: %s"%args.dataset)
print("Method type:%s"%method)

if args.total:
    TRIALS = 1
    CYCLES = 1
else:
    CYCLES = args.cycles

for trial in range(TRIALS):

    # Load training and testing dataset
    data_train, data_unlabeled, data_test, adden, NO_CLASSES, no_train = load_dataset(args.dataset)
    # Don't predefine budget size. Configure it in the config.py: ADDENDUM = adden
    NUM_TRAIN = no_train
    indices = list(range(NUM_TRAIN))
    random.shuffle(indices)

    if args.total:
        labeled_set= indices
    else:
        labeled_set = indices[:ADDENDUM]
        unlabeled_set = [x for x in indices if x not in labeled_set]

    train_loader = DataLoader(data_train, batch_size=BATCH, 
                                sampler=SubsetRandomSampler(labeled_set), 
                                pin_memory=True, drop_last=True)
    test_loader  = DataLoader(data_test, batch_size=BATCH)
    dataloaders  = {'train': train_loader, 'test': test_loader}

    for cycle in range(CYCLES):

        # Randomly sample 10000 unlabeled data points
        if not args.total:
            random.shuffle(unlabeled_set)
            subset = unlabeled_set[:SUBSET]

        # Model - create new instance for every cycle so that it resets
        with torch.cuda.device(CUDA_VISIBLE_DEVICES):
            args.dataset == "cifar100"
            resnet18    = resnet.ResNet18(num_classes=NO_CLASSES).cuda()

        args.dataset == "cifar100"
        models = resnet18 

        torch.backends.cudnn.benchmark = True
        #models = torch.nn.DataParallel(models, device_ids=[0])

        # Loss, criterion and scheduler (re)initialization
        criterion      = nn.CrossEntropyLoss(reduction='none')
        optim_backbone = optim.SGD(models.parameters(), lr=LR, weight_decay=WDECAY) #, momentum=MOMENTUM

        #sched_backbone = lr_scheduler.MultiStepLR(optim_backbone, milestones=MILESTONES)  

        optimizers = optim_backbone
        #schedulers = {'backbone': sched_backbone}

        # Training and testing
        model_wrapper = ModelWrapper(models)
        loss_wrapper = LossWrapper(criterion, models)
        optimizer_wrapper = OptimizerWrapper(optimizers)

        # Manually create an axis and pass it into `LRFinder.plot()` to avoid popping window
        # of figure blocking the procedure.
        fig, ax = plt.subplots()

        lr_finder = LRFinder(model_wrapper, optimizer_wrapper, loss_wrapper, device='cuda')
        lr_finder.range_test(train_loader, end_lr=1, num_iter=100)
        ax, suggested_lr = lr_finder.plot(ax=ax, skip_start=0, skip_end=0, suggest_lr=True)

        lr_finder.reset() # to reset the model and optimizer to their initial state 

        for name in optimizers:
            optimizers[name].param_groups[0]['lr'] = suggested_lr

        print('----- Updated optimizers -----')
        print(optimizers)

        criterion = nn.CrossEntropyLoss(reduction='none') 

        # LR Finder

        train(models, method, criterion, optimizers, dataloaders, args.no_of_epochs, EPOCHL)  #schedulers,

        acc = test(models, EPOCH, method, dataloaders, mode='test')
        print('Trial {}/{} || Cycle {}/{} || Label set size {}: Test acc {}'.format(trial+1, TRIALS, cycle+1, CYCLES, len(labeled_set), acc))
        np.array([method, trial+1, TRIALS, cycle+1, CYCLES, len(labeled_set), acc]).tofile(results, sep=" ")
        results.write("\n")

        if cycle == (CYCLES-1):
            # Reached final training cycle
            print("Finished.")
            break
        # Get the indices of the unlabeled samples to train on next cycle
        arg = query_samples(models, method, data_unlabeled, subset, labeled_set, cycle, args)

        # Update the labeled dataset and the unlabeled dataset, respectively
        labeled_set += list(torch.tensor(subset)[arg][-ADDENDUM:].numpy())
        listd = list(torch.tensor(subset)[arg][:-ADDENDUM].numpy()) 
        unlabeled_set = listd + unlabeled_set[SUBSET:]
        print(len(labeled_set), min(labeled_set), max(labeled_set))
        # Create a new dataloader for the updated labeled dataset
        dataloaders['train'] = DataLoader(data_train, batch_size=BATCH, 
                                        sampler=SubsetRandomSampler(labeled_set), 
                                        pin_memory=True)
    results.close()
```

NaleRaphael commented 1 year ago

Hi @manza-ari

In the original implementation, models is actually a dictionary like this:

```python
# src: https://github.com/razvancaramalau/Sequential-GCN-for-Active-Learning/blob/master/main.py#L111-L113
models = {'backbone': resnet18}
if method == 'lloss':
    models = {'backbone': resnet18, 'module': loss_module}
```

If you follow the same architecture, that ModelWrapper should still work. But in the code snippet you posted, it seems `models` is declared as follows instead:

```python
models = resnet18
```

Therefore, it raises exactly the error message shown in the title of this issue:

```
ModuleDict.update should be called with an iterable of key/value pairs, but got ResNet
```

Maybe you can try rewriting `models` as a dictionary as shown in the original implementation, then re-run it to see whether there are any further issues.
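
A rough sketch of what I mean (the 'backbone' key follows the original repo; adjust it if your ModelWrapper.forward expects something else):

```python
# Declare `models` as a dict so that nn.ModuleDict(models) inside ModelWrapper
# receives key/value pairs instead of a bare ResNet instance.
models = {'backbone': resnet18}

model_wrapper = ModelWrapper(models)
# Note: forward() would then need to look the backbone up by key, e.g.
#   scores, _, features = self.models['backbone'](inputs)
```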

manza-ari commented 1 year ago

Okay, so I have implemented the dictionary again, but it started raising this error: `AttributeError: Optimizer already has a scheduler attached to it`. I had removed the scheduler, but it is still not working.

I have also changed the optimizer from SGD to Adam, but now it says: `TypeError: optimizer can only optimize Tensors, but one of the params is str`.

NaleRaphael commented 1 year ago

Hi @manza-ari. Yes, LRFinder works with a scheduler internally, so you need to detach your own scheduler before running LRFinder. Regarding the other error you mentioned, I'm afraid there are problems in your implementation of the training pipeline.
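
Roughly, the ordering should look like this (a generic sketch, not your exact wrapper setup; `model`, `train_loader`, and the config constants are placeholders):

```python
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
from torch_lr_finder import LRFinder

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=LR, weight_decay=WDECAY)

# Run the range test while the optimizer has no scheduler attached yet.
lr_finder = LRFinder(model, optimizer, criterion, device='cuda')
lr_finder.range_test(train_loader, end_lr=1, num_iter=100)
lr_finder.reset()

# Only now create your own scheduler, for the actual training loop.
scheduler = lr_scheduler.MultiStepLR(optimizer, milestones=MILESTONES)
```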

To make it easier to get the code working, it's strongly suggested to follow these steps:

  1. Finish the code for the model and training pipeline first. Make sure the code works well without LRFinder.
  2. Try to attach LRFinder according to the tutorial in README.md; this is the step I can help with if you have any questions (see the sketch at the end of this comment).

Otherwise, I would worry that further problems will keep confusing you while too many things are mixed up.
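
For step 2, the basic pattern is close to the README example, roughly like this (names such as `model`, `optimizer`, `criterion`, and `trainloader` are placeholders from your own pipeline):

```python
import matplotlib.pyplot as plt
from torch_lr_finder import LRFinder

# Create an axis manually so the plot window doesn't block the script.
fig, ax = plt.subplots()

lr_finder = LRFinder(model, optimizer, criterion, device='cuda')
lr_finder.range_test(trainloader, end_lr=1, num_iter=100)
ax, suggested_lr = lr_finder.plot(ax=ax, suggest_lr=True)
lr_finder.reset()

# Apply the suggestion to every param group of the plain optimizer.
for group in optimizer.param_groups:
    group['lr'] = suggested_lr
```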