YanzuoLu / CFLD

[CVPR 2024 Highlight] Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis
MIT License

Issue with Loading Pre-trained Weights for Fine-tuning #8

Closed ButoneDream closed 6 months ago

ButoneDream commented 6 months ago

Hello,

I hope this message finds you well. I am a senior student deeply interested in your work and currently attempting to leverage your published model for my academic project.

While trying to load the pre-trained weights for fine-tuning on my dataset, I encountered an error, which I am struggling to resolve. I have attached a screenshot to illustrate the issue more clearly.

The process I followed is based on the instructions provided in your documentation, aiming to load the pre-trained weights and then fine-tune the model on my data. However, upon execution, I encountered the following error:

[screenshots of the error traceback attached]

I would greatly appreciate it if you could take a moment to look into this matter and provide any guidance or suggestions that might help me resolve this issue.

Thank you very much for your time and assistance. Your work is highly inspiring, and I am eager to apply it to my project successfully.

YanzuoLu commented 6 months ago

The load_state function in our training script is only meant for resuming after an accidental shutdown (machine failure or similar), so it also tries to restore the optimizer and random-number-generator states, as your traceback shows. The optimizer states are very large on disk :( so we deleted them all and only provide the model checkpoints on Google Drive. If you want to fine-tune on your own dataset, I think you can simply call torch.load to get the corresponding state dicts and use load_state_dict to load them into the U-Net and the Swin Transformer respectively, instead of specifying cfg.MODEL.PRETRAINED_PATH. The optimizer state should be unnecessary for fine-tuning on a new dataset. Hope this helps. Thanks for your attention to our work!
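For concreteness, a minimal sketch of what this loading approach could look like (the import path, the `cfg` object, and the checkpoint filenames are assumptions on my side; ButoneDream's snippet later in this thread uses the same filenames):

    import torch

    from models import UNet, build_model  # assumed import path in the CFLD repo

    # cfg is assumed to be the parsed training config used by the original script
    model = build_model(cfg)  # Swin Transformer (appearance encoder)
    unet = UNet(cfg)

    # load the released checkpoints directly instead of setting cfg.MODEL.PRETRAINED_PATH
    swin_sd = torch.load("checkpoints/pytorch_model_tf.bin", map_location="cpu")
    unet_sd = torch.load("checkpoints/pytorch_model_unet.bin", map_location="cpu")

    # load_state_dict reports missing/unexpected keys; printing them helps catch
    # silent mismatches when strict=False
    print(model.load_state_dict(swin_sd, strict=False))
    print(unet.load_state_dict(unet_sd, strict=False))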

ButoneDream commented 6 months ago

Thanks! Could I ask what these two files are, respectively? [screenshot of the two checkpoint files]

YanzuoLu commented 6 months ago

You can probably check the file size. The larger one corresponds to the U-Net while the other corresponds to the Swin Transformer.
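If the file sizes alone are not conclusive, a quick way to check (using the same assumed filenames as above) is to print a few state-dict keys from each file:

    import torch

    for path in ("checkpoints/pytorch_model_tf.bin", "checkpoints/pytorch_model_unet.bin"):
        sd = torch.load(path, map_location="cpu")
        # a Swin Transformer state dict typically has keys like "layers.0.blocks.0.attn...",
        # while a diffusers-style U-Net has "down_blocks"/"mid_block"/"up_blocks" keys
        print(path, f"{len(sd)} tensors", list(sd.keys())[:5])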

ButoneDream commented 6 months ago

Thanks! But after I modified the script following your instruction:

    model = build_model(cfg)  # Swin Transformer
    unet = UNet(cfg)
    swin_transformer_weight_path =  "checkpoints/pytorch_model_tf.bin"
    model.load_state_dict(torch.load(swin_transformer_weight_path, map_location='cpu'), strict=False)
    unet_weight_path =  "checkpoints/pytorch_model_unet.bin"
    unet.load_state_dict(torch.load(unet_weight_path, map_location='cpu'), strict=False)

    metric = build_metric().to(accelerator.device)
    trainable_params = sum([p.numel() for p in model.parameters() if p.requires_grad] + \
                           [p.numel() for p in unet.parameters() if p.requires_grad])
    logger.info(f"number of trainable parameters: {trainable_params}")

    logger.info("preparing optimizer...")

I hit this error: [screenshot of the error traceback]

ButoneDream commented 6 months ago
class LinearWarmupMultiStepDecayLRScheduler(torch.optim.lr_scheduler._LRScheduler):
    def __init__(self, optimizer, warmup_steps, warmup_rate, decay_rate,
                 num_epochs, decay_epochs, iters_per_epoch, override_lr=0.,
                 last_epoch=-1, verbose=False):
        self.warmup_steps = warmup_steps
        self.warmup_rate = warmup_rate
        self.decay_rate = decay_rate
        self.decay_epochs = [decay_epoch * iters_per_epoch for decay_epoch in decay_epochs]
        self.num_epochs = num_epochs * iters_per_epoch
        self.override_lr = override_lr
        # original call site; moved below so that 'initial_lr' already exists
        # when the base class checks for it (required if last_epoch != -1)
        # super(LinearWarmupMultiStepDecayLRScheduler, self).__init__(optimizer, last_epoch, verbose)

        # make sure every param_group of the optimizer has an 'initial_lr' entry
        for param_group in optimizer.param_groups:
            if 'initial_lr' not in param_group:
                param_group['initial_lr'] = param_group['lr']

        super(LinearWarmupMultiStepDecayLRScheduler, self).__init__(optimizer, last_epoch, verbose)

I solved it with the change above.
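For reference, a minimal usage sketch of the patched constructor (the optimizer and all hyperparameter values below are placeholders, and the rest of the class, e.g. its get_lr, is assumed to be as in the repo; the original error was a KeyError about 'initial_lr' not being specified when resuming an optimizer, which PyTorch raises whenever last_epoch != -1 and that key is missing):

    import torch

    # dummy parameter/optimizer just to exercise the patched constructor
    params = [torch.nn.Parameter(torch.zeros(1))]
    optimizer = torch.optim.AdamW(params, lr=1e-4)

    # with the 'initial_lr' patch above, a non-default last_epoch no longer raises
    scheduler = LinearWarmupMultiStepDecayLRScheduler(
        optimizer,
        warmup_steps=500, warmup_rate=0.01, decay_rate=0.1,
        num_epochs=100, decay_epochs=[60, 80], iters_per_epoch=1000,
        last_epoch=0,
    )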

YanzuoLu commented 6 months ago

👍

justinday123 commented 2 months ago

@ButoneDream Sorry for asking :( ... could you share your fine-tuning code?