```python
# following timm: set wd as 0 for bias and norm layers
param_groups = optim_factory.add_weight_decay(model_without_ddp, args.weight_decay)
optimizer = torch.optim.AdamW(param_groups, lr=args.lr, betas=(0.9, 0.95))
print(optimizer)
loss_scaler = NativeScaler()
```

The `add_weight_decay` function no longer exists in current timm; replace it with `optim_factory.param_groups_weight_decay`.
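For reference, a minimal sketch of the fix, assuming a timm release (roughly 0.6.x or later) where `add_weight_decay` was renamed to `param_groups_weight_decay` with the same behavior (weight decay of 0 for bias and norm parameters). The tiny model and the hyperparameter values below are stand-ins for `model_without_ddp` and `args` from MAE's `main_pretrain.py`:

```python
import torch
import torch.nn as nn
import timm.optim.optim_factory as optim_factory

model = nn.Sequential(nn.Linear(8, 8), nn.LayerNorm(8))  # stand-in for model_without_ddp
weight_decay = 0.05  # stand-in for args.weight_decay

# param_groups_weight_decay splits parameters into two groups:
# biases and norm-layer weights get weight_decay=0, all other
# parameters get the given weight_decay -- the same grouping that
# add_weight_decay used to produce.
param_groups = optim_factory.param_groups_weight_decay(model, weight_decay)
optimizer = torch.optim.AdamW(param_groups, lr=1.5e-4, betas=(0.9, 0.95))
print(optimizer)
```

The rest of the training loop (including `NativeScaler`) is unchanged; only the call that builds `param_groups` needs to be renamed.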