JuanCab opened this issue 2 years ago
I've been working my way through the Jupyter Notebook for Chapter 8.
When I run the cell that trains with L2 regularization:
```python
model = Net().to(device=device)
optimizer = optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

training_loop_l2reg(
    n_epochs = 100,
    optimizer = optimizer,
    model = model,
    loss_fn = loss_fn,
    train_loader = train_loader,
)

all_acc_dict["l2 reg"] = validate(model, train_loader, val_loader)
```
the network will not train, since the loss is `nan`. I am curious whether there is an error in the definition of `training_loop_l2reg` in the previous cell:
```python
def training_loop_l2reg(n_epochs, optimizer, model, loss_fn, train_loader):
    for epoch in range(1, n_epochs + 1):
        loss_train = 0.0
        for imgs, labels in train_loader:
            imgs = imgs.to(device=device)
            labels = labels.to(device=device)
            outputs = model(imgs)
            loss = loss_fn(outputs, labels)

            l2_lambda = 0.001
            # Replace pow(2.0) with abs() for L1 regularization
            l2_norm = sum(p.pow(2.0).sum() for p in model.parameters())
            loss = loss + l2_lambda * l2_norm

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            loss_train += loss.item()

        if epoch == 1 or epoch % 10 == 0:
            print('{} Epoch {}, Training loss {}'.format(
                datetime.datetime.now(),
                epoch, loss_train / len(train_loader)))
```
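One quick sanity check (ad hoc, not from the notebook) is to compare the plain data loss against the L2 penalty on a single batch, to see whether the penalty term itself is what blows up. This reuses the notebook's `model`, `loss_fn`, `device`, and `train_loader`:

```python
# Ad-hoc check: how big is the L2 penalty relative to the data loss
# on one batch, before any training steps?
imgs, labels = next(iter(train_loader))
imgs = imgs.to(device=device)
labels = labels.to(device=device)

with torch.no_grad():
    data_loss = loss_fn(model(imgs), labels)
    l2_norm = sum(p.pow(2.0).sum() for p in model.parameters())

print(f"data loss: {data_loss.item():.4f}, "
      f"l2 penalty (lambda=0.001): {(0.001 * l2_norm).item():.4f}")
```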
If I instead train using the `weight_decay` parameter of SGD:
```python
model = NetWidth(n_chans1=32).to(device=device)
optimizer = optim.SGD(model.parameters(), weight_decay=0.001, lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

training_loop(
    n_epochs = 100,
    optimizer = optimizer,
    model = model,
    loss_fn = loss_fn,
    train_loader = train_loader,
)

all_acc_dict["width"] = validate(model, train_loader, val_loader)
```
I have no problem with the loss converging.
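For reference, my understanding is that PyTorch's SGD implements weight decay by adding `weight_decay * p` directly to each parameter's gradient, while the manual penalty `l2_lambda * sum(p**2)` contributes a gradient of `2 * l2_lambda * p`, so `weight_decay=0.001` actually corresponds to `l2_lambda=0.0005`, not `0.001`. A tiny sketch with made-up tensor values illustrating the gradient equivalence:

```python
import torch

# Toy parameter to compare the two formulations (made-up values).
p = torch.tensor([0.5, -1.5], requires_grad=True)
l2_lambda = 0.001

# Manual penalty: its gradient w.r.t. p is 2 * l2_lambda * p.
penalty = l2_lambda * p.pow(2.0).sum()
penalty.backward()
print(p.grad)                      # tensor([ 0.0010, -0.0030])

# SGD-style weight decay adds weight_decay * p to the gradient,
# so weight_decay = 2 * l2_lambda reproduces the same update.
print(2 * l2_lambda * p.detach())  # tensor([ 0.0010, -0.0030])
```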