Working my way through the code, I ran into a breaking error when running this code:
# Train the model for three epochs
for epoch in range(num_epochs):
    # train for one epoch, printing every 10 iterations
    train_one_epoch(model, optimizer, data_loader_train, device, epoch, print_freq=10)
    # update the learning rate
    lr_scheduler.step()
    # evaluate on the test dataset
    evaluate(model, data_loader_val, device=device)
    # save a checkpoint after each epoch
    checkpoint_path = f'trained_model_{epoch+1}_epochs.pth'
    torch.save(model.state_dict(), checkpoint_path)
It produces the following error:
Epoch: [0] [ 0/65] eta: 0:01:55 lr: 0.000125 loss: 2.9749 (2.9749) loss_classifier: 1.1639 (1.1639) loss_box_reg: 0.0148 (0.0148) loss_objectness: 1.5950 (1.5950) loss_rpn_box_reg: 0.2011 (0.2011) time: 1.7780 data: 0.4853 max mem: 5038
Loss is nan, stopping training
{'loss_classifier': tensor(1.3285, device='cuda:0', grad_fn=), 'loss_box_reg': tensor(0.0082, device='cuda:0', grad_fn=), 'loss_objectness': tensor(nan, device='cuda:0', grad_fn=), 'loss_rpn_box_reg': tensor(0.1605, device='cuda:0', grad_fn=)}
An exception has occurred, use %tb to see the full traceback.
SystemExit: 1
/home/q/anaconda3/envs/xview3/lib/python3.9/site-packages/IPython/core/interactiveshell.py:3556: UserWarning: To exit: use 'exit', 'quit', or Ctrl-D.
warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
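To narrow down which loss term goes bad, a small check like the one below could be run on the loss dict before the training step bails out. This is only a sketch: `find_nonfinite_losses` is a hypothetical helper (not part of the training scripts used above), and it assumes every value in the dict can be converted with `float()`, which should hold for scalar tensors as well as plain numbers.

```python
import math


def find_nonfinite_losses(loss_dict):
    """Return the names of loss terms whose value is NaN or +/-inf.

    Hypothetical helper for debugging; assumes each value supports float().
    """
    bad = []
    for name, value in loss_dict.items():
        # float() accepts Python numbers and scalar tensors alike
        if not math.isfinite(float(value)):
            bad.append(name)
    return bad


# Values copied from the log above, written as plain floats for illustration
losses = {
    "loss_classifier": 1.3285,
    "loss_box_reg": 0.0082,
    "loss_objectness": float("nan"),
    "loss_rpn_box_reg": 0.1605,
}
print(find_nonfinite_losses(losses))  # prints ['loss_objectness']
```

With the values from the log, only `loss_objectness` is flagged, which suggests the problem sits in the region proposal part of the model rather than in the classifier or box regression heads.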