SreenijaK closed this issue 5 years ago.
I was able to fix the issue: I was getting the error above whenever the length of a training sample's label was over 26.
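A minimal sketch of dropping such samples before training. It assumes each sample is an `(image_path, label)` pair and that 26 is the maximum label length the model can handle (the limit observed above; the real limit depends on the image width and the network). `filter_long_labels` and `MAX_LABEL_LEN` are hypothetical names, not part of the repository.

```python
MAX_LABEL_LEN = 26  # assumption: limit observed in this thread; depends on imgW and the CNN


def filter_long_labels(samples, max_len=MAX_LABEL_LEN):
    """Split (image_path, label) pairs into kept and dropped lists by label length."""
    kept, dropped = [], []
    for path, label in samples:
        # Labels longer than max_len go to `dropped`, everything else is kept.
        (dropped if len(label) > max_len else kept).append((path, label))
    return kept, dropped


if __name__ == "__main__":
    data = [("a.jpg", "hello"), ("b.jpg", "x" * 30)]
    kept, dropped = filter_long_labels(data)
    print(f"kept {len(kept)}, dropped {len(dropped)}")
```

Logging the dropped samples, rather than silently discarding them, makes it easier to spot labelling mistakes in the dataset.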
Training has reached 1000, but no model has been saved in `expr`. Do you by chance know the reason?
I haven't changed anything in `params.py`, but I still don't understand the reason. Here's a snippet of my `params.py`:

```python
nc = 1
pretrained = ''  # path to pretrained model (to continue training)
expr_dir = 'expr'  # where to store samples and models
dealwith_lossnone = True  # whether to replace all nan/inf in gradients to zero

cuda = True  # enables cuda
multi_gpu = False  # whether to use multi gpu
ngpu = 1  # number of GPUs to use. Do remember to set multi_gpu to True!
workers = 0  # number of data loading workers

displayInterval = 100  # interval to print the train loss
valInterval = 1000  # interval to val the model loss and accuracy
saveInterval = 1000  # interval to save model
n_test_disp = 10  # number of samples to display when val the model
```
Hi, sorry for the late reply. You can check the checkpointing code:
```python
# do checkpointing (i is the batch index within the current epoch)
if i % params.saveInterval == 0:
    torch.save(crnn.state_dict(), '{0}/netCRNN_{1}_{2}.pth'.format(params.expr_dir, epoch, i))
```
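Note that `i < len(train_loader)`, and `len(train_loader)` (the number of batches per epoch) depends on `batch_size`. So if `saveInterval` is larger than the number of batches per epoch, `i % params.saveInterval == 0` is never true and no model is saved.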
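To see why the condition never fires in this case, the sketch below counts batches per epoch and lists the batch indices that would trigger a save. It assumes `i` runs from 1 to `len(train_loader)` when the check happens and that the last partial batch is kept (`drop_last=False`); both are assumptions, and `num_batches` / `checkpoint_iters` are hypothetical helpers. The 410 batches per epoch come from the training log later in this thread.

```python
import math


def num_batches(num_samples, batch_size, drop_last=False):
    """Batches per epoch, as len() of a DataLoader with the default batch sampler."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


def checkpoint_iters(n_batches, save_interval):
    """Batch indices i in 1..n_batches for which `i % save_interval == 0`."""
    return [i for i in range(1, n_batches + 1) if i % save_interval == 0]


if __name__ == "__main__":
    n = 410  # batches per epoch, taken from the log in this thread
    print(checkpoint_iters(n, 1000))  # saveInterval = 1000: never saves
    print(checkpoint_iters(n, 100))   # saveInterval = 100: saves 4 times per epoch
```

Any `saveInterval` no larger than the number of batches per epoch guarantees at least one checkpoint per epoch.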
Of course, it doesn't need to be that complicated: just set `params.saveInterval` to a smaller value.
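Besides lowering `saveInterval`, another option is to also save on the last batch of every epoch, so a checkpoint is written even when `saveInterval` exceeds the number of batches. This is a hypothetical variant of the condition, not the repository's code; `should_save` is an illustrative name, and `i` is assumed to be the 1-based batch index within the epoch.

```python
def should_save(i, n_batches, save_interval):
    """True at every save_interval-th batch and always on the epoch's last batch.

    Assumes i is the 1-based batch index within the current epoch.
    """
    return i % save_interval == 0 or i == n_batches


if __name__ == "__main__":
    saves = [i for i in range(1, 411) if should_save(i, 410, 1000)]
    print(saves)  # only the final batch of the epoch triggers a save
```

In the training loop this would replace `if i % params.saveInterval == 0:` with `if should_save(i, len(train_loader), params.saveInterval):`.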
Thank you, it's working now anyway. You've been a great help.
Hi, please try the latest code; I think I have fixed it. Thank you!
Why is my loss always `inf`, as shown below?

```
[806/1000][400/410] Loss: inf
[807/1000][100/410] Loss: inf
[807/1000][200/410] Loss: inf
[807/1000][300/410] Loss: inf
[807/1000][400/410] Loss: inf
[808/1000][100/410] Loss: inf
[808/1000][200/410] Loss: inf
[808/1000][300/410] Loss: inf
[808/1000][400/410] Loss: inf
[809/1000][100/410] Loss: inf
[809/1000][200/410] Loss: inf
[809/1000][300/410] Loss: inf
[809/1000][400/410] Loss: inf
[810/1000][100/410] Loss: inf
[810/1000][200/410] Loss: inf
```
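One common cause of an `inf` CTC loss is an infeasible target: a label of length L needs at least L output frames, plus one extra (blank) frame between each pair of identical adjacent characters. This ties back to the 26-length limit mentioned at the start of the thread. The sketch below, assuming the model emits 26 frames per image, flags such labels; `ctc_min_frames` and `infeasible_labels` are hypothetical helpers. PyTorch's `torch.nn.CTCLoss` also accepts `zero_infinity=True`, which zeroes infinite losses instead of reporting them, though that hides the bad samples rather than fixing them.

```python
T = 26  # assumption: number of output frames (time steps) produced by the CRNN


def ctc_min_frames(label):
    """Minimum input length CTC needs to emit `label`.

    Each character takes one frame, and every pair of identical adjacent
    characters needs an extra blank frame between them.
    """
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def infeasible_labels(labels, frames=T):
    """Labels whose CTC loss would be infinite with `frames` time steps."""
    return [lab for lab in labels if ctc_min_frames(lab) > frames]


if __name__ == "__main__":
    labels = ["hello", "a" * 20, "b" * 14]
    print(infeasible_labels(labels))
```

Running this over the training labels before training shows whether any targets can never be aligned to the model's output.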