BADBADBADBOY / pytorchOCR

A PyTorch-based OCR algorithm library, including PSENet, PAN, DBNet, SAST, and CRNN.

No checkpoint found!! #11

Closed · MustafaAlahmid closed this issue 3 years ago

MustafaAlahmid commented 3 years ago

Hi, I'm trying to train with my own dataset and I'm getting this error:

    make: Entering directory '/media/HDD/pytorchOCR-master/ptocr/postprocess/dbprocess'
    make: 'cppdbprocess.so' is up to date.
    make: Leaving directory '/media/HDD/pytorchOCR-master/ptocr/postprocess/dbprocess'
    Resuming from checkpoint.
    Traceback (most recent call last):
      File "tools/det_train.py", line 307, in <module>
        TrainValProgram(args)
      File "tools/det_train.py", line 221, in TrainValProgram
        assert os.path.isfile(config['base']['restore_file']), 'Error: no checkpoint file found!'
    AssertionError: Error: no checkpoint file found

fxwfzsxyq commented 3 years ago

> Hi, I'm trying to train with my own dataset and I'm getting this error: [...] AssertionError: Error: no checkpoint file found

If you are training from scratch, set `restore` to `False` in the yaml file. If you are resuming a previous training run, set `restore` to `True` and point `restore_file` at the model file you saved earlier.
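
A minimal sketch of the relevant part of the det config yaml, assuming the keys sit under `base` as the `config['base']['restore_file']` access in the traceback suggests; the checkpoint path shown is hypothetical, so substitute a file you actually saved:

```yaml
base:
  # Training from scratch: do not try to resume from a checkpoint.
  restore: False
  # Resuming a previous run instead: set restore to True and point
  # restore_file at a saved model, e.g. (hypothetical path):
  # restore: True
  # restore_file: './checkpoint/DB_best.pth.tar'
```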

MustafaAlahmid commented 3 years ago

> If you are training from scratch, set `restore` to `False` in the yaml file. If you are resuming a previous training run, set `restore` to `True` and point `restore_file` at the model file you saved earlier.

Solved, thank you

I just noticed that while training, `loss_thresh` stays at 1.0000 and ACC at 0.5000. Is that okay? Here is part of the log:

    (41/1200/40/49) | loss_total:2.7081 | loss_l1:0.1146 | loss_bce:0.5623 | loss_thresh:1.0000 | ACC:0.5000 | IOU:0.4917 | lr:0.00193839
    (42/1200/0/49)  | loss_total:2.7006 | loss_l1:0.1138 | loss_bce:0.5623 | loss_thresh:1.0000 | ACC:0.5000 | IOU:0.4950 | lr:0.00193689
    (42/1200/20/49) | loss_total:2.7065 | loss_l1:0.1144 | loss_bce:0.5623 | loss_thresh:1.0000 | ACC:0.5000 | IOU:0.4914 | lr:0.00193689
    (42/1200/40/49) | loss_total:2.7077 | loss_l1:0.1145 | loss_bce:0.5624 | loss_thresh:1.0000 | ACC:0.5000 | IOU:0.4915 | lr:0.00193689
    (43/1200/0/49)  | loss_total:2.7064 | loss_l1:0.1144 | loss_bce:0.5623 | loss_thresh:1.0000 | ACC:0.5000 | IOU:0.4897 | lr:0.00193538
    (43/1200/20/49) | loss_total:2.7041 | loss_l1:0.1142 | loss_bce:0.5624 | loss_thresh:1.0000 | ACC:0.5000 | IOU:0.4920 | lr:0.00193538
    (43/1200/40/49) | loss_total:2.7071 | loss_l1:0.1145 | loss_bce:0.5624 | loss_thresh:1.0000 | ACC:0.5000 | IOU:0.4918 | lr:0.00193538
    (44/1200/0/49)  | loss_total:2.7009 | loss_l1:0.1139 | loss_bce:0.5623 | loss_thresh:1.0000 | ACC:0.5000 | IOU:0.4943 | lr:0.00193388
    (44/1200/20/49) | loss_total:2.7083 | loss_l1:0.1146 | loss_bce:0.5627 | loss_thresh:1.0000 | ACC:0.5000 | IOU:0.4917 | lr:0.00193388
    (44/1200/40/49) | loss_total:2.7086 | loss_l1:0.1146 | loss_bce:0.5625 | loss_thresh:1.0000 | ACC:0.5000 | IOU:0.4916 | lr:0.00193388
    (45/1200/0/49)  | loss_total:2.7083 | loss_l1:0.1146 | loss_bce:0.5623 | loss_thresh:1.0000 | ACC:0.5000 | IOU:0.4943 | lr:0.00193237
    (45/1200/20/49) | loss_total:2.7074 | loss_l1:0.1145 | loss_bce:0.5623 | loss_thresh:1.0000 | ACC:0.5000 | IOU:0.4925 | lr:0.00193237
    (45/1200/40/49) | loss_total:2.7069 | loss_l1:0.1145 | loss_bce:0.5623 | loss_thresh:1.0000 | ACC:0.5000 | IOU:0.4920 | lr:0.00193237
    (46/1200/0/49)  | loss_total:2.7077 | loss_l1:0.1145 | loss_bce:0.5623 | loss_thresh:1.0000 | ACC:0.5000 | IOU:0.4913 | lr:0.00193087
    (46/1200/20/49) | loss_total:2.7062 | loss_l1:0.1144 | loss_bce:0.5623 | loss_thresh:1.0000 | ACC:0.5000 | IOU:0.4912 | lr:0.00193087

BADBADBADBOY commented 3 years ago

This is normal. You need to train for more epochs. After a certain point, the loss will drop rapidly, and ACC and IOU will increase rapidly.