Closed: MustafaAlahmid closed this issue 3 years ago.
Hi, I'm trying to train with my own dataset and I'm getting this error:
```
make: Entering directory '/media/HDD/pytorchOCR-master/ptocr/postprocess/dbprocess'
make: 'cppdbprocess.so' is up to date.
make: Leaving directory '/media/HDD/pytorchOCR-master/ptocr/postprocess/dbprocess'
Resuming from checkpoint.
Traceback (most recent call last):
  File "tools/det_train.py", line 307, in <module>
    TrainValProgram(args)
  File "tools/det_train.py", line 221, in TrainValProgram
    assert os.path.isfile(config['base']['restore_file']), 'Error: no checkpoint file found!'
AssertionError: Error: no checkpoint file found
```
If you are training from scratch, set `restore` to `False` in the YAML file. If you are resuming a previous training run, set `restore` to `True` and set `restore_file` to the model file you saved before.
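For reference, a minimal sketch of what that part of the YAML config could look like. Only the `base` section and the `restore` / `restore_file` keys are grounded in the traceback and the answer above; the exact layout and the example checkpoint path are assumptions:

```yaml
base:
  # Train from scratch: do not try to load a checkpoint.
  restore: False
  # To resume instead, enable restore and point restore_file at a
  # checkpoint you actually saved (hypothetical path below):
  # restore: True
  # restore_file: ./checkpoint/DB_best.pth.tar
```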
I just noticed something while training: in the log, loss_thresh is always 1.0000 and ACC = 0.5000. Is that okay? Here is part of the log:
```
(41/1200/40/49) | loss_total:2.7081 | loss_l1:0.1146 | loss_bce:0.5623 | loss_thresh:1.0000 | ACC:0.5000 | IOU:0.4917 | lr:0.00193839
(42/1200/0/49)  | loss_total:2.7006 | loss_l1:0.1138 | loss_bce:0.5623 | loss_thresh:1.0000 | ACC:0.5000 | IOU:0.4950 | lr:0.00193689
(42/1200/20/49) | loss_total:2.7065 | loss_l1:0.1144 | loss_bce:0.5623 | loss_thresh:1.0000 | ACC:0.5000 | IOU:0.4914 | lr:0.00193689
(42/1200/40/49) | loss_total:2.7077 | loss_l1:0.1145 | loss_bce:0.5624 | loss_thresh:1.0000 | ACC:0.5000 | IOU:0.4915 | lr:0.00193689
(43/1200/0/49)  | loss_total:2.7064 | loss_l1:0.1144 | loss_bce:0.5623 | loss_thresh:1.0000 | ACC:0.5000 | IOU:0.4897 | lr:0.00193538
(43/1200/20/49) | loss_total:2.7041 | loss_l1:0.1142 | loss_bce:0.5624 | loss_thresh:1.0000 | ACC:0.5000 | IOU:0.4920 | lr:0.00193538
(43/1200/40/49) | loss_total:2.7071 | loss_l1:0.1145 | loss_bce:0.5624 | loss_thresh:1.0000 | ACC:0.5000 | IOU:0.4918 | lr:0.00193538
(44/1200/0/49)  | loss_total:2.7009 | loss_l1:0.1139 | loss_bce:0.5623 | loss_thresh:1.0000 | ACC:0.5000 | IOU:0.4943 | lr:0.00193388
(44/1200/20/49) | loss_total:2.7083 | loss_l1:0.1146 | loss_bce:0.5627 | loss_thresh:1.0000 | ACC:0.5000 | IOU:0.4917 | lr:0.00193388
(44/1200/40/49) | loss_total:2.7086 | loss_l1:0.1146 | loss_bce:0.5625 | loss_thresh:1.0000 | ACC:0.5000 | IOU:0.4916 | lr:0.00193388
(45/1200/0/49)  | loss_total:2.7083 | loss_l1:0.1146 | loss_bce:0.5623 | loss_thresh:1.0000 | ACC:0.5000 | IOU:0.4943 | lr:0.00193237
(45/1200/20/49) | loss_total:2.7074 | loss_l1:0.1145 | loss_bce:0.5623 | loss_thresh:1.0000 | ACC:0.5000 | IOU:0.4925 | lr:0.00193237
(45/1200/40/49) | loss_total:2.7069 | loss_l1:0.1145 | loss_bce:0.5623 | loss_thresh:1.0000 | ACC:0.5000 | IOU:0.4920 | lr:0.00193237
(46/1200/0/49)  | loss_total:2.7077 | loss_l1:0.1145 | loss_bce:0.5623 | loss_thresh:1.0000 | ACC:0.5000 | IOU:0.4913 | lr:0.00193087
(46/1200/20/49) | loss_total:2.7062 | loss_l1:0.1144 | loss_bce:0.5623 | loss_thresh:1.0000 | ACC:0.5000 | IOU:0.4912 | lr:0.00193087
```
This is normal. You need to train for more epochs. After a certain point, the loss will drop rapidly, and ACC and IOU will increase rapidly.
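As a rough sanity check on the log above (the weighting here is inferred from the logged numbers themselves, not taken from the code), the totals are consistent with a simple weighted sum of the three components:

loss_total ≈ loss_bce + 10 × loss_l1 + loss_thresh

For the (45/1200/0/49) entry: 0.5623 + 10 × 0.1146 + 1.0000 = 2.7083, which matches the logged loss_total. So while loss_thresh sits at 1.0000 it contributes a constant 1.0 to the total, which is why loss_total barely moves in this snippet until the threshold branch starts to learn.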