PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0

Loss becomes NaN when training the CRNN recognition network #10042

Closed (justcodew closed this issue 1 year ago)

justcodew commented 1 year ago

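I am training a CRNN recognition model with a MobileNet backbone. Partway through training, the metrics suddenly become acc: 0.000000, norm_edit_dis: 0.000000, loss: nanxxx. How can this be fixed? The training set has 2 million samples and the validation set has 20,000. A previously trained model is used as the pretrained model. The training log is as follows: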


LAUNCH INFO 2023-05-26 21:39:38,415 with_gloo: 1
LAUNCH INFO 2023-05-26 21:39:38,415 --------------------------------------------------
LAUNCH INFO 2023-05-26 21:39:38,416 Job: default, mode collective, replicas 1[1:1], elastic False
LAUNCH INFO 2023-05-26 21:39:38,429 Run Pod: ylcqrw, replicas 4, status ready
LAUNCH INFO 2023-05-26 21:39:38,517 Watching Pod: ylcqrw, replicas 4, status running
[2023/05/26 21:39:40] ppocr INFO: Architecture : 
[2023/05/26 21:39:40] ppocr INFO:     Backbone : 
[2023/05/26 21:39:40] ppocr INFO:         model_name : large
[2023/05/26 21:39:40] ppocr INFO:         name : MobileNetV3
[2023/05/26 21:39:40] ppocr INFO:         scale : 0.5
[2023/05/26 21:39:40] ppocr INFO:     Head : 
[2023/05/26 21:39:40] ppocr INFO:         fc_decay : 0
[2023/05/26 21:39:40] ppocr INFO:         name : CTCHead
[2023/05/26 21:39:40] ppocr INFO:     Neck : 
[2023/05/26 21:39:40] ppocr INFO:         encoder_type : rnn
[2023/05/26 21:39:40] ppocr INFO:         hidden_size : 96
[2023/05/26 21:39:40] ppocr INFO:         name : SequenceEncoder
[2023/05/26 21:39:40] ppocr INFO:     Transform : None
[2023/05/26 21:39:40] ppocr INFO:     algorithm : CRNN
[2023/05/26 21:39:40] ppocr INFO:     model_type : rec
[2023/05/26 21:39:40] ppocr INFO: Eval : 
[2023/05/26 21:39:40] ppocr INFO:     dataset : 
[2023/05/26 21:39:40] ppocr INFO:         data_dir : /data/XXX/
[2023/05/26 21:39:40] ppocr INFO:         label_file_list : ['/data/XXX/ppocr_test_list_20230526.txt']
[2023/05/26 21:39:40] ppocr INFO:         name : SimpleDataSet
[2023/05/26 21:39:40] ppocr INFO:         transforms : 
[2023/05/26 21:39:40] ppocr INFO:             DecodeImage : 
[2023/05/26 21:39:40] ppocr INFO:                 channel_first : False
[2023/05/26 21:39:40] ppocr INFO:                 img_mode : BGR
[2023/05/26 21:39:40] ppocr INFO:             CTCLabelEncode : None
[2023/05/26 21:39:40] ppocr INFO:             RecResizeImg : 
[2023/05/26 21:39:40] ppocr INFO:                 image_shape : [3, 32, 100]
[2023/05/26 21:39:40] ppocr INFO:             KeepKeys : 
[2023/05/26 21:39:40] ppocr INFO:                 keep_keys : ['image', 'label', 'length']
[2023/05/26 21:39:40] ppocr INFO:     loader : 
[2023/05/26 21:39:40] ppocr INFO:         batch_size_per_card : 640
[2023/05/26 21:39:40] ppocr INFO:         drop_last : False
[2023/05/26 21:39:40] ppocr INFO:         num_workers : 0
[2023/05/26 21:39:40] ppocr INFO:         shuffle : False
[2023/05/26 21:39:40] ppocr INFO:         use_shared_memory : False
[2023/05/26 21:39:40] ppocr INFO: Global : 
[2023/05/26 21:39:40] ppocr INFO:     cal_metric_during_train : True
[2023/05/26 21:39:40] ppocr INFO:     character_dict_path : ppocr/utils/plate_dict_all_95.txt
[2023/05/26 21:39:40] ppocr INFO:     checkpoints : None
[2023/05/26 21:39:40] ppocr INFO:     distributed : True
[2023/05/26 21:39:40] ppocr INFO:     epoch_num : 2000
[2023/05/26 21:39:40] ppocr INFO:     eval_batch_step : [0, 90]
[2023/05/26 21:39:40] ppocr INFO:     infer_img : doc/imgs_words_en/word_10.png
[2023/05/26 21:39:40] ppocr INFO:     infer_mode : False
[2023/05/26 21:39:40] ppocr INFO:     log_smooth_window : 20
[2023/05/26 21:39:40] ppocr INFO:     max_text_length : 8
[2023/05/26 21:39:40] ppocr INFO:     pretrained_model : ./output/rec/crnn_mobile_20230515/best_accuracy.pdparams
[2023/05/26 21:39:40] ppocr INFO:     print_batch_step : 3
[2023/05/26 21:39:40] ppocr INFO:     save_epoch_step : 30
[2023/05/26 21:39:40] ppocr INFO:     save_inference_dir : ./output/rec/crnn_mobile_plate_20230526
[2023/05/26 21:39:40] ppocr INFO:     save_model_dir : ./output/rec/crnn_mobile_plate_20230526
[2023/05/26 21:39:40] ppocr INFO:     save_res_path : ./output/rec/predicts_ic15.txt
[2023/05/26 21:39:40] ppocr INFO:     use_gpu : True
[2023/05/26 21:39:40] ppocr INFO:     use_space_char : False
[2023/05/26 21:39:40] ppocr INFO:     use_visualdl : True
[2023/05/26 21:39:40] ppocr INFO: Loss : 
[2023/05/26 21:39:40] ppocr INFO:     name : CTCLoss
[2023/05/26 21:39:40] ppocr INFO: Metric : 
[2023/05/26 21:39:40] ppocr INFO:     main_indicator : acc
[2023/05/26 21:39:40] ppocr INFO:     name : RecMetric
[2023/05/26 21:39:40] ppocr INFO: Optimizer : 
[2023/05/26 21:39:40] ppocr INFO:     beta1 : 0.9
[2023/05/26 21:39:40] ppocr INFO:     beta2 : 0.999
[2023/05/26 21:39:40] ppocr INFO:     lr : 
[2023/05/26 21:39:40] ppocr INFO:         learning_rate : 5e-05
[2023/05/26 21:39:40] ppocr INFO:         name : Cosine
[2023/05/26 21:39:40] ppocr INFO:         warmup_epoch : 2
[2023/05/26 21:39:40] ppocr INFO:     name : Adam
[2023/05/26 21:39:40] ppocr INFO:     regularizer : 
[2023/05/26 21:39:40] ppocr INFO:         factor : 3e-05
[2023/05/26 21:39:40] ppocr INFO:         name : L2
[2023/05/26 21:39:40] ppocr INFO: PostProcess : 
[2023/05/26 21:39:40] ppocr INFO:     name : CTCLabelDecode
[2023/05/26 21:39:40] ppocr INFO: Train : 
[2023/05/26 21:39:40] ppocr INFO:     dataset : 
[2023/05/26 21:39:40] ppocr INFO:         data_dir : /data/XXX
[2023/05/26 21:39:40] ppocr INFO:         label_file_list : ['/data/XXX/ppocr_train_list_20230526.txt']
[2023/05/26 21:39:40] ppocr INFO:         name : SimpleDataSet
[2023/05/26 21:39:40] ppocr INFO:         transforms : 
[2023/05/26 21:39:40] ppocr INFO:             DecodeImage : 
[2023/05/26 21:39:40] ppocr INFO:                 channel_first : False
[2023/05/26 21:39:40] ppocr INFO:                 img_mode : BGR
[2023/05/26 21:39:40] ppocr INFO:             RecAug : None
[2023/05/26 21:39:40] ppocr INFO:             CTCLabelEncode : None
[2023/05/26 21:39:40] ppocr INFO:             RecResizeImg : 
[2023/05/26 21:39:40] ppocr INFO:                 image_shape : [3, 32, 100]
[2023/05/26 21:39:40] ppocr INFO:             KeepKeys : 
[2023/05/26 21:39:40] ppocr INFO:                 keep_keys : ['image', 'label', 'length']
[2023/05/26 21:39:40] ppocr INFO:     loader : 
[2023/05/26 21:39:40] ppocr INFO:         batch_size_per_card : 1560
[2023/05/26 21:39:40] ppocr INFO:         drop_last : True
[2023/05/26 21:39:40] ppocr INFO:         num_workers : 6
[2023/05/26 21:39:40] ppocr INFO:         shuffle : True
[2023/05/26 21:39:40] ppocr INFO:         use_shared_memory : False
[2023/05/26 21:39:40] ppocr INFO: profiler_options : None
[2023/05/26 21:39:40] ppocr INFO: train with paddle 2.4.2 and device Place(gpu:0)
I0526 21:39:40.963696 215459 tcp_utils.cc:181] The server starts to listen on IP_ANY:40741
I0526 21:39:40.963920 215459 tcp_utils.cc:130] Successfully connected to 10.30.64.16:40741
W0526 21:39:43.979482 215459 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
W0526 21:39:43.985281 215459 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.
[2023/05/26 21:39:44] ppocr INFO: Initialize indexs of datasets:['/data/XXX/ppocr_train_list_20230526.txt']
[2023/05/26 21:39:49] ppocr INFO: Initialize indexs of datasets:['/data/XXX/ppocr_test_list_20230526.txt']
[2023/05/26 21:39:49] ppocr INFO: train dataloader has 325 iters
[2023/05/26 21:39:49] ppocr INFO: valid dataloader has 33 iters
[2023/05/26 21:39:49] ppocr INFO: load pretrain successful from ./output/rec/crnn_mobile_20230515/best_accuracy
[2023/05/26 21:39:49] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 90 iterations
[2023/05/26 21:40:18] ppocr INFO: epoch: [1/2000], global_step: 3, lr: 0.000000, acc: 0.908974, norm_edit_dis: 0.978171, loss: 0.542156, avg_reader_cost: 7.57234 s, avg_batch_cost: 9.68602 s, avg_samples: 1560.0, ips: 161.05688 samples/s, eta: 72 days, 20:51:23
[2023/05/26 21:40:21] ppocr INFO: epoch: [1/2000], global_step: 6, lr: 0.000000, acc: 0.908974, norm_edit_dis: 0.978182, loss: 0.524130, avg_reader_cost: 0.00739 s, avg_batch_cost: 0.80739 s, avg_samples: 1560.0, ips: 1932.16094 samples/s, eta: 39 days, 11:18:45
[2023/05/26 21:40:35] ppocr INFO: epoch: [1/2000], global_step: 9, lr: 0.000000, acc: 0.908974, norm_edit_dis: 0.980369, loss: 0.506103, avg_reader_cost: 3.16675 s, avg_batch_cost: 4.87065 s, avg_samples: 1560.0, ips: 320.28554 samples/s, eta: 38 days, 12:40:33
[2023/05/26 21:40:38] ppocr INFO: epoch: [1/2000], global_step: 12, lr: 0.000000, acc: 0.908974, norm_edit_dis: 0.979281, loss: 0.509184, avg_reader_cost: 0.00719 s, avg_batch_cost: 0.82108 s, avg_samples: 1560.0, ips: 1899.93380 samples/s, eta: 30 days, 10:33:56
[2023/05/26 21:40:53] ppocr INFO: epoch: [1/2000], global_step: 15, lr: 0.000001, acc: 0.908974, norm_edit_dis: 0.978194, loss: 0.512266, avg_reader_cost: 3.24697 s, avg_batch_cost: 5.13098 s, avg_samples: 1560.0, ips: 304.03574 samples/s, eta: 32 days, 1:43:51
[2023/05/26 21:40:55] ppocr INFO: epoch: [1/2000], global_step: 18, lr: 0.000001, acc: 0.908974, norm_edit_dis: 0.978520, loss: 0.509184, avg_reader_cost: 0.00864 s, avg_batch_cost: 0.78517 s, avg_samples: 1560.0, ips: 1986.82646 samples/s, eta: 27 days, 17:03:59
[2023/05/26 21:41:12] ppocr INFO: epoch: [1/2000], global_step: 21, lr: 0.000001, acc: 0.908974, norm_edit_dis: 0.978520, loss: 0.509184, avg_reader_cost: 3.72027 s, avg_batch_cost: 5.48009 s, avg_samples: 1560.0, ips: 284.66691 samples/s, eta: 29 days, 15:24:05
[2023/05/26 21:41:14] ppocr INFO: epoch: [1/2000], global_step: 24, lr: 0.000001, acc: 0.909615, norm_edit_dis: 0.979905, loss: 0.501763, avg_reader_cost: 0.01070 s, avg_batch_cost: 0.81559 s, avg_samples: 1560.0, ips: 1912.72225 samples/s, eta: 26 days, 16:52:48

[2023/05/26 21:44:42] ppocr INFO: epoch: [1/2000], global_step: 90, lr: 0.000006, acc: 0.910577, norm_edit_dis: 0.979012, loss: 0.501043, avg_reader_cost: 0.00964 s, avg_batch_cost: 0.89512 s, avg_samples: 1560.0, ips: 1742.77987 samples/s, eta: 24 days, 10:18:56
[2023/05/26 21:44:55] ppocr INFO: cur metric, acc: 0.9651959833147346, norm_edit_dis: 0.9900051375106863, fps: 10678.829221841841
[2023/05/26 21:44:56] ppocr INFO: save best model is to ./output/rec/crnn_mobile_plate_20230524/best_accuracy
[2023/05/26 21:44:56] ppocr INFO: best metric, acc: 0.9651959833147346, is_float16: False, norm_edit_dis: 0.9900051375106863, fps: 10678.829221841841, best_epoch: 1
[2023/05/26 21:44:58] ppocr INFO: epoch: [1/2000], global_step: 93, lr: 0.000006, acc: 0.910577, norm_edit_dis: 0.978795, loss: 0.502586, avg_reader_cost: 0.01299 s, avg_batch_cost: 0.93993 s, avg_samples: 1560.0, ips: 1659.70487 samples/s, eta: 23 days, 20:52:24
[2023/05/26 21:45:01] ppocr INFO: epoch: [1/2000], global_step: 96, lr: 0.000007, acc: 0.911218, norm_edit_dis: 0.978652, loss: 0.511553, avg_reader_cost: 0.01121 s, avg_batch_cost: 0.88406 s, avg_samples: 1560.0, ips: 1764.59392 samples/s, eta: 23 days, 7:57:21
[2023/05/26 21:45:18] ppocr INFO: epoch: [1/2000], global_step: 99, lr: 0.000007, acc: 0.911218, norm_edit_dis: 0.979224, loss: 0.502586, avg_reader_cost: 2.72876 s, avg_batch_cost: 5.55331 s, avg_samples: 1560.0, ips: 280.91336 samples/s, eta: 23 days, 21:21:52
[2023/05/26 21:45:20] ppocr INFO: epoch: [1/2000], global_step: 102, lr: 0.000007, acc: 0.911218, norm_edit_dis: 0.979218, loss: 0.496262, avg_reader_cost: 0.00898 s, avg_batch_cost: 0.83183 s, avg_samples: 1560.0, ips: 1875.38152 samples/s, eta: 23 days, 8:54:54
[2023/05/26 21:45:37] ppocr INFO: epoch: [1/2000], global_step: 105, lr: 0.000007, acc: 0.913462, norm_edit_dis: 0.980197, loss: 0.485809, avg_reader_cost: 2.42933 s, avg_batch_cost: 5.57986 s, avg_samples: 1560.0, ips: 279.57671 samples/s, eta: 23 days, 21:40:00
[2023/05/26 21:45:40] ppocr INFO: epoch: [1/2000], global_step: 108, lr: 0.000007, acc: 0.913462, norm_edit_dis: 0.980197, loss: 0.485809, avg_reader_cost: 0.00867 s, avg_batch_cost: 0.89077 s, avg_samples: 1560.0, ips: 1751.29344 samples/s, eta: 23 days, 10:11:45
[2023/05/26 21:45:56] ppocr INFO: epoch: [1/2000], global_step: 111, lr: 0.000008, acc: 0.912821, norm_edit_dis: 0.979699, loss: 0.494489, avg_reader_cost: 2.05312 s, avg_batch_cost: 5.39939 s, avg_samples: 1560.0, ips: 288.92134 samples/s, eta: 23 days, 21:20:34
[2023/05/26 21:45:59] ppocr INFO: epoch: [1/2000], global_step: 114, lr: 0.000008, acc: 0.912821, norm_edit_dis: 0.979699, loss: 0.494489, avg_reader_cost: 0.00861 s, avg_batch_cost: 0.92145 s, avg_samples: 1560.0, ips: 1692.98757 samples/s, eta: 23 days, 10:37:47
[2023/05/26 21:46:15] ppocr INFO: epoch: [1/2000], global_step: 117, lr: 0.000008, acc: 0.913462, norm_edit_dis: 0.979802, loss: 0.489028, avg_reader_cost: 2.30208 s, avg_batch_cost: 5.31673 s, avg_samples: 1560.0, ips: 293.41319 samples/s, eta: 23 days, 20:48:39
[2023/05/26 21:46:18] ppocr INFO: epoch: [1/2000], global_step: 120, lr: 0.000008, acc: 0.913462, norm_edit_dis: 0.979756, loss: 0.492457, avg_reader_cost: 0.01004 s, avg_batch_cost: 0.97154 s, avg_samples: 1560.0, ips: 1605.70097 samples/s, eta: 23 days, 10:52:21
[2023/05/26 21:46:32] ppocr INFO: epoch: [1/2000], global_step: 123, lr: 0.000009, acc: 0.909936, norm_edit_dis: 0.979699, loss: nanxxx, avg_reader_cost: 2.78998 s, avg_batch_cost: 4.88461 s, avg_samples: 1560.0, ips: 319.37054 samples/s, eta: 23 days, 18:38:53
[2023/05/26 21:46:35] ppocr INFO: epoch: [1/2000], global_step: 126, lr: 0.000009, acc: 0.906731, norm_edit_dis: 0.978360, loss: nanxxx, avg_reader_cost: 0.01198 s, avg_batch_cost: 0.94254 s, avg_samples: 1560.0, ips: 1655.09504 samples/s, eta: 23 days, 9:06:35
[2023/05/26 21:46:51] ppocr INFO: epoch: [1/2000], global_step: 129, lr: 0.000009, acc: 0.901282, norm_edit_dis: 0.977055, loss: nanxxx, avg_reader_cost: 3.55028 s, avg_batch_cost: 5.28322 s, avg_samples: 1560.0, ips: 295.27432 samples/s, eta: 23 days, 18:14:17
[2023/05/26 21:46:53] ppocr INFO: epoch: [1/2000], global_step: 132, lr: 0.000009, acc: 0.448718, norm_edit_dis: 0.487403, loss: nanxxx, avg_reader_cost: 0.01077 s, avg_batch_cost: 0.81114 s, avg_samples: 1560.0, ips: 1923.22562 samples/s, eta: 23 days, 8:36:12
[2023/05/26 21:47:09] ppocr INFO: epoch: [1/2000], global_step: 135, lr: 0.000010, acc: 0.000000, norm_edit_dis: 0.000000, loss: nanxxx, avg_reader_cost: 3.80528 s, avg_batch_cost: 5.22565 s, avg_samples: 1560.0, ips: 298.52758 samples/s, eta: 23 days, 17:06:20
[2023/05/26 21:47:12] ppocr INFO: epoch: [1/2000], global_step: 138, lr: 0.000010, acc: 0.000000, norm_edit_dis: 0.000000, loss: nanxxx, avg_reader_cost: 0.01222 s, avg_batch_cost: 0.93468 s, avg_samples: 1560.0, ips: 1669.02413 samples/s, eta: 23 days, 8:23:57
[2023/05/26 21:47:28] ppocr INFO: epoch: [1/2000], global_step: 141, lr: 0.000010, acc: 0.000000, norm_edit_dis: 0.000000, loss: nanxxx, avg_reader_cost: 4.10710 s, avg_batch_cost: 5.23770 s, avg_samples: 1560.0, ips: 297.84040 samples/s, eta: 23 days, 16:35:24
[2023/05/26 21:47:30] ppocr INFO: epoch: [1/2000], global_step: 144, lr: 0.000010, acc: 0.000000, norm_edit_dis: 0.000000, loss: nanxxx, avg_reader_cost: 0.01120 s, avg_batch_cost: 0.90054 s, avg_samples: 1560.0, ips: 1732.28915 samples/s, eta: 23 days, 8:07:43
[2023/05/26 21:47:46] ppocr INFO: epoch: [1/2000], global_step: 147, lr: 0.000011, acc: 0.000000, norm_edit_dis: 0.000000, loss: nanxxx, avg_reader_cost: 3.93576 s, avg_batch_cost: 5.14470 s, avg_samples: 1560.0, ips: 303.22484 samples/s, eta: 23 days, 15:38:52
[2023/05/26 21:47:48] ppocr INFO: epoch: [1/2000], global_step: 150, lr: 0.000011, acc: 0.000000, norm_edit_dis: 0.000000, loss: nanxxx, avg_reader_cost: 0.01255 s, avg_batch_cost: 0.77947 s, avg_samples: 1560.0, ips: 2001.35651 samples/s, eta: 23 days, 7:06:23
[2023/05/26 21:48:07] ppocr INFO: epoch: [1/2000], global_step: 153, lr: 0.000011, acc: 0.000000, norm_edit_dis: 0.000000, loss: nanxxx, avg_reader_cost: 4.48152 s, avg_batch_cost: 6.15389 s, avg_samples: 1560.0, ips: 253.49818 samples/s, eta: 23 days, 17:55:21
[2023/05/26 21:48:09] ppocr INFO: epoch: [1/2000], global_step: 156, lr: 0.000011, acc: 0.000000, norm_edit_dis: 0.000000, loss: nanxxx, avg_reader_cost: 0.01284 s, avg_batch_cost: 0.85864 s, avg_samples: 1560.0, ips: 1816.82183 samples/s, eta: 23 days, 9:56:26
[2023/05/26 21:48:25] ppocr INFO: epoch: [1/2000], global_step: 159, lr: 0.000011, acc: 0.000000, norm_edit_dis: 0.000000, loss: nanxxx, avg_reader_cost: 3.85689 s, avg_batch_cost: 5.43838 s, avg_samples: 1560.0, ips: 286.85031 samples/s, eta: 23 days, 17:51:28
[2023/05/26 21:48:28] ppocr INFO: epoch: [1/2000], global_step: 162, lr: 0.000012, acc: 0.000000, norm_edit_dis: 0.000000, loss: nanxxx, avg_reader_cost: 0.01038 s, avg_batch_cost: 0.89511 s, avg_samples: 1560.0, ips: 1742.80293 samples/s, eta: 23 days, 10:17:40
[2023/05/26 21:48:44] ppocr INFO: epoch: [1/2000], global_step: 165, lr: 0.000012, acc: 0.000000, norm_edit_dis: 0.000000, loss: nanxxx, avg_reader_cost: 3.90533 s, avg_batch_cost: 5.40572 s, avg_samples: 1560.0, ips: 288.58294 samples/s, eta: 23 days, 17:48:36
[2023/05/26 21:48:47] ppocr INFO: epoch: [1/2000], global_step: 168, lr: 0.000012, acc: 0.000000, norm_edit_dis: 0.000000, loss: nanxxx, avg_reader_cost: 0.00899 s, avg_batch_cost: 0.87783 s, avg_samples: 1560.0, ips: 1777.11182 samples/s, eta: 23 days, 10:27:42
[2023/05/26 21:49:02] ppocr INFO: epoch: [1/2000], global_step: 171, lr: 0.000012, acc: 0.000000, norm_edit_dis: 0.000000, loss: nanxxx, avg_reader_cost: 3.85482 s, avg_batch_cost: 5.00849 s, avg_samples: 1560.0, ips: 311.47126 samples/s, eta: 23 days, 16:27:08
[2023/05/26 21:49:05] ppocr INFO: epoch: [1/2000], global_step: 174, lr: 0.000013, acc: 0.000000, norm_edit_dis: 0.000000, loss: nanxxx, avg_reader_cost: 0.01411 s, avg_batch_cost: 0.91575 s, avg_samples: 1560.0, ips: 1703.52077 samples/s, eta: 23 days, 9:29:56
[2023/05/26 21:49:21] ppocr INFO: epoch: [1/2000], global_step: 177, lr: 0.000013, acc: 0.000000, norm_edit_dis: 0.000000, loss: nanxxx, avg_reader_cost: 4.27214 s, avg_batch_cost: 5.36100 s, avg_samples: 1560.0, ips: 290.99056 samples/s, eta: 23 days, 16:22:51
[2023/05/26 21:49:23] ppocr INFO: epoch: [1/2000], global_step: 180, lr: 0.000013, acc: 0.000000, norm_edit_dis: 0.000000, loss: nanxxx, avg_reader_cost: 0.00861 s, avg_batch_cost: 0.85302 s, avg_samples: 1560.0, ips: 1828.80087 samples/s, eta: 23 days, 9:28:18
[2023/05/26 21:49:37] ppocr INFO: cur metric, acc: 0.0, norm_edit_dis: 4.827186428357777e-10, fps: 12132.609800708638
[2023/05/26 21:49:37] ppocr INFO: best metric, acc: 0.9651959833147346, is_float16: False, norm_edit_dis: 0.9900051375106863, fps: 10678.829221841841, best_epoch: 1
[2023/05/26 21:49:42] ppocr INFO: epoch: [1/2000], global_step: 183, lr: 0.000013, acc: 0.000000, norm_edit_dis: 0.000000, loss: nanxxx, avg_reader_cost: 0.62102 s, avg_batch_cost: 1.74868 s, avg_samples: 1560.0, ips: 892.10003 samples/s, eta: 23 days, 5:26:21

justcodew commented 1 year ago


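I have checked a few things:
1. Learning rate: the current learning rate is 1e-04, which is fairly small.
2. Training and validation data: I spot-checked some of the data and labels and found no problems.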
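
A spot check can miss rare bad samples in a label file with 2 million lines, so a full scan may be worth running. Below is a minimal sketch, assuming the label files use one tab-separated `image_path` and `label` pair per line. The function name `scan_labels`, the sample lines, and the sample character set are hypothetical. The three checks (empty label, label longer than `max_text_length`, character missing from the dictionary) are guesses at what might matter here, not a confirmed cause of the NaN.

```python
from collections import Counter


def scan_labels(lines, charset, max_text_length):
    """Count suspicious entries in tab-separated 'image_path, label' lines."""
    problems = Counter()
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            # No tab, or more than one tab, on this line.
            problems["malformed"] += 1
            continue
        label = parts[1]
        if not label:
            problems["empty"] += 1
        elif len(label) > max_text_length:
            problems["too_long"] += 1
        elif any(ch not in charset for ch in label):
            problems["unknown_char"] += 1
    return problems


# Hypothetical sample data, for illustration only.
sample_lines = [
    "a.jpg\tAB12\n",       # fine
    "b.jpg\t\n",           # empty label
    "c.jpg\tABABABABA\n",  # 9 characters, longer than max_text_length = 8
    "d.jpg\tAZ1\n",        # 'Z' is not in the sample character set
    "e.jpg AB12\n",        # space instead of tab
]
report = scan_labels(sample_lines, charset=set("AB12"), max_text_length=8)
print(dict(report))
```

For the real run, `lines` would be the opened train or test list file, `charset` the set of characters read from the file named in `character_dict_path` (assumed here to hold one character per line), and `max_text_length` the value 8 from the config.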

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

idontkonwher commented 11 months ago

Has this been resolved?
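
A low-cost first step for anyone reproducing this is to pin down where the loss first stops being finite in a log like the one in the original post. Below is a minimal sketch, assuming the line layout shown in that log (`global_step: N, ... loss: V,`, with the value right-padded by `x`, as in `nanxxx`). The helper names are hypothetical and the sample lines are abridged copies of lines from the log above.

```python
import math
import re

# Captures the step number and the raw loss token from one training log line.
STEP_LOSS = re.compile(r"global_step: (\d+),.*?loss: ([^,\s]+)")


def parse_loss(token):
    """Convert a loss token to float, dropping trailing 'x' padding."""
    return float(token.rstrip("x"))


def first_non_finite_step(lines):
    """Return (last_finite_step, first_bad_step); either may be None."""
    last_ok = None
    for line in lines:
        match = STEP_LOSS.search(line)
        if match is None:
            continue  # not a training progress line
        step = int(match.group(1))
        loss = parse_loss(match.group(2))
        if not math.isfinite(loss):
            return last_ok, step
        last_ok = step
    return last_ok, None


# Abridged lines from the log in the original post.
sample_log = [
    "ppocr INFO: epoch: [1/2000], global_step: 117, lr: 0.000008, acc: 0.913462, norm_edit_dis: 0.979802, loss: 0.489028, avg_reader_cost: 2.30208 s",
    "ppocr INFO: epoch: [1/2000], global_step: 120, lr: 0.000008, acc: 0.913462, norm_edit_dis: 0.979756, loss: 0.492457, avg_reader_cost: 0.01004 s",
    "ppocr INFO: epoch: [1/2000], global_step: 123, lr: 0.000009, acc: 0.909936, norm_edit_dis: 0.979699, loss: nanxxx, avg_reader_cost: 2.78998 s",
]
result = first_non_finite_step(sample_log)
print(result)
```

On the posted log this points at the interval between steps 120 and 123. Since the log prints every 3 steps (`print_batch_step : 3`), the first bad batch is presumably among steps 121 to 123, which narrows down what to inspect.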