argman / EAST

A tensorflow implementation of EAST text detector
GNU General Public License v3.0

Training loss oscillates between 0.01 and 0.05 #246

Open clw5180 opened 5 years ago

clw5180 commented 5 years ago

I have about 60000 calligraphy images (vertical text direction), and I am training on a single RTX 2080. Here are my training parameters:

python multigpu_train.py --gpu_list=0 --input_size=1024 --batch_size_per_gpu=2 --checkpoint_path=/home/xxx/EAST/ckpt --text_scale=1024 --training_data_path=/media/root/data/data_xxxxxx/train_val_dataset --geometry=RBOX --learning_rate=0.0001 --num_readers=2 --max_steps=100000 

The results are as follows. The loss seems to oscillate, and I'm not sure whether that is normal. Maybe the batch size is too small? (If I set input_size=1024 and batch_size_per_gpu=4, I get an OOM error.) Can anyone help? Thanks a lot.

Step 000000, model loss 0.2513, total loss 0.5218, 1.93 seconds/step, 1.04 examples/second
Step 000010, model loss 0.0890, total loss 0.3594, 4.20 seconds/step, 0.48 examples/second
Step 000020, model loss 0.0305, total loss 0.3007, 4.23 seconds/step, 0.47 examples/second
Step 000030, model loss 0.0303, total loss 0.3003, 5.37 seconds/step, 0.37 examples/second
Step 000040, model loss 0.0100, total loss 0.2798, 8.50 seconds/step, 0.24 examples/second
Step 000050, model loss 0.0549, total loss 0.3245, 3.93 seconds/step, 0.51 examples/second
Step 000060, model loss 0.0132, total loss 0.2824, 5.59 seconds/step, 0.36 examples/second
Step 000070, model loss 0.0194, total loss 0.2883, 5.31 seconds/step, 0.38 examples/second
Step 000080, model loss 0.0233, total loss 0.2920, 3.84 seconds/step, 0.52 examples/second
Step 000090, model loss 0.0167, total loss 0.2850, 3.99 seconds/step, 0.50 examples/second
Step 000100, model loss 0.0372, total loss 0.3052, 4.85 seconds/step, 0.41 examples/second
Step 000110, model loss 0.0278, total loss 0.2954, 4.56 seconds/step, 0.44 examples/second
Step 000120, model loss 0.0297, total loss 0.2969, 6.39 seconds/step, 0.31 examples/second
Step 000130, model loss 0.0249, total loss 0.2918, 4.11 seconds/step, 0.49 examples/second
Step 000140, model loss 0.0508, total loss 0.3173, 4.79 seconds/step, 0.42 examples/second
Step 000150, model loss 0.0231, total loss 0.2892, 4.91 seconds/step, 0.41 examples/second
Step 000160, model loss 0.0254, total loss 0.2911, 5.34 seconds/step, 0.37 examples/second
Step 000170, model loss 0.0274, total loss 0.2927, 2.93 seconds/step, 0.68 examples/second
Step 000180, model loss 0.0153, total loss 0.2802, 4.40 seconds/step, 0.45 examples/second
Step 000190, model loss 0.0442, total loss 0.3087, 4.49 seconds/step, 0.45 examples/second
Step 000200, model loss 0.0189, total loss 0.2829, 5.34 seconds/step, 0.37 examples/second
Step 000210, model loss 0.0421, total loss 0.3057, 3.44 seconds/step, 0.58 examples/second
Step 000220, model loss 0.0351, total loss 0.2982, 5.45 seconds/step, 0.37 examples/second
Step 000230, model loss 0.0429, total loss 0.3056, 3.91 seconds/step, 0.51 examples/second
Step 000240, model loss 0.0270, total loss 0.2892, 4.90 seconds/step, 0.41 examples/second
Step 000250, model loss 0.0132, total loss 0.2749, 2.82 seconds/step, 0.71 examples/second
Step 000260, model loss 0.0839, total loss 0.3452, 3.76 seconds/step, 0.53 examples/second
Step 000270, model loss 0.0269, total loss 0.2877, 3.56 seconds/step, 0.56 examples/second
Step 000280, model loss 0.0194, total loss 0.2797, 4.68 seconds/step, 0.43 examples/second
Step 000290, model loss 0.0188, total loss 0.2786, 3.15 seconds/step, 0.63 examples/second
Step 000300, model loss 0.0343, total loss 0.2936, 6.75 seconds/step, 0.30 examples/second
Step 000310, model loss 0.0678, total loss 0.3265, 8.02 seconds/step, 0.25 examples/second
Step 000320, model loss 0.0220, total loss 0.2803, 2.60 seconds/step, 0.77 examples/second
Step 000330, model loss 0.0164, total loss 0.2741, 7.86 seconds/step, 0.25 examples/second
Step 000340, model loss 0.0150, total loss 0.2722, 4.42 seconds/step, 0.45 examples/second
Step 000350, model loss 0.0131, total loss 0.2698, 3.38 seconds/step, 0.59 examples/second
Step 000360, model loss 0.0180, total loss 0.2742, 4.44 seconds/step, 0.45 examples/second
Step 000370, model loss 0.0415, total loss 0.2971, 5.26 seconds/step, 0.38 examples/second
Step 000380, model loss 0.0171, total loss 0.2722, 5.36 seconds/step, 0.37 examples/second
Step 000390, model loss 0.0150, total loss 0.2695, 4.02 seconds/step, 0.50 examples/second
Step 000400, model loss 0.0327, total loss 0.2867, 6.68 seconds/step, 0.30 examples/second
Step 000410, model loss 0.0124, total loss 0.2658, 3.37 seconds/step, 0.59 examples/second
Step 000420, model loss 0.0428, total loss 0.2958, 6.14 seconds/step, 0.33 examples/second
Step 000430, model loss 0.0160, total loss 0.2684, 4.35 seconds/step, 0.46 examples/second
Step 000440, model loss 0.0342, total loss 0.2860, 3.45 seconds/step, 0.58 examples/second
Step 000450, model loss 0.0141, total loss 0.2654, 2.50 seconds/step, 0.80 examples/second
Step 000460, model loss 0.0177, total loss 0.2685, 4.95 seconds/step, 0.40 examples/second
Step 000470, model loss 0.0148, total loss 0.2650, 4.13 seconds/step, 0.48 examples/second
Step 000480, model loss 0.0158, total loss 0.2654, 3.28 seconds/step, 0.61 examples/second
Step 000490, model loss 0.0319, total loss 0.2810, 5.98 seconds/step, 0.33 examples/second
Step 000500, model loss 0.0654, total loss 0.3139, 7.30 seconds/step, 0.27 examples/second
Step 000510, model loss 0.0188, total loss 0.2668, 4.38 seconds/step, 0.46 examples/second
Step 000520, model loss 0.0426, total loss 0.2900, 4.23 seconds/step, 0.47 examples/second
Step 000530, model loss 0.0381, total loss 0.2850, 3.76 seconds/step, 0.53 examples/second
Step 000540, model loss 0.0111, total loss 0.2574, 3.71 seconds/step, 0.54 examples/second
Step 000550, model loss 0.0923, total loss 0.3380, 5.59 seconds/step, 0.36 examples/second
Step 000560, model loss 0.0121, total loss 0.2573, 3.89 seconds/step, 0.51 examples/second
Step 000570, model loss 0.0279, total loss 0.2725, 3.11 seconds/step, 0.64 examples/second
Step 000580, model loss 0.0121, total loss 0.2562, 3.27 seconds/step, 0.61 examples/second
Step 000590, model loss 0.0236, total loss 0.2672, 9.31 seconds/step, 0.21 examples/second
Step 000600, model loss 0.0333, total loss 0.2764, 4.07 seconds/step, 0.49 examples/second
Step 000610, model loss 0.0157, total loss 0.2582, 4.99 seconds/step, 0.40 examples/second
Step 000620, model loss 0.0665, total loss 0.3084, 6.99 seconds/step, 0.29 examples/second
Step 000630, model loss 0.0768, total loss 0.3182, 11.28 seconds/step, 0.18 examples/second
Step 000640, model loss 0.0187, total loss 0.2595, 4.57 seconds/step, 0.44 examples/second
Step 000650, model loss 0.0193, total loss 0.2596, 5.60 seconds/step, 0.36 examples/second
Step 000660, model loss 0.0392, total loss 0.2790, 5.75 seconds/step, 0.35 examples/second
Step 000670, model loss 0.0152, total loss 0.2544, 5.72 seconds/step, 0.35 examples/second
Step 000680, model loss 0.0433, total loss 0.2820, 4.75 seconds/step, 0.42 examples/second
Step 000690, model loss 0.0284, total loss 0.2665, 1.77 seconds/step, 1.13 examples/second
Step 000700, model loss 0.0254, total loss 0.2629, 2.69 seconds/step, 0.74 examples/second
Step 000710, model loss 0.0620, total loss 0.2989, 4.28 seconds/step, 0.47 examples/second
Step 000720, model loss 0.2526, total loss 0.4890, 7.24 seconds/step, 0.28 examples/second
Step 000730, model loss 0.0117, total loss 0.2475, 2.08 seconds/step, 0.96 examples/second
Step 000740, model loss 0.0199, total loss 0.2552, 6.59 seconds/step, 0.30 examples/second
Step 000750, model loss 0.0175, total loss 0.2522, 4.16 seconds/step, 0.48 examples/second
Step 000760, model loss 0.0502, total loss 0.2843, 6.23 seconds/step, 0.32 examples/second
Step 000770, model loss 0.0617, total loss 0.2953, 6.55 seconds/step, 0.31 examples/second
Step 000780, model loss 0.0181, total loss 0.2511, 4.05 seconds/step, 0.49 examples/second
Step 000790, model loss 0.0119, total loss 0.2444, 5.41 seconds/step, 0.37 examples/second
Step 000800, model loss 0.0340, total loss 0.2659, 8.97 seconds/step, 0.22 examples/second
Step 000810, model loss 0.0194, total loss 0.2507, 6.42 seconds/step, 0.31 examples/second
Step 000820, model loss 0.0208, total loss 0.2515, 4.81 seconds/step, 0.42 examples/second
Step 000830, model loss 0.0469, total loss 0.2770, 3.93 seconds/step, 0.51 examples/second
Step 000840, model loss 0.0108, total loss 0.2404, 6.37 seconds/step, 0.31 examples/second
Step 000850, model loss 0.0151, total loss 0.2441, 6.04 seconds/step, 0.33 examples/second
Step 000860, model loss 0.0439, total loss 0.2724, 4.90 seconds/step, 0.41 examples/second
Step 000870, model loss 0.0290, total loss 0.2569, 2.68 seconds/step, 0.75 examples/second
Step 000880, model loss 0.0323, total loss 0.2596, 4.29 seconds/step, 0.47 examples/second
Step 000890, model loss 0.0120, total loss 0.2388, 3.79 seconds/step, 0.53 examples/second
Step 000900, model loss 0.0330, total loss 0.2592, 5.15 seconds/step, 0.39 examples/second
Step 000910, model loss 0.0324, total loss 0.2580, 3.79 seconds/step, 0.53 examples/second
Step 000920, model loss 0.0386, total loss 0.2636, 3.88 seconds/step, 0.52 examples/second
Step 000930, model loss 0.0464, total loss 0.2708, 5.36 seconds/step, 0.37 examples/second
Step 000940, model loss 0.0196, total loss 0.2434, 4.26 seconds/step, 0.47 examples/second
Step 000950, model loss 0.0144, total loss 0.2376, 5.13 seconds/step, 0.39 examples/second
Step 000960, model loss 0.0308, total loss 0.2535, 3.90 seconds/step, 0.51 examples/second
Step 000970, model loss 0.0253, total loss 0.2474, 6.60 seconds/step, 0.30 examples/second
Step 000980, model loss 0.0279, total loss 0.2494, 6.65 seconds/step, 0.30 examples/second
Step 000990, model loss 0.0255, total loss 0.2465, 3.57 seconds/step, 0.56 examples/second
Step 001000, model loss 0.0448, total loss 0.2652, 3.35 seconds/step, 0.60 examples/second
Step 001010, model loss 0.0406, total loss 0.2604, 5.24 seconds/step, 0.38 examples/second
Step 001020, model loss 0.0288, total loss 0.2481, 2.64 seconds/step, 0.76 examples/second
Step 001030, model loss 0.0454, total loss 0.2641, 5.32 seconds/step, 0.38 examples/second
Step 001040, model loss 0.0198, total loss 0.2380, 4.43 seconds/step, 0.45 examples/second
Step 001050, model loss 0.0182, total loss 0.2359, 5.88 seconds/step, 0.34 examples/second
Step 001060, model loss 0.0236, total loss 0.2407, 4.14 seconds/step, 0.48 examples/second
Step 001070, model loss 0.0206, total loss 0.2372, 5.68 seconds/step, 0.35 examples/second
Step 001080, model loss 0.0350, total loss 0.2510, 6.27 seconds/step, 0.32 examples/second
Step 001090, model loss 0.0127, total loss 0.2282, 2.08 seconds/step, 0.96 examples/second
Step 001100, model loss 0.0166, total loss 0.2316, 3.77 seconds/step, 0.53 examples/second
Step 001110, model loss 0.0568, total loss 0.2712, 5.11 seconds/step, 0.39 examples/second
Step 001120, model loss 0.0190, total loss 0.2329, 3.64 seconds/step, 0.55 examples/second
Step 001130, model loss 0.0272, total loss 0.2406, 4.20 seconds/step, 0.48 examples/second
Step 001140, model loss 0.0216, total loss 0.2345, 5.11 seconds/step, 0.39 examples/second
Step 001150, model loss 0.0188, total loss 0.2311, 4.98 seconds/step, 0.40 examples/second
Step 001160, model loss 0.0155, total loss 0.2273, 4.15 seconds/step, 0.48 examples/second
Step 001170, model loss 0.0501, total loss 0.2613, 4.28 seconds/step, 0.47 examples/second
Step 001180, model loss 0.0300, total loss 0.2407, 5.77 seconds/step, 0.35 examples/second
Step 001190, model loss 0.0121, total loss 0.2223, 3.06 seconds/step, 0.65 examples/second
Step 001200, model loss 0.0312, total loss 0.2408, 4.75 seconds/step, 0.42 examples/second
Step 001210, model loss 0.0258, total loss 0.2348, 7.99 seconds/step, 0.25 examples/second
Step 001220, model loss 0.0328, total loss 0.2413, 3.14 seconds/step, 0.64 examples/second
Step 001230, model loss 0.0531, total loss 0.2610, 7.04 seconds/step, 0.28 examples/second
Step 001240, model loss 0.0127, total loss 0.2202, 4.43 seconds/step, 0.45 examples/second
Step 001250, model loss 0.0365, total loss 0.2434, 5.31 seconds/step, 0.38 examples/second
Step 001260, model loss 0.0426, total loss 0.2490, 6.46 seconds/step, 0.31 examples/second
Step 001270, model loss 0.0510, total loss 0.2568, 5.83 seconds/step, 0.34 examples/second
Step 001280, model loss 0.0188, total loss 0.2242, 5.29 seconds/step, 0.38 examples/second
Step 001290, model loss 0.0275, total loss 0.2323, 3.73 seconds/step, 0.54 examples/second
Step 001300, model loss 0.0156, total loss 0.2200, 3.52 seconds/step, 0.57 examples/second
Step 001310, model loss 0.0110, total loss 0.2147, 3.20 seconds/step, 0.62 examples/second
Step 001320, model loss 0.0262, total loss 0.2294, 4.24 seconds/step, 0.47 examples/second
Step 001330, model loss 0.0117, total loss 0.2144, 4.81 seconds/step, 0.42 examples/second
Step 001340, model loss 0.0116, total loss 0.2137, 1.65 seconds/step, 1.21 examples/second
Step 001350, model loss 0.0448, total loss 0.2464, 4.97 seconds/step, 0.40 examples/second
Step 001360, model loss 0.0333, total loss 0.2345, 4.27 seconds/step, 0.47 examples/second
Step 001370, model loss 0.0113, total loss 0.2119, 3.83 seconds/step, 0.52 examples/second
Step 001380, model loss 0.0729, total loss 0.2731, 6.94 seconds/step, 0.29 examples/second
Step 001390, model loss 0.0130, total loss 0.2126, 4.39 seconds/step, 0.46 examples/second
Step 001400, model loss 0.0132, total loss 0.2123, 5.86 seconds/step, 0.34 examples/second
Step 001410, model loss 0.0494, total loss 0.2479, 3.78 seconds/step, 0.53 examples/second
Step 001420, model loss 0.0187, total loss 0.2167, 5.89 seconds/step, 0.34 examples/second

xxlxx1 commented 5 years ago

It may just be too early in training. On the ICDAR 2015 data, the best loss came at around step 91000.
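
One way to check whether the run is still in its early phase is to smooth the per-step values instead of reading them raw. The following is only a minimal sketch: it assumes the log lines have exactly the format posted above, and the helper names `parse_log` and `moving_average` are made up for illustration, not part of the EAST repository. It also prints the gap between total loss and model loss; treating that gap as a regularization term is an assumption, not something confirmed in this thread.

```python
import re

# Matches log lines in the format posted in this issue.
LOG_LINE = re.compile(r"Step (\d+), model loss ([\d.]+), total loss ([\d.]+)")


def parse_log(text):
    """Return a list of (step, model_loss, total_loss) tuples."""
    rows = []
    for line in text.splitlines():
        m = LOG_LINE.search(line)
        if m:
            rows.append((int(m.group(1)), float(m.group(2)), float(m.group(3))))
    return rows


def moving_average(values, window):
    """Trailing mean over the last `window` values (shorter at the start)."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        out.append(running / min(i + 1, window))
    return out


# A few lines copied from the log above; paste the full log here instead.
sample = """\
Step 000000, model loss 0.2513, total loss 0.5218, 1.93 seconds/step, 1.04 examples/second
Step 000010, model loss 0.0890, total loss 0.3594, 4.20 seconds/step, 0.48 examples/second
Step 000020, model loss 0.0305, total loss 0.3007, 4.23 seconds/step, 0.47 examples/second
Step 001420, model loss 0.0187, total loss 0.2167, 5.89 seconds/step, 0.34 examples/second
"""

rows = parse_log(sample)
model = [r[1] for r in rows]
smoothed = moving_average(model, window=2)  # use a larger window on the full log
# Part of the total loss that does not come from the model loss
# (assumed, not verified, to be a regularization term).
gap = [round(r[2] - r[1], 4) for r in rows]

for (step, m, t), s, g in zip(rows, smoothed, gap):
    print(f"step {step:6d}  model {m:.4f}  smoothed {s:.4f}  total-model {g:.4f}")
```

In the posted log the gap between total loss and model loss falls from 0.2705 at step 0 to 0.1980 at step 1420, so the total loss is still trending down even though the model loss jumps around from batch to batch. With batch_size_per_gpu=2, a noisy per-step value is not surprising on its own.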

clw5180 commented 5 years ago

> It may just be too early in training. On the ICDAR 2015 data, the best loss came at around step 91000.

Thank you. I have now trained for 1 day, about 18000 epochs, and the loss is down to 0.05. And here is a surprise: I know you are xxl, from the ITNLP Lab at HITSZ :) It seems you are also interested in CV. Did you join the 'Huawei Calligraphy Recognition' competition on DataFountain?

xxlxx1 commented 5 years ago

@clw5180 I think all of that information is in my profile... I haven't joined this competition.