linwaydong opened this issue 5 years ago
@linwaydong I have basically never seen the loss stay flat. The loss usually starts decreasing from the last branch, and the cls loss of the first branch only starts to drop rapidly after nearly 200k iterations.
@YonghaoHe Regarding CE_gradient /= mx.ndarray.sum(loss_mask).asnumpy()[0] — does the cls loss return the average of all the losses, or the sum? If it is the mean, wouldn't the cls loss be very small?
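For reference, a minimal NumPy sketch of the distinction being asked about, with toy data rather than the repo's code; it assumes loss_mask is 1 at locations that contribute to the cls loss and 0 elsewhere. Dividing the summed gradient by sum(loss_mask) makes the effective objective the mean, even if the logged loss value is still the sum.

```python
import numpy as np

# Toy illustration (not the repo's code): softmax cross-entropy over N masked
# locations, comparing "sum" vs "mean" reduction.
np.random.seed(0)
N, C = 8, 2
logits = np.random.randn(N, C)
labels = np.random.randint(0, C, size=N)
loss_mask = np.array([1, 1, 1, 0, 1, 0, 1, 1], dtype=np.float64)

probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)

per_sample_ce = -np.log(probs[np.arange(N), labels]) * loss_mask

sum_loss = per_sample_ce.sum()            # large value if the sum is what gets logged
mean_loss = sum_loss / loss_mask.sum()    # small value after dividing by the number of valid samples

# Gradient of the summed softmax cross-entropy w.r.t. the logits:
grad_sum = probs.copy()
grad_sum[np.arange(N), labels] -= 1.0
grad_sum *= loss_mask[:, None]

# Dividing the gradient by sum(loss_mask) — as in
# CE_gradient /= mx.ndarray.sum(loss_mask).asnumpy()[0] — turns it into the
# gradient of the *mean* loss, regardless of which value is printed in the log.
grad_mean = grad_sum / loss_mask.sum()

print(sum_loss, mean_loss)
```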
@linwaydong Hello, have you solved this? I find that at inference time the cls classification results are extremely poor even though the cls loss is very small. Did you figure it out? Could we discuss it?
@mifan0208 The magnitude of the loss value shouldn't matter, as long as it shows a downward trend.
What is a normal range for the loss? I have recently been training a head detection model; the training set has 61,263 images (pos: ~50k, neg: ~10k), but after 200k iterations the loss still seems quite high, and I don't know what value would count as normal.
2020-02-24 10:49:25,561[INFO]: Iter[266870] -- Time elapsed: 18.6 s. Speed: 17.2 images/s.
2020-02-24 10:49:25,561[INFO]: CE_loss_score_0: --> 2095.4179
2020-02-24 10:49:25,561[INFO]: SE_loss_bbox_0: --> 1511.5669
2020-02-24 10:49:25,561[INFO]: CE_loss_score_1: --> 851.2706
2020-02-24 10:49:25,561[INFO]: SE_loss_bbox_1: --> 841.0315
2020-02-24 10:49:25,561[INFO]: CE_loss_score_2: --> 641.2268
2020-02-24 10:49:25,561[INFO]: SE_loss_bbox_2: --> 744.3536
2020-02-24 10:49:25,561[INFO]: CE_loss_score_3: --> 434.3332
2020-02-24 10:49:25,561[INFO]: SE_loss_bbox_3: --> 499.9199
2020-02-24 10:49:33,380[INFO]: Iter[266880] -- Time elapsed: 7.8 s. Speed: 40.9 images/s.
2020-02-24 10:49:33,380[INFO]: CE_loss_score_0: --> 2104.8516
2020-02-24 10:49:33,380[INFO]: SE_loss_bbox_0: --> 1421.9438
2020-02-24 10:49:33,380[INFO]: CE_loss_score_1: --> 867.3072
2020-02-24 10:49:33,380[INFO]: SE_loss_bbox_1: --> 726.9958
2020-02-24 10:49:33,380[INFO]: CE_loss_score_2: --> 646.7043
2020-02-24 10:49:33,380[INFO]: SE_loss_bbox_2: --> 700.9400
2020-02-24 10:49:33,590[INFO]: CE_loss_score_3: --> 478.2961
2020-02-24 10:49:33,595[INFO]: SE_loss_bbox_3: --> 397.7205
2020-02-24 10:49:38,488[INFO]: Iter[266890] -- Time elapsed: 4.9 s. Speed: 65.4 images/s.
2020-02-24 10:49:38,489[INFO]: CE_loss_score_0: --> 2104.8079
2020-02-24 10:49:38,489[INFO]: SE_loss_bbox_0: --> 1506.2745
2020-02-24 10:49:38,489[INFO]: CE_loss_score_1: --> 870.7725
2020-02-24 10:49:38,489[INFO]: SE_loss_bbox_1: --> 803.8165
2020-02-24 10:49:38,489[INFO]: CE_loss_score_2: --> 551.4744
2020-02-24 10:49:38,489[INFO]: SE_loss_bbox_2: --> 645.9541
2020-02-24 10:49:38,490[INFO]: CE_loss_score_3: --> 459.2909
2020-02-24 10:49:38,490[INFO]: SE_loss_bbox_3: --> 390.1395
2020-02-24 10:49:47,531[INFO]: Iter[266900] -- Time elapsed: 9.0 s. Speed: 35.4 images/s.
2020-02-24 10:49:47,531[INFO]: CE_loss_score_0: --> 2091.7856
2020-02-24 10:49:47,531[INFO]: SE_loss_bbox_0: --> 1387.1002
2020-02-24 10:49:47,531[INFO]: CE_loss_score_1: --> 861.7832
2020-02-24 10:49:47,532[INFO]: SE_loss_bbox_1: --> 895.0710
2020-02-24 10:49:47,532[INFO]: CE_loss_score_2: --> 607.0239
2020-02-24 10:49:47,532[INFO]: SE_loss_bbox_2: --> 788.6065
2020-02-24 10:49:47,532[INFO]: CE_loss_score_3: --> 519.2036
2020-02-24 10:49:47,532[INFO]: SE_loss_bbox_3: --> 575.0079
Same here.
@GitEasonXu When training on faces, the loss trend is that the higher-level branches start decreasing first, and the drop then gradually propagates down to the first branch. The first branch only starts dropping after roughly 100k iterations (batchsize=32). If your loss shows no obvious decrease, consider whether there is an error in the dataiter's annotation-to-target conversion.
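One way to act on that suggestion is a quick sanity check on the iterator output: count how many locations each branch's target mask marks as positive in a batch. The array names and shapes below are assumptions about the dataiter output, not the repo's actual layout, so they would need to be adapted.

```python
import numpy as np

def count_positives_per_branch(batch_labels, batch_masks):
    """Hypothetical sanity check: for each branch, count the locations that the
    data iterator marked as valid positives. If a branch never receives
    positives, its cls loss has nothing to learn from.

    batch_labels / batch_masks: lists of numpy arrays, one per branch,
    assumed shaped (batch, 2, H_k, W_k) for labels and (batch, H_k, W_k) for masks.
    """
    for k, (label, mask) in enumerate(zip(batch_labels, batch_masks)):
        # assume channel 0 of the label map is the positive-class indicator
        positives = int(((label[:, 0] > 0) & (mask > 0)).sum())
        valid = int((mask > 0).sum())
        print('branch %d: %d positive locations out of %d valid' % (k, positives, valid))
```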
@YonghaoHe Thank you for the reply; I will look carefully into the issue you raised. The training annotation format is:
negative_image_path,0
positive_image_path,1,box_num,x1,y1,w,h,.....
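For illustration, here is a small parser sketch for lines in this format, based only on the description above rather than the repo's actual loader:

```python
def parse_annotation_line(line):
    """Parse one training annotation line of the form shown above.

    negative: "image_path,0"
    positive: "image_path,1,box_num,x1,y1,w,h,..." (box_num groups of 4)

    Returns (image_path, boxes) where boxes is a list of [x1, y1, w, h].
    A sketch based on the format described in this thread, not the repo's code.
    """
    fields = line.strip().split(',')
    image_path = fields[0]
    is_positive = int(fields[1])
    if not is_positive:
        return image_path, []
    box_num = int(fields[2])
    coords = [float(v) for v in fields[3:3 + 4 * box_num]]
    boxes = [coords[i:i + 4] for i in range(0, 4 * box_num, 4)]
    return image_path, boxes


# Example usage with the two line types above:
print(parse_annotation_line('neg/img_001.jpg,0'))
print(parse_annotation_line('pos/img_002.jpg,1,2,10,20,30,40,50,60,70,80'))
```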
During training, every branch's loss did drop compared with the start, but after 400k iterations (lr: 0.0001) the loss stopped decreasing. In particular, the lowest branch's loss is still over 2000, which feels large; I am not sure whether this is normal or whether a misconfiguration is preventing the loss from going down.
2020-02-25 16:00:39,028[INFO]: Iter[437270] -- Time elapsed: 16.2 s. Speed: 19.7 images/s.
2020-02-25 16:00:39,028[INFO]: CE_loss_score_0: --> 2037.9876
2020-02-25 16:00:39,028[INFO]: SE_loss_bbox_0: --> 1315.7551
2020-02-25 16:00:39,028[INFO]: CE_loss_score_1: --> 860.6035
2020-02-25 16:00:39,028[INFO]: SE_loss_bbox_1: --> 756.8625
2020-02-25 16:00:39,028[INFO]: CE_loss_score_2: --> 613.7824
2020-02-25 16:00:39,028[INFO]: SE_loss_bbox_2: --> 528.0763
2020-02-25 16:00:39,028[INFO]: CE_loss_score_3: --> 462.5813
2020-02-25 16:00:39,028[INFO]: SE_loss_bbox_3: --> 502.4318
Mine is basically the same, but I tried the model at 30,000 iterations and it can detect normally.
@GitEasonXu I know why your loss stopped decreasing... The initial learning rate should be 0.1, then gradually reduced to 0.001. With your learning rate it can barely learn anything and just treads water. Set wd to 0.0001.
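A minimal MXNet sketch of that suggestion, with illustrative step boundaries (the actual decay points would come from the repo's config): start at lr 0.1, decay by 10x twice down to 0.001, and set wd to 0.0001.

```python
import mxnet as mx

# Illustrative schedule: start at lr = 0.1 and step down by 10x twice so the
# final lr is 0.001; the step boundaries (200k / 400k iterations) are
# placeholders, not the repo's actual settings.
lr_scheduler = mx.lr_scheduler.MultiFactorScheduler(step=[200000, 400000], factor=0.1)

optimizer = mx.optimizer.SGD(
    learning_rate=0.1,   # initial learning rate suggested in the comment above
    momentum=0.9,        # common default; adjust as needed
    wd=0.0001,           # weight decay suggested in the comment above
    lr_scheduler=lr_scheduler,
)
```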
@YonghaoHe OK, thank you for the suggestion; I will try the learning-rate strategy you described.
> Mine is basically the same, but I tried the model at 30,000 iterations and it can detect normally.

It doesn't affect usage. The author's paper points out that the model has no BN layers; training for more iterations is recommended.
Hi Yonghao: May I ask, I am reproducing this with Keras, and during training reg_loss_bbox decreases, but clc_loss_score never comes down. What might be the cause?