lucasjinreal opened this issue 5 years ago
Did you load the pre-trained weights? It works fine with my dataset.
Or maybe you didn't set the mode to 'train' or 'test' in the config file.
@jinfagang Have you solved the problem? I have the same issue.
@1453042287 I trained yolov2-mobilenet-v2 from scratch. You mentioned a 'pre-trained model': do you mean the pre-trained backbone network (such as MobileNetV2), or both the backbone and the detection model? In my training, none of the parameters were pre-trained.
@blueardour First, make sure you change PHASE in the .yml file to 'train'. Also, I believe it's inappropriate to train the model from scratch, so at the very least you should load a pre-trained backbone. I load the whole pre-trained weight file the author provided (including the backbone, extras and so on), but set RESUME_SCOPE in the .yml file to 'base' only, and the result is almost the same as fine-tuning.
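For reference, the relevant fields might look like this in the .yml file (a minimal sketch based on the scope names mentioned in this thread; the checkpoint path is hypothetical, so check your own config for the exact keys and values):

```yaml
TRAIN:
  # Load the author's full checkpoint, but restore only the backbone;
  # everything else is re-initialized and trained.
  RESUME_SCOPE: 'base'
  TRAINABLE_SCOPE: 'norm,extras,loc,conf'
PHASE: ['train']
RESUME_CHECKPOINT: './weights/pretrained_checkpoint.pth'  # hypothetical path
```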
@1453042287 Hi, thanks for the advice. My current training seems to be working. In my previous training I put 'base', 'loc' and so on all in the trainable_scope, and it did not give a good result. After reloading only 'base' and retraining the other parameters, I successfully recovered the precision.
My only remaining problem is test speed. The NMS in the test procedure is very slow. It has been discussed in https://github.com/ShuangXieIrene/ssds.pytorch/issues/16, but with no good solution yet.
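Since a Python loop over detections often dominates NMS time, a vectorized version that computes all IoUs for the current top box in one NumPy pass is a common workaround. This is a generic sketch, not this repo's own implementation:

```python
import numpy as np

def nms(boxes, scores, iou_thr=0.45):
    """Greedy NMS over [x1, y1, x2, y2] boxes; returns indices of kept boxes."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the top box against all remaining boxes at once
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thr]   # drop boxes overlapping the kept one
    return keep
```

If a GPU is available, moving this step onto the device (rather than round-tripping through CPU Python loops) usually helps even more.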
@blueardour Hi, below is my test result for fssd_mobilenet_v2 on coco2017 using my own config files instead of the given one, trained from scratch without any pre-trained model. Should I reload only the 'base' parameters here?
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.211
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.358
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.217
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.044
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.234
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.351
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.216
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.343
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.371
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.099
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.428
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.590
OK... it seems training from scratch might not be well supported. But I just want to use this repo to verify my network architecture, and my ImageNet pre-trained model is still training.
Yes, setting all parameters trainable seems to make convergence hard. This year He et al. published a paper, 'Rethinking ImageNet Pre-training', which claims that ImageNet pre-training is not necessary; however, it takes some skill to give the network a good initialization.
Yes, I agree with you. I read that paper the day it was published. My own network design outperforms several networks (on ImageNet/CIFAR...), though its ImageNet training is still running (72.5 1.0). I have also verified my network on other tasks and it works fine, so I believe it will get better results on detection and segmentation tasks too. Personally, I largely agree with the views in 'DetNet' and 'Rethinking ImageNet Pre-training'; however, it seems much more computation and task-specific tuning are needed. Until my ImageNet training finishes, I will have to compare SSD performance based on models trained from scratch first.
Hi, @1453042287 @cvtower
I have another issue, about the training precision and loss curves. The following is the result from tensorboardX.
It can be seen that the precision increases slowly and then jumps at around the 89th epoch. I don't know why the precision changes so dramatically at that point; the loc and cls losses, as well as the learning rate, do not seem to change much. Have you observed a similar phenomenon, or do you have any explanation for it?
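One possibility worth checking: with an SGDR-style schedule (cosine annealing with warm restarts), the learning rate collapses toward its minimum near the end of each cycle, and validation precision often jumps exactly there. A minimal sketch of such a schedule (the cycle length here is hypothetical, not necessarily this config's exact value):

```python
import math

def sgdr_lr(epoch, base_lr=0.001, cycle=90, min_lr=0.0):
    """Cosine-annealed learning rate with warm restarts every `cycle` epochs."""
    t = epoch % cycle  # position inside the current cycle
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t / cycle))
```

Near the end of a cycle the LR is almost zero, so the weights settle and the evaluated precision can jump; at the restart it snaps back to base_lr.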
Hi @blueardour,
I did not use the CosineAnnealing LR and no such phenomenon ever happened during training.
@1453042287 Hello, may I ask how you obtained the pre-trained weight files the author provided? I don't have a weight directory, so I have no pre-trained weight files. Did you get them some other way? Thank you!
@XiaSunny Just download them... they are linked in this repo's README, the blue text.
@1453042287 OK, thank you.
Hello, I am using the config file fssd_vgg16_train_coco.yml. When I train on coco2017, conf_loss stays around 5 and loc_loss around 2, and neither goes down. My config file is as follows:

```yaml
MODEL:
  SSDS: fssd
  NETS: vgg16
  IMAGE_SIZE: [300, 300]
  NUM_CLASSES: 81
  FEATURE_LAYER: [[[22, 34, 'S'], [512, 1024, 512]], [['', 'S', 'S', 'S', '', ''], [512, 512, 256, 256, 256, 256]]]
  STEPS: [[8, 8], [16, 16], [32, 32], [64, 64], [100, 100], [300, 300]]
  SIZES: [[30, 30], [60, 60], [111, 111], [162, 162], [213, 213], [264, 264], [315, 315]]
  ASPECT_RATIOS: [[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2], [1, 2]]

TRAIN:
  MAX_EPOCHS: 500
  CHECKPOINTS_EPOCHS: 1
  BATCH_SIZE: 28
  TRAINABLE_SCOPE: 'norm,extras,transforms,pyramids,loc,conf'
  RESUME_SCOPE: 'base'
  OPTIMIZER:
    OPTIMIZER: sgd
    LEARNING_RATE: 0.001
    MOMENTUM: 0.9
    WEIGHT_DECAY: 0.0001
  LR_SCHEDULER:
    SCHEDULER: SGDR
    WARM_UP_EPOCHS: 150

TEST:
  BATCH_SIZE: 64
  TEST_SCOPE: [90, 100]

MATCHER:
  MATCHED_THRESHOLD: 0.5
  UNMATCHED_THRESHOLD: 0.5
  NEGPOS_RATIO: 3

POST_PROCESS:
  SCORE_THRESHOLD: 0.01
  IOU_THRESHOLD: 0.6
  MAX_DETECTIONS: 100

DATASET:
  DATASET: 'coco'
  DATASET_DIR: '/home/chase/Downloads/ssds.pytorch-master/data/coco'
  TRAIN_SETS: [['2017', 'train']]
  TEST_SETS: [['2017', 'val']]
  PROB: 0.6

EXP_DIR: './experiments/models/fssd_vgg16_coco'
LOG_DIR: './experiments/models/fssd_vgg16_coco'
RESUME_CHECKPOINT: '/home/chase/Downloads/ssds.pytorch-master/weight/vgg16_fssd_coco_27.2.pth'
PHASE: ['train']
```

I also tried RESUME_CHECKPOINT: vgg16_reducedfc.pth, but the result is about the same. This problem has troubled me for a long time and I don't know what is going on; I hope you can give me some pointers. @1453042287 @blueardour @cvtower
@XiaSunny Hello, I have run into the same problem. Have you solved it?
@1453042287 @XiaSunny Hello, I want to use a pre-trained model.
TRAINABLE_SCOPE: 'base,norm,extras,loc,conf'
RESUME_SCOPE: 'base,norm,extras,loc,conf'
How should I modify these parameters? Thanks!
TRAINABLE_SCOPE is the set of modules to train; RESUME_SCOPE is the set of modules to restore from the pre-trained model. First, remove 'conf' from RESUME_SCOPE (because the number of classes differs), then check whether anything else needs to change for your situation.
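Mechanically, restoring only some scopes amounts to filtering the checkpoint's state_dict by top-level module name before loading; dropping 'conf' avoids the shape mismatch when NUM_CLASSES changes. A generic sketch (the helper and the toy keys are illustrative, not this repo's exact code):

```python
def filter_state_dict(ckpt, resume_scope):
    """Keep only entries whose top-level module name is listed in resume_scope."""
    wanted = {s.strip() for s in resume_scope.split(',')}
    return {k: v for k, v in ckpt.items() if k.split('.')[0] in wanted}

# Toy checkpoint: restore everything except the class-confidence head.
ckpt = {'base.0.weight': 'w0', 'extras.0.weight': 'w1', 'conf.0.weight': 'w2'}
restored = filter_state_dict(ckpt, 'base,norm,extras,loc')
```

The filtered dict would then be loaded with `strict=False` semantics, so the freshly initialized 'conf' layers keep their random weights.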
Hello, I have recently also hit the problem of the loss not decreasing during training; it stays around 4. I downloaded the model, made no modifications, and only reloaded 'base' for training. May I ask how you finally solved it? Many thanks!
I have trained SSD with MobileNetV2 on VOC, but after almost 500 epochs the loss still looks like this:
It doesn't change and the loss is very high... What's wrong with the implementation?