Question about Yolo Nano model training

Zmjcc commented 4 years ago

From your previous discussion, I learned that it is better to pretrain the model on ImageNet and then train on VOC, but I want to use the VOC dataset alone to roughly test how this model performs. However, when I train your model and evaluate it with eval.py, every class has an AP of 0. Have you encountered this? Or have you obtained a better pretrained model from ImageNet?

david8862 commented 4 years ago

You can refer to the result in closed issue #6 from @johnjunjun7. I'm trying to train the nano backbone on ImageNet, but it is not finished yet because of other work.

johnjunjun7 commented 4 years ago

I trained directly on VOC2007, and the mAP is about 20, not zero.

Zmjcc commented 4 years ago

I made an operation error previously and am now retraining. After training for 180 epochs, I converted the model and evaluated it with eval.py. The mAP was 44.63, but training has not converged yet. Has anyone trained on ImageNet to get a pretrained model?

johnjunjun7 commented 4 years ago

I pretrained the nano backbone network on ImageNet (top-5 accuracy is currently 74%).

After loading the pretrained weights into the nano model, the mAP of trained_final.h5 is only about 32 (with the conf-threshold of eval.py set to 0.5). With conf-threshold = 0.3, the mAP is about 36.

Without the pretrained weights, training directly on VOC gives about 16, much worse than your result.

Maybe there is something wrong with my training parameters? My parameters:

    --model_type=yolov3-nano
    --anchors_path=configs/yolo3_anchors.txt
    --model_image_size=416X416
    --weights_path=yolo_nano_preweight.h5
    --annotation_file='tools/2007_train.txt'
    --val_annotation_file='tools/2007_val.txt'
    --classes_path='configs/voc_classes.txt'
    --batch_size=32
    --learning_rate=0.001
    --cosine_decay_learning_rate=True
    --init_epoch=20
    --total_epoch=250
    --multiscale=False

All other parameters are left at their defaults. Is anything different from yours? What are your training and eval parameters?
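
The difference between the two thresholds reported above (mAP of about 32 at 0.5 versus about 36 at 0.3) is consistent with how AP behaves in general: a higher confidence threshold discards low-score detections and so cuts off the tail of the precision-recall curve. The sketch below uses a simplified, non-interpolated AP for a single class. It is not the computation eval.py performs, and the function name and toy detections are made up for illustration.

```python
def average_precision(detections, num_gt, conf_threshold):
    """Non-interpolated AP for one class.

    detections: list of (score, is_true_positive) pairs (hypothetical toy format).
    num_gt: number of ground-truth boxes for the class.
    """
    # Drop detections below the threshold, then rank the rest by score.
    kept = sorted((d for d in detections if d[0] >= conf_threshold),
                  key=lambda d: d[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, is_tp) in enumerate(kept, start=1):
        if is_tp:
            true_positives += 1
            precision_sum += true_positives / rank  # precision at this recall step
    return precision_sum / num_gt if num_gt else 0.0


# Toy data: the third detection is correct but has a low score.
toy = [(0.9, True), (0.6, False), (0.4, True)]
ap_at_05 = average_precision(toy, num_gt=2, conf_threshold=0.5)
ap_at_03 = average_precision(toy, num_gt=2, conf_threshold=0.3)
print(ap_at_05, ap_at_03)
```

In this toy case the lower threshold keeps the low-score true positive, so the AP rises. For mAP evaluation a low threshold is therefore usually the fairer setting.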

david8862 commented 4 years ago

For the ImageNet-pretrained YOLO Nano backbone, you may need to change the frozen layer number in the transfer training stage here to the nanonet layer number, since I haven't set it yet.

johnjunjun7 commented 4 years ago

Does the number of network layers include the BN and ReLU layers? According to the backbone network in your source code, what should the frozen layer number be?

david8862 commented 4 years ago

For the current implementation, the backbone length should be 269. You can check it by printing "len(model.layers)" in train_imagenet.py and leaving out the tail layers.
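
As a rough illustration of that counting, the sketch below takes a list of layer names (with Keras this could come from `[layer.name for layer in model.layers]`) and counts the layers up to the last backbone layer. The helper name and the toy layer names are hypothetical. In Keras, `model.layers` contains every layer object, so separate BatchNormalization and ReLU layers are included in the count.

```python
def backbone_length(layer_names, last_backbone_layer):
    """Return how many layers belong to the backbone, i.e. the number of
    layers up to and including `last_backbone_layer`."""
    if last_backbone_layer not in layer_names:
        raise ValueError("layer %r not found" % last_backbone_layer)
    return layer_names.index(last_backbone_layer) + 1


# Toy stand-in for [layer.name for layer in model.layers]; a real model
# lists every layer object, including BatchNormalization and ReLU layers.
toy_names = ["input", "conv1", "conv1_bn", "conv1_relu",
             "global_pool", "dense_logits"]
n_backbone = backbone_length(toy_names, "conv1_relu")
print(n_backbone, "of", len(toy_names), "layers belong to the backbone")
```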

johnjunjun7 commented 4 years ago

I use the following code to process the weights trained on ImageNet:

    from tensorflow.keras.models import Model, load_model

    base_model = load_model('/data/b14c757f950445d3ae628f07e2e36a2b/pkgs/pre_final_ep42.h5')
    # Keep only the backbone, up to the 'Conv_pw_3_relu' layer
    resnet_model = Model(inputs=base_model.input, outputs=base_model.get_layer('Conv_pw_3_relu').output)
    resnet_model.summary()
    resnet_model.save_weights('my_model_weights.h5')

Then I load the weights with this code:

    model_body.load_weights(weights_path, by_name=True)

Would this achieve the same effect?

david8862 commented 4 years ago

Yes, that's correct for loading the pretrained weights. For transfer learning, a common further practice is to freeze the well-pretrained part for some epochs so that the randomly initialized part trains first, and then unfreeze the whole network for fine-tuning. You can refer to the related comment here.
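
The two-stage schedule described above can be sketched as follows. This is a minimal illustration rather than the code train.py uses: the helper name is hypothetical, and `SimpleNamespace` objects stand in for Keras layers so the snippet runs without TensorFlow. With a real model, the list would be `model.layers` and `num_frozen` would be the backbone layer count.

```python
from types import SimpleNamespace


def set_trainable(layers, num_frozen):
    """Freeze the first `num_frozen` layers and leave the rest trainable."""
    for index, layer in enumerate(layers):
        layer.trainable = index >= num_frozen


# Stand-ins for Keras layers; with a real model this would be model.layers.
layers = [SimpleNamespace(name="layer_%d" % i, trainable=True) for i in range(5)]

# Stage 1: freeze the pretrained backbone (here 3 layers) and train the head.
set_trainable(layers, num_frozen=3)
stage1 = [layer.trainable for layer in layers]

# Stage 2: unfreeze everything for fine-tuning.
set_trainable(layers, num_frozen=0)
stage2 = [layer.trainable for layer in layers]
print(stage1, stage2)
```

With Keras, the model needs to be compiled again after the `trainable` flags change for the new setting to take effect during training.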

johnjunjun7 commented 4 years ago

Many thanks. I'm training now, and it looks like it's going to work a lot better.

Zmjcc commented 4 years ago

I am trying to get a pretrained model on the COCO dataset, but when training with train.py, after roughly 1000 images I repeatedly get the following error:

      File "train.py", line 282, in <module>
        _main(args)
      File "train.py", line 188, in _main
        callbacks=callbacks)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/engine/training.py", line 1433, in fit_generator
        steps_name='steps_per_epoch')
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/engine/training_generator.py", line 220, in model_iteration
        batch_data = _get_next_batch(generator, mode)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/engine/training_generator.py", line 362, in _get_next_batch
        generator_output = next(generator)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/utils/data_utils.py", line 918, in get
        six.reraise(*sys.exc_info())
      File "/usr/lib/python3/dist-packages/six.py", line 686, in reraise
        raise value
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/utils/data_utils.py", line 894, in get
        inputs = self.queue.get(block=True).get()
      File "/usr/lib/python3.5/multiprocessing/pool.py", line 608, in get
        raise self._value
      File "/usr/lib/python3.5/multiprocessing/pool.py", line 119, in worker
        result = (True, func(*args, **kwds))
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/utils/data_utils.py", line 828, in next_sample
        return six.next(_SHARED_SEQUENCES[uid])
      File "/home/undergraduate/folder1/Desktop/keras-YOLOv3-model-set-master/yolo3/data.py", line 276, in yolo3_data_generator
        image, box = get_random_data(annotation_lines[i], input_shape, random=True)
      File "/home/undergraduate/folder1/Desktop/keras-YOLOv3-model-set-master/yolo3/data.py", line 87, in get_random_data
        image = image.resize((nw,nh), Image.BICUBIC)
      File "/usr/local/lib/python3.5/dist-packages/PIL/Image.py", line 1763, in resize
        self.load()
      File "/usr/local/lib/python3.5/dist-packages/PIL/ImageFile.py", line 232, in load
        "(%d bytes not processed)" % len(b))
    OSError: image file is truncated (7 bytes not processed)

david8862 commented 4 years ago

From the error log, it seems the image is corrupted. You can print out the image file name to check the file content, or display it in code with "image.show()".
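
To find such files before training starts, one option is to scan the annotation file up front. The sketch below is a quick heuristic that uses only the standard library: it flags JPEG files that do not start with the FF D8 marker or do not end with the FF D9 marker. It assumes that each annotation line begins with the image path followed by a space, and that the paths contain no spaces. The function names are hypothetical.

```python
import os
import tempfile


def looks_truncated_jpeg(path):
    """Heuristic: a complete JPEG starts with FF D8 and ends with FF D9."""
    with open(path, "rb") as f:
        data = f.read()
    return not (data.startswith(b"\xff\xd8") and data.endswith(b"\xff\xd9"))


def find_suspect_images(annotation_lines):
    """Return image paths (first field of each line) that look truncated."""
    suspects = []
    for line in annotation_lines:
        if not line.strip():
            continue  # skip blank lines
        path = line.split()[0]
        if looks_truncated_jpeg(path):
            suspects.append(path)
    return suspects


# Demo with two fake files: one with both markers, one cut off at the end.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.jpg")
    bad = os.path.join(tmp, "bad.jpg")
    with open(good, "wb") as f:
        f.write(b"\xff\xd8" + b"\x00" * 16 + b"\xff\xd9")
    with open(bad, "wb") as f:
        f.write(b"\xff\xd8" + b"\x00" * 16)
    suspects = find_suspect_images([good + " 1,2,3,4,0", bad + " 5,6,7,8,1"])
    suspect_names = [os.path.basename(p) for p in suspects]
print(suspect_names)
```

A file with padding after the end marker would be flagged as well, so flagged files deserve a manual look. A full decode with Pillow (`Image.open(path).load()`) is the more thorough check, since `load()` is the call that raised the error in the traceback.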

johnjunjun7 commented 4 years ago

How can I calculate the FPS of the model in Keras? I'd like to see the inference speed of the model. :)

david8862 commented 4 years ago

You can use validate_yolo.py. It runs inference several times and shows the average time cost.
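
The same idea, averaging the time of repeated runs and inverting it, can be sketched in a few lines. This is a generic illustration, not the code in validate_yolo.py. `measure_fps` and `fake_inference` are hypothetical names, and the sleep call stands in for a real inference call such as `model.predict(image_batch)`.

```python
import time


def measure_fps(run_once, warmup=2, runs=10):
    """Average the wall-clock time of `run_once()` and convert it to FPS."""
    for _ in range(warmup):
        run_once()  # warm-up calls are excluded from the timing
    start = time.perf_counter()
    for _ in range(runs):
        run_once()
    avg_seconds = (time.perf_counter() - start) / runs
    return avg_seconds, 1.0 / avg_seconds


def fake_inference():
    # Placeholder for something like model.predict(image_batch)
    time.sleep(0.01)


avg_seconds, fps = measure_fps(fake_inference)
print("average %.1f ms per run, about %.1f FPS" % (avg_seconds * 1000, fps))
```

The warm-up runs matter with deep learning frameworks because the first calls often include one-time setup cost that would otherwise skew the average.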

johnjunjun7 commented 4 years ago

There is now a new way of calculating IoU that can improve the convergence speed and accuracy of YOLOv3. It only requires changing the formula of the IoU loss function. You could give it a try.

Here are the reference links:
https://cloud.tencent.com/developer/article/1558533
https://arxiv.org/pdf/1911.08287.pdf
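
The arXiv paper linked above defines the Distance-IoU (DIoU) loss as 1 - IoU + d^2 / c^2, where d is the distance between the two box centers and c is the diagonal of the smallest box enclosing both. The sketch below computes it for two axis-aligned boxes in plain Python. The function name and the (x_min, y_min, x_max, y_max) box format are assumptions made for illustration, not code from the repository.

```python
def iou_and_diou(box_a, box_b):
    """Boxes are (x_min, y_min, x_max, y_max). Returns (IoU, DIoU)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distance between the two box centers
    center_dist_sq = ((((ax1 + ax2) - (bx1 + bx2)) / 2.0) ** 2
                      + (((ay1 + ay2) - (by1 + by2)) / 2.0) ** 2)
    # Squared diagonal of the smallest box enclosing both boxes
    diag_sq = ((max(ax2, bx2) - min(ax1, bx1)) ** 2
               + (max(ay2, by2) - min(ay1, by1)) ** 2)
    diou = iou - (center_dist_sq / diag_sq if diag_sq > 0 else 0.0)
    return iou, diou


target = (0.0, 0.0, 1.0, 1.0)
_, diou_same = iou_and_diou(target, target)
iou_near, diou_near = iou_and_diou(target, (2.0, 0.0, 3.0, 1.0))
iou_far, diou_far = iou_and_diou(target, (5.0, 0.0, 6.0, 1.0))
loss_near, loss_far = 1.0 - diou_near, 1.0 - diou_far
print(loss_near, loss_far)
```

Both non-overlapping boxes have an IoU of 0, so a plain IoU loss cannot tell them apart. The DIoU loss is larger for the box that is farther away, which gives the optimizer a direction to move in.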

david8862 commented 4 years ago

Many thanks. I'm now working on other tasks and will try to pick it up later.

david8862 commented 4 years ago

Hi @johnjunjun7, I've just drafted an implementation of the DIoU loss and DIoU NMS (with numpy) for the YOLOv3 model set and tried DIoU NMS on the existing pretrained weights. It seems DIoU NMS does slightly improve the mAP for all models. The related code has been merged, and I'll move on to the DIoU loss for training. Thanks again for the useful info.
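
For readers unfamiliar with DIoU NMS, the sketch below shows the idea in plain Python: it is ordinary greedy NMS, except that the suppression test uses DIoU (IoU minus the normalized center distance) instead of IoU, so two boxes with distant centers are less likely to suppress each other. This is an illustration under assumed names and an assumed (x_min, y_min, x_max, y_max) box format. It is not the numpy implementation merged into the repository, and in practice it would be applied per class.

```python
def diou(a, b):
    """DIoU of two (x_min, y_min, x_max, y_max) boxes: IoU minus the squared
    center distance over the squared diagonal of the enclosing box."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    iou = inter / union if union > 0 else 0.0
    dist_sq = (((a[0] + a[2] - b[0] - b[2]) / 2.0) ** 2
               + ((a[1] + a[3] - b[1] - b[3]) / 2.0) ** 2)
    diag_sq = ((max(a[2], b[2]) - min(a[0], b[0])) ** 2
               + (max(a[3], b[3]) - min(a[1], b[1])) ** 2)
    return iou - (dist_sq / diag_sq if diag_sq > 0 else 0.0)


def diou_nms(boxes, scores, threshold=0.5):
    """Greedy NMS that suppresses a box when its DIoU with an already kept,
    higher-scoring box reaches `threshold`. Returns the kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(diou(boxes[i], boxes[k]) < threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = diou_nms(boxes, scores)
print(kept)
```

Here the second box almost coincides with the first and is suppressed, while the third box is far away and survives.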

david8862 / keras-YOLOv3-model-set

Question about Yolo Nano model training #8