david8862 / keras-YOLOv3-model-set

end-to-end YOLOv4/v3/v2 object detection pipeline, implemented on tf.keras with different technologies
MIT License
638 stars 220 forks

Question about the nano model size #5

Open silencesuper opened 4 years ago

silencesuper commented 4 years ago

The nano model is 57 MB, which is much bigger than Tiny. Have you trained the nano model, and how big is it? Thank you very much.

johnjunjun7 commented 4 years ago

You can set `save_weights_only=True`; then you will find the nano model size is about 18 MB, but it is still much bigger than 4 MB.
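A minimal sketch of a weights-only checkpoint with the usual tf.keras ModelCheckpoint callback (file names are illustrative); saving only the weights skips the optimizer state, which for Adam is roughly twice the size of the weights and accounts for most of the ~57 MB vs ~18 MB gap:

```python
from tensorflow.keras.callbacks import ModelCheckpoint

# Checkpoint that stores only the model weights (no architecture, no optimizer
# state), giving the ~18 MB file instead of the ~57 MB full-model checkpoint.
checkpoint = ModelCheckpoint('yolo3_nano_weights.h5',
                             monitor='val_loss',
                             save_weights_only=True,
                             save_best_only=True,
                             verbose=1)

# Equivalent one-off calls after training:
#   model.save('full_model.h5')             # architecture + weights + optimizer state
#   model.save_weights('weights_only.h5')   # weights only
```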

silencesuper commented 4 years ago

Thank you very much, I will give it a try.

david8862 commented 4 years ago

There's also one unclear hyperparameter, `expand_ratio`, for the EP and PEP blocks. It's not mentioned in the original paper but greatly impacts the model size. Currently I use 2 for it (https://github.com/david8862/keras-YOLOv3-model-set/blob/master/yolo3/models/yolo3_nano.py#L147) and the model size is ~19MB. If you use 6, as in the standard MobileNetV2 expansion block, the model size will be much bigger.
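For context, a simplified sketch of how `expand_ratio` scales the pointwise expansion in a MobileNetV2-style EP/PEP block; this only illustrates the idea and is not the exact code in yolo3_nano.py:

```python
from tensorflow.keras import layers

def ep_block_sketch(x, out_channels, expand_ratio=2, stride=1):
    """Expansion -> depthwise -> projection, MobileNetV2-style.

    expand_ratio sets the width of the intermediate expansion layer, so the
    parameter count of each block grows roughly linearly with it
    (e.g. expand_ratio=6 is about 3x the size of expand_ratio=2).
    """
    in_channels = int(x.shape[-1])
    hidden = int(in_channels * expand_ratio)

    y = layers.Conv2D(hidden, 1, padding='same', use_bias=False)(x)   # expand
    y = layers.BatchNormalization()(y)
    y = layers.ReLU(6.0)(y)
    y = layers.DepthwiseConv2D(3, strides=stride, padding='same', use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU(6.0)(y)
    y = layers.Conv2D(out_channels, 1, padding='same', use_bias=False)(y)  # project
    y = layers.BatchNormalization()(y)
    if stride == 1 and in_channels == out_channels:
        y = layers.Add()([x, y])  # residual shortcut when shapes match
    return y
```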

johnjunjun7 commented 4 years ago

Is it possible that the authors used 16-bit Float for their model?

johnjunjun7 commented 4 years ago

The author set the weight precision to 8-bit. I don't know how to change it in the code; I tried to use `K.set_floatx('float32')`, and an error occurred.
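For what it's worth, a minimal sketch of the backend precision switch, assuming TF 2.x; note that `set_floatx` only accepts float16/float32/float64 and has to be called before the model is built, so it cannot give 8-bit weights by itself:

```python
from tensorflow.keras import backend as K

# Must be called before any layers/weights are created to take effect.
# Only 'float16', 'float32' and 'float64' are valid; 8-bit weights require
# quantization (e.g. TFLite) rather than a backend dtype change.
K.set_floatx('float16')
```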

david8862 commented 4 years ago

Have you confirmed with the author?

> The author set the weight precision to 8-bit. I don't know how to change it in the code; I tried to use `K.set_floatx('float32')`, and an error occurred.

johnjunjun7 commented 4 years ago

> In this study, the indicator function 1_r(·) was set up such that: i) mean average precision (mAP) ≥ 65% on VOC2007, ii) computational cost ≤ 5B operations, and iii) 8-bit weight precision. The computational cost constraint is set such that the computational cost of the resulting YOLO Nano network is below that of Tiny YOLOv3 [14], one of the most popular compact networks for embedded object detection.

In Section 2.2 of the paper, the author mentions the 8-bit weight precision.

silencesuper commented 4 years ago

I tested the `self.yolo_model.predict` time. It is about 20 ms, which is longer than yolo_tiny, even though the nano model size is less than tiny's. Is there something wrong with my model?

david8862 commented 4 years ago

Got it, many thanks. If so, then it should be a quantized UINT8 model, which is 4x smaller than Float32. For TF, a quantized model can be produced via post-training integer quantization or quantization-aware training.

> In this study, the indicator function 1_r(·) was set up such that: i) mean average precision (mAP) ≥ 65% on VOC2007, ii) computational cost ≤ 5B operations, and iii) 8-bit weight precision. The computational cost constraint is set such that the computational cost of the resulting YOLO Nano network is below that of Tiny YOLOv3 [14], one of the most popular compact networks for embedded object detection.
>
> In Section 2.2 of the paper, the author mentions the 8-bit weight precision.
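A minimal sketch of that post-training integer quantization flow with the TF 2.x TFLite converter; the model path, input shape, and calibration data below are placeholders (real, preprocessed training images should be fed to the representative dataset in practice):

```python
import numpy as np
import tensorflow as tf

# Placeholder path to the trained tf.keras model.
keras_model = tf.keras.models.load_model('yolo3_nano.h5', compile=False)

def representative_data_gen():
    # Calibration samples; random data only keeps the sketch self-contained,
    # a few hundred real preprocessed images should be used instead.
    for _ in range(100):
        yield [np.random.rand(1, 416, 416, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
tflite_model = converter.convert()

with open('yolo3_nano_quant.tflite', 'wb') as f:
    f.write(tflite_model)
```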

johnjunjun7 commented 4 years ago

> Got it, many thanks. If so, then it should be a quantized UINT8 model, which is 4x smaller than Float32. For TF, a quantized model can be produced via post-training integer quantization or quantization-aware training.

1. I hope you can provide YOLO Nano support for training the backbone network on ImageNet (https://github.com/david8862/keras-YOLOv3-model-set/tree/master/yolo3/models/backbones).
2. I tried to quantize the model to uint8 with TensorFlow, but it didn't succeed. If it's easy to implement, I hope you can implement it in your module. If it's difficult, I will keep trying. Many thanks! :p

david8862 commented 4 years ago

> 1. I hope you can provide YOLO Nano support for training the backbone network on ImageNet (https://github.com/david8862/keras-YOLOv3-model-set/tree/master/yolo3/models/backbones).
> 2. I tried to quantize the model to uint8 with TensorFlow, but it didn't succeed. If it's easy to implement, I hope you can implement it in your module. If it's difficult, I will keep trying. Many thanks! :p

  1. I tried to add a simple version at Training backbone, but it is not verified due to the time cost of ImageNet training, and the current way of feeding ImageNet data may not be efficient. I plan to improve that with the new tfds package but haven't had enough time yet. You can start with the current version first.

  2. Yes. When using post-training integer quantization to convert the yolo_nano model, the following error happens:

    INFO: Initialized TensorFlow Lite runtime.
    Traceback (most recent call last):
    File "post_train_quant_convert.py", line 70, in <module>
    main()
    File "post_train_quant_convert.py", line 65, in main
    post_train_quant_convert(args.keras_model_file, args.annotation_file, args.sample_num, model_input_shape, args.output_file)
    File "post_train_quant_convert.py", line 46, in post_train_quant_convert
    tflite_model = converter.convert()
    File "/root/.virtualenvs/py3tf2/local/lib/python3.6/site-packages/tensorflow_core/lite/python/lite.py", line 450, in convert
    constants.FLOAT)
    File "/root/.virtualenvs/py3tf2/local/lib/python3.6/site-packages/tensorflow_core/lite/python/lite.py", line 239, in _calibrate_quantize_model
    inference_output_type, allow_float)
    File "/root/.virtualenvs/py3tf2/local/lib/python3.6/site-packages/tensorflow_core/lite/python/optimize/calibrator.py", line 78, in calibrate_and_quantize
    np.dtype(output_type.as_numpy_dtype()).num, allow_float)
    File "/root/.virtualenvs/py3tf2/local/lib/python3.6/site-packages/tensorflow_core/lite/python/optimize/tensorflow_lite_wrap_calibration_wrapper.py", line 115, in QuantizeModel
    return _tensorflow_lite_wrap_calibration_wrapper.CalibrationWrapper_QuantizeModel(self, input_py_type, output_py_type, allow_float)
    RuntimeError: Invalid quantization params for op RESHAPE at index 35 in subgraph 0

    It seems the RESHAPE op is not well supported by post-training integer quantization. We may need to raise an issue with the TensorFlow model optimization team about that.

johnjunjun7 commented 4 years ago

Some errors occurred when I used train_imagenet.py to train nanonet on ImageNet:

First

    Traceback (most recent call last):
      File "train_imagenet.py", line 182, in <module>
        main(args)
      File "train_imagenet.py", line 150, in main
        callbacks=[logging, checkpoint, lr_scheduler, terminate_on_nan])
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 1272, in fit_generator
        steps_name='steps_per_epoch')
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 221, in model_iteration
        batch_data = _get_next_batch(generator)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 363, in _get_next_batch
        generator_output = next(generator)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/data_utils.py", line 785, in get
        six.reraise(*sys.exc_info())
      File "/usr/local/lib/python3.6/dist-packages/six.py", line 693, in reraise
        raise value
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/data_utils.py", line 779, in get
        inputs = self.queue.get(block=True).get()
      File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
        raise self._value
      File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
        result = (True, func(*args, **kwds))
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/data_utils.py", line 571, in get_index
        return _SHARED_SEQUENCES[uid][i]
      File "/usr/local/lib/python3.6/dist-packages/keras_preprocessing/image/iterator.py", line 65, in __getitem__
        return self._get_batches_of_transformed_samples(index_array)
      File "/usr/local/lib/python3.6/dist-packages/keras_preprocessing/image/iterator.py", line 239, in _get_batches_of_transformed_samples
        x = self.image_data_generator.standardize(x)
      File "/usr/local/lib/python3.6/dist-packages/keras_preprocessing/image/image_data_generator.py", line 704, in standardize
        x = self.preprocessing_function(x)
      File "train_imagenet.py", line 52, in preprocess
        x = preprocess_input(x, mode='tf')
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/applications/__init__.py", line 49, in wrapper
        return base_fun(*args, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/applications/resnet.py", line 61, in preprocess_input
        return resnet.preprocess_input(*args, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/keras_applications/resnet.py", line 37, in preprocess_input
        return imagenet_utils.preprocess_input(x, mode='caffe', **kwargs)
    TypeError: preprocess_input() got multiple values for keyword argument 'mode'

**So I use `x /= 255.0; x -= 0.5; x *= 2.0` instead, and it works.**
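A minimal sketch of that replacement as a Keras `ImageDataGenerator` preprocessing function (the function name is illustrative):

```python
import numpy as np

def preprocess(x):
    # Same scaling as preprocess_input(mode='tf'): map pixel values to [-1, 1].
    x = x.astype(np.float32)
    x /= 255.0
    x -= 0.5
    x *= 2.0
    return x
```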

Second

My ImageNet 2012 training dataset is missing one class and only has 999 classes. Maybe it won't make a big difference?

david8862 commented 4 years ago

> So I use `x /= 255.0; x -= 0.5; x *= 2.0` instead, and it works.
>
> My ImageNet 2012 training dataset is missing one class and only has 999 classes. Maybe it won't make a big difference?

Good, thanks. I think 999 classes should be enough for pretraining a backbone :-)

silencesuper commented 4 years ago

I changed a part of the nano net and the inference time is 7.7 ms. Later, I used TF Lite to quantize the model; the model size is much smaller, but the inference time is about 420 ms. Do you know the reason, or maybe the GPU doesn't work after quantization? Many thanks.

david8862 commented 4 years ago

How did you change the nano model to pass quantization? Generally the quantized UINT8 model is mainly for embedded ARM CPU inference acceleration, since most ARM processors don't have powerful floating-point capability. For GPU inference, a Float32 model may be a better fit.

> I changed a part of the nano net and the inference time is 7.7 ms. Later, I used TF Lite to quantize the model; the model size is much smaller, but the inference time is about 420 ms. Do you know the reason, or maybe the GPU doesn't work after quantization? Many thanks.

silencesuper commented 4 years ago

Because my task is very easy, I cut a part of the net (e.g. deleted all the 'NanoConv2D_BN_Relu6' layers). The accuracy of the pruned model is good and the inference time on GPU is 7.7 ms. After quantization, I found the TFLite model's inference time is 400 ms, and the GPU doesn't work. How can I use the GPU backend with TensorFlow Lite in Python? Many thanks. :)

david8862 commented 4 years ago

As far as I know, TFLite doesn't provide GPU support for the Python runtime. The GPU delegate APIs are only available under the Android or iOS environments.

> Because my task is very easy, I cut a part of the net (e.g. deleted all the 'NanoConv2D_BN_Relu6' layers). The accuracy of the pruned model is good and the inference time on GPU is 7.7 ms. After quantization, I found the TFLite model's inference time is 400 ms, and the GPU doesn't work. How can I use the GPU backend with TensorFlow Lite in Python? Many thanks. :)
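For reference, a minimal sketch of running a converted .tflite model with the TFLite Python interpreter, which executes on CPU (the model path is a placeholder):

```python
import numpy as np
import tensorflow as tf

# The Python tf.lite.Interpreter runs on CPU; the GPU delegate is exposed
# through the Android/iOS runtimes, as noted above.
interpreter = tf.lite.Interpreter(model_path='yolo3_nano_quant.tflite')
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Dummy input with the dtype/shape the model expects (placeholder values).
dummy = np.zeros(input_details[0]['shape'], dtype=input_details[0]['dtype'])
interpreter.set_tensor(input_details[0]['index'], dummy)
interpreter.invoke()

outputs = [interpreter.get_tensor(d['index']) for d in output_details]
print([o.shape for o in outputs])
```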