STMicroelectronics / stm32ai-modelzoo

AI Model Zoo for STM32 devices

Object detection - How to use transfer learning? #40

Open cypamigon opened 1 week ago

cypamigon commented 1 week ago

Hello,

I'm training an object detection model on a custom dataset and I'm not sure how to use transfer learning (i.e. fine-tune an already existing model with our data).

I'm following the instructions provided in the README of the object_detection/src folder to configure the user_config.yaml file, but I don't really understand the difference between the general.model_path and training.pretrained_weights parameters. When both are set, where do the initial weights come from?

I'm training a model to detect coffee cups, just to try out the process. From my observations, when no model_path is provided the initial loss is over 300! When I set a model_path from the model zoo, the initial loss is only around 3.

Here is my user_config.yaml file :

general:
  project_name: Cup_Detection
  model_type: ssd_mobilenet_v2_fpnlite
  model_path: ../pretrained_models/ssd_mobilenet_v2_fpnlite/ST_pretrainedmodel_public_dataset/coco_2017_person/ssd_mobilenet_v2_fpnlite_035_416/ssd_mobilenet_v2_fpnlite_035_416.h5 #../pretrained_models/ssd_mobilenet_v2_fpnlite/ST_pretrainedmodel_public_dataset/coco_2017_person/ssd_mobilenet_v2_fpnlite_035_416/ssd_mobilenet_v2_fpnlite_035_416_int8.tflite
  logs_dir: logs
  saved_models_dir: saved_models
  gpu_memory_limit: 16
  global_seed: 127

operation_mode: chain_tqe
#choices=['training' , 'evaluation', 'deployment', 'quantization', 'benchmarking',
#        'chain_tqeb','chain_tqe','chain_eqe','chain_qb','chain_eqeb','chain_qd ']

dataset:
  name: custom_cup_dataset
  class_names: [ cup ]
  training_path: ../datasets/cup_images_dataset/train
  validation_path: ../datasets/cup_images_dataset/val
  test_path: ../datasets/cup_images_dataset/test
  quantization_path:
  quantization_split: 0.3

preprocessing:
  rescaling: { scale: 1/127.5, offset: -1 }
  resizing:
    aspect_ratio: fit
    interpolation: nearest
  color_mode: rgb

data_augmentation:
  rotation: 30
  shearing: 15
  translation: 0.1
  vertical_flip: 0.5
  horizontal_flip: 0.2
  gaussian_blur: 3.0
  linear_contrast: [ 0.75, 1.5 ]

training:
  model:
    alpha: 0.35
    input_shape: (416, 416, 3)
    pretrained_weights: imagenet
  dropout:
  batch_size: 64
  epochs: 5000
  optimizer:
    Adam:
      learning_rate: 0.001
  callbacks:
    ReduceLROnPlateau:
      monitor: val_loss
      patience: 20
    EarlyStopping:
      monitor: val_loss
      patience: 40

postprocessing:
  confidence_thresh: 0.6
  NMS_thresh: 0.5
  IoU_eval_thresh: 0.3
  plot_metrics: True   # Plot precision versus recall curves. Default is False.
  max_detection_boxes: 10

quantization:
  quantizer: TFlite_converter
  quantization_type: PTQ
  quantization_input_type: float
  quantization_output_type: uint8
  export_dir: quantized_models

benchmarking:
  board: STM32H747I-DISCO

tools:
  stm32ai:
    version: 8.1.0
    optimization: balanced
    on_cloud: True
    path_to_stm32ai: C:/Users/<XXXXX>/STM32Cube/Repository/Packs/STMicroelectronics/X-CUBE-AI/<*.*.*>/Utilities/windows/stm32ai.exe
  path_to_cubeIDE: C:/ST/STM32CubeIDE_1.10.1/STM32CubeIDE/stm32cubeide.exe

deployment:
  c_project_path: ../../stm32ai_application_code/object_detection/
  IDE: GCC
  verbosity: 1
  hardware_setup:
    serie: STM32H7
    board: STM32H747I-DISCO

mlflow:
  uri: ./experiments_outputs/mlruns

hydra:
  run:
    dir: ./experiments_outputs/${now:%Y_%m_%d_%H_%M_%S}
LFOSTM commented 1 week ago

Hello. Actually, general.model_path and training.model should be mutually exclusive (as is the case for Image Classification). We will fix this in a future update.
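
In other words, only one of the two starting points should be set in user_config.yaml. A minimal sketch of the two alternatives (reusing values from the config above; illustrative only, not an official template):

# Option A: start from a full model zoo model (backbone + head weights come from the .h5 file)
general:
  model_path: ../pretrained_models/ssd_mobilenet_v2_fpnlite/ST_pretrainedmodel_public_dataset/coco_2017_person/ssd_mobilenet_v2_fpnlite_035_416/ssd_mobilenet_v2_fpnlite_035_416.h5

# Option B: build the model from its description, with an ImageNet-pretrained backbone and a randomly initialized head
general:
  model_path:
training:
  model:
    alpha: 0.35
    input_shape: (416, 416, 3)
    pretrained_weights: imagenet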

RSERSTM commented 1 week ago

Hello,

When we set training.pretrained_weights: imagenet, it refers to the weights of the backbone. This backbone is trained for classification tasks by Google on a large dataset called ImageNet. The detection model is made of this backbone plus a head initialized with random weights. So if you set training.pretrained_weights: None, you get a backbone and a head both initialized with random weights.

Now if you use, for example, our ssd_mobilenet_v2_fpnlite_035_416.h5 model in general.model_path, all the weights (backbone + head) of this model will be used. This means that if your dataset has exactly the same number of classes to detect (only one in this case), you can use it and it will be a good starting point for fine-tuning. But if the model detects a different number of classes than your dataset contains, training will not be possible.
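
To make the class-count constraint concrete, a dataset section compatible with this person-trained model would declare exactly one class, as in the config above (illustrative only):

dataset:
  class_names: [ cup ]   # one class, matching the single class (person) the loaded model was trained to detect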

Thanks,

cypamigon commented 1 week ago

Hello,


Ok, so if I understand correctly, the ssd_mobilenet_v2_fpnlite_035_416.h5 model has been trained to detect only one class (person). This is why I can use it as a starting point to build my custom model, because my dataset also has one class. If I had multiple classes in my dataset, I would not be able to use the ssd_mobilenet_v2_fpnlite_035_416.h5 model, is that correct?

But if I only provide training.pretrained_weights: imagenet and no general.model_path in the user_config.yaml file, does it behave the same way? Do I need a dataset with 20,000 classes (the number of classes in ImageNet according to Wikipedia)? Or does the fact that it only loads the backbone make it behave differently?

Thanks

RSERSTM commented 1 week ago

"If I had multiple classes in my dataset, I would not be able to use the ssd_mobilenet_v2_fpnlite_035_416.h5" -> Yes, exactly, that is correct.

"But if I only provide training.pretrained_weights: imagenet and no general.model_path in the user_config.yaml file, does it behave the same way?" -> If you do that, the loss, as you experienced, will be a bit high at the beginning and the model will take longer to train, but the advantage is that you can train with any number of classes you want.

"Do I need a dataset with 20,000 classes? Or does the fact that it only loads the backbone make it behave differently?" -> Exactly: the fact that it only loads the backbone is what makes the model usable, because the classification head (those thousands of classes) has been removed and replaced by a detection head with the number of classes defined by your detection dataset.
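
As an illustration, a backbone-only starting point lets you train on any number of classes; a minimal sketch (the class names below are invented for the example):

general:
  model_path:                         # left empty: no full detection model is loaded
dataset:
  class_names: [ cup, bottle, glass ] # hypothetical multi-class dataset
training:
  model:
    alpha: 0.35
    input_shape: (416, 416, 3)
    pretrained_weights: imagenet      # ImageNet backbone only; the detection head is built for these 3 classes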

What seems to be your concern is that you want to load ssd_mobilenet_v2_fpnlite_035_416.h5 and use any dataset with any number of classes you want, is that correct?

Thanks,

cypamigon commented 1 week ago

Thanks for all the explanations, it's very interesting. I just wanted to understand the impact of providing the general.model_path or training.pretrained_weights parameters on training. Now I have a better understanding of how things work. Thanks again; I think we can close the issue now.

cypamigon commented 1 week ago

I still have a few questions regarding the use of an already existing model (such as ssd_mobilenet_v2_fpnlite_035_416.h5) as a starting point for training:

- How do we determine the appropriate rescaling parameters (preprocessing.rescaling) when starting with an existing model? Should we rescale our images based on the scale used by this model? If yes, how do we know the scale used?

Thanks for your help.

LFOSTM commented 1 week ago

"How do we determine the appropriate rescaling parameters (preprocessing.rescaling) when starting with an existing model ? Should we rescale our images based on the scale used by this model ? If yes, how to know the scale used ?" Those parameters must be chosen by you to decide the dynamic of your input data during the training. For instance if you have values between 0 and 255 and prefer to scale them to -1 +1. Then the same scale factor should appear in the .tflite 1st layer. This will allow to use uint8 (0 255) input with .tflite in a transparent way during inference.