STMicroelectronics / stm32ai-modelzoo

AI Model Zoo for STM32 devices

Extremely high quantization time after training #7

Closed senceryucel closed 1 year ago

senceryucel commented 1 year ago

Hello,

Appreciate your work; it works amazingly well. I'm facing an issue I'd like to ask about.

I can train my model on my GPU really fast, without any problem (with my configuration, an epoch takes approximately 20 seconds). However, the quantization process takes extremely long (more than 20 minutes), and evaluating the quantized model takes even longer (more than 30 minutes). So for a 20-epoch run, the training phase takes approximately 4 minutes, while the other phases take almost an hour in total.

Here are the configs I use:

general:
  project_name: trial_1
  logs_dir: logs
  saved_models_dir: saved_models

train_parameters:
  batch_size: 64
  training_epochs: 20
  optimizer: adam
  initial_learning: 0.001
  learning_rate_scheduler: reducelronplateau

dataset:
  name: dataset
  class_names: [person, vehicle]
  training_path: datasets/dataset
  validation_path:
  test_path: 

pre_processing:
  rescaling: {scale : 127.5, offset : -1}
  resizing: nearest
  aspect_ratio: False
  color_mode: rgb

data_augmentation:
  RandomFlip: horizontal_and_vertical
  RandomTranslation: [0.1, 0.1]
  RandomRotation: 0.2
  RandomZoom: 0.2
  RandomContrast: 0.2
  RandomBrightness: 0.4
  RandomShear: False

model:
  model_type: {name : mobilenet, version : v2, alpha : 0.5}
  input_shape: [160, 160, 3]
  transfer_learning : True
  dropout: 0.5

quantization:
  quantize: True
  evaluate: True
  quantizer: TFlite_converter
  quantization_type: PTQ
  quantization_input_type: int8
  quantization_output_type: int8
  export_dir: quantized_models

stm32ai:
  optimization: balanced
  footprints_on_target: STM32H747I-DISCO
  path_to_stm32ai: C:/en.x-cube-ai-windows_v7.3.0/windows/stm32ai.exe

mlflow:
  uri: ./mlruns

hydra:
  run:
    dir: outputs/${now:%Y_%m_%d_%H_%M_%S}

I have 2 GPUs. GPU_0 is used for training, but its memory is not freed after training finishes. Here is the GPU usage while quantizing the model: [screenshot of GPU usage]. GPU_0's usage is the same as during the training phase, and GPU_1 is not used by the script at all.
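As an aside on the memory observation: by default TensorFlow reserves nearly all of a GPU's memory at startup and keeps it for the life of the process, so GPU_0 looking "full" after training does not necessarily mean the training graph is still resident. A minimal sketch of the standard way to request on-demand allocation instead (this is the stock tf.config API, not something specific to the model zoo; it must run before any GPU op is executed):

```python
import tensorflow as tf

# Ask TensorFlow to grow GPU memory on demand rather than reserving
# the whole device up front. Must be called before any GPU op runs.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)
```

This changes how memory usage is reported, not how fast quantization runs, but it makes nvidia-smi readings during the quantization phase easier to interpret.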

What can I do to reduce this quantization time? As far as I know, this should take at most 6-7 mins.

Thanks a lot.

YHASTM commented 1 year ago

Hello @senceryucel,

We will soon introduce a new parameter in the user_config.yaml file that allows you to specify the dataset to be used for quantization. Currently, the full training set is used by default to quantize the model. However, by selecting a smaller representative dataset, you can significantly reduce the time required for quantization.
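Until that parameter is available, the same effect can be had by handing the TFLite converter a small calibration subset yourself. A minimal sketch of the idea; `make_representative_dataset`, the sample count, and the variable names are illustrative (not part of the model zoo), while the commented converter calls follow the standard `tf.lite` post-training quantization API:

```python
import random

def make_representative_dataset(samples, num_calibration=100, seed=0):
    """Subsample the training data into a small calibration set for PTQ.

    Returns a zero-argument generator function in the shape expected by
    tf.lite.TFLiteConverter.representative_dataset: each call of the
    generator yields a list containing one input sample.
    """
    rng = random.Random(seed)
    subset = rng.sample(samples, min(num_calibration, len(samples)))

    def gen():
        for sample in subset:
            yield [sample]

    return gen

# Hypothetical wiring into the converter (assumes a Keras model and
# preprocessed input tensors named `images`):
# converter = tf.lite.TFLiteConverter.from_keras_model(model)
# converter.optimizations = [tf.lite.Optimize.DEFAULT]
# converter.representative_dataset = make_representative_dataset(images)
# tflite_model = converter.convert()
```

A few hundred representative samples is usually enough for int8 calibration, which is why shrinking this set from the full training set cuts the quantization time so dramatically.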