PaddlePaddle / PaddleSlim

PaddleSlim is an open-source library for deep model compression and architecture search.
https://paddleslim.readthedocs.io/zh_CN/latest/
Apache License 2.0
1.56k stars 344 forks

Auto-compressing PPYOLOE: loss becomes NaN almost immediately during training, and adjusting the learning rate does not help #1531

Closed futureflsl closed 7 months ago

futureflsl commented 1 year ago

Lowering the learning rate by several orders of magnitude did not help.

python run.py --config_path configs/ppyoloe_s_qat_dis.yaml

----------- Running Arguments -----------
Distillation:
  alpha: 1.0
  loss: soft_label
Global:
  Evaluation: True
  arch: PPYOLOE
  model_dir: ./ppyoloe_crn_s_300e_coco
  model_filename: model.pdmodel
  params_filename: model.pdiparams
  reader_config: configs/yolo_reader.yml
Quantization:
  activation_quantize_type: moving_average_abs_max
  onnx_format: True
  quantize_op_types: ['conv2d', 'depthwise_conv2d']
  use_pact: True
TrainConfig:
  eval_iter: 1000
  learning_rate:
    T_max: 6000
    learning_rate: 3e-08
    type: CosineAnnealingDecay
  optimizer_builder:
    optimizer:
      type: SGD
    weight_decay: 4e-05
  train_iter: 5000
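To put the learning-rate settings above in perspective, here is a minimal sketch of the standard closed-form cosine annealing schedule evaluated with the values from this config (`learning_rate: 3e-08`, `T_max: 6000`). It assumes `eta_min = 0` and that the scheduler is stepped once per training iteration; the function name `cosine_annealing_lr` is illustrative and is not part of PaddleSlim.

```python
import math


def cosine_annealing_lr(step, base_lr=3e-08, t_max=6000, eta_min=0.0):
    """Closed-form cosine annealing: decays base_lr toward eta_min over t_max steps.

    Assumption: eta_min is 0 and the schedule advances once per iteration.
    """
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + math.cos(math.pi * step / t_max))


# Print the learning rate at a few iterations, including iter 30 where NaN first appears.
for step in (0, 30, 1000, 5000):
    print(f"iter {step:>4}: lr = {cosine_annealing_lr(step):.3e}")
```

Under these assumptions the learning rate at iteration 30 is still essentially 3e-08, so each update step is tiny by the time the loss turns NaN.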

loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
W1122 10:26:51.865729 55936 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.4, Runtime API Version: 11.1
W1122 10:26:51.868434 55936 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
2022-11-22 10:26:54,445-INFO: devices: gpu
2022-11-22 10:26:57,593-INFO: Detect model type: None
2022-11-22 10:26:57,716-INFO: Selected strategies: ['qat_dis']
2022-11-22 10:27:02,829-INFO: train config.distill_node_pair: ['teacher_conv2d_173.tmp_1', 'conv2d_173.tmp_1', 'teacher_conv2d_177.tmp_0', 'conv2d_177.tmp_0', 'teacher_conv2d_180.tmp_1', 'conv2d_180.tmp_1', 'teacher_conv2d_184.tmp_0', 'conv2d_184.tmp_0', 'teacher_conv2d_187.tmp_1', 'conv2d_187.tmp_1', 'teacher_conv2d_191.tmp_0', 'conv2d_191.tmp_0']
2022-11-22 10:27:03,251-INFO: quant_aware config {'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max', 'weight_bits': 8, 'activation_bits': 8, 'not_quant_pattern': ['skip_quant'], 'quantize_op_types': ['conv2d', 'depthwise_conv2d'], 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9, 'for_tensorrt': False, 'is_full_quantize': False, 'onnx_format': True, 'name': 'Distillation', 'loss': 'soft_label', 'node': [], 'alpha': 1.0, 'teacher_model_dir': './ppyoloe_crn_s_300e_coco', 'teacher_model_filename': 'model.pdmodel', 'teacher_params_filename': 'model.pdiparams'}
Adding quant op with weight:|██████████████████████████████████████████| 324/324
Adding OutScale op:|███████████████████████████████████████████████████| 319/319
2022-11-22 10:27:07,641-INFO: quant_aware config {'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max', 'weight_bits': 8, 'activation_bits': 8, 'not_quant_pattern': ['skip_quant'], 'quantize_op_types': ['conv2d', 'depthwise_conv2d'], 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9, 'for_tensorrt': False, 'is_full_quantize': False, 'onnx_format': True, 'name': 'Distillation', 'loss': 'soft_label', 'node': [], 'alpha': 1.0, 'teacher_model_dir': './ppyoloe_crn_s_300e_coco', 'teacher_model_filename': 'model.pdmodel', 'teacher_params_filename': 'model.pdiparams'}
Adding quant op with weight:|████████████████████████████████████████| 1725/1725
Adding OutScale op:|█████████████████████████████████████████████████| 1107/1107
2022-11-22 10:29:53,033-INFO: When a preprocess_func is used in quant_aware, Need to save a mapping table to match variable names in the convert phase.
2022-11-22 10:29:53,033-INFO: The mapping table is saved as './mapping_table_for_saving_inference_model'.
2022-11-22 10:30:11,829-INFO: Total iter: 0, epoch: 0, batch: 0, loss: [13.393167 13.291335]
2022-11-22 10:30:15,995-INFO: Total iter: 10, epoch: 0, batch: 10, loss: [12.573825 12.857247]
2022-11-22 10:30:20,154-INFO: Total iter: 20, epoch: 0, batch: 20, loss: [12.493932 12.663242]
2022-11-22 10:30:24,307-INFO: Total iter: 30, epoch: 0, batch: 30, loss: [nan nan]
2022-11-22 10:30:28,446-INFO: Total iter: 40, epoch: 0, batch: 40, loss: [nan nan]
2022-11-22 10:30:32,589-INFO: Total iter: 50, epoch: 0, batch: 50, loss: [nan nan]
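When comparing runs with different settings, it can help to pin down exactly where the loss first diverges. The following is a minimal sketch that scans log text in the format shown above and returns the first iteration whose loss contains a non-finite value. The helper name `first_nonfinite_iter` and the regular expression are assumptions based only on the log lines in this issue, not a PaddleSlim utility.

```python
import math
import re

# Matches lines such as:
# "Total iter: 30, epoch: 0, batch: 30, loss: [nan nan]"
LOSS_LINE = re.compile(
    r"Total iter: (\d+), epoch: \d+, batch: \d+, loss: \[([^\]]*)\]"
)


def first_nonfinite_iter(log_text):
    """Return the first logged iteration with a NaN/inf loss, or None if all are finite."""
    for match in LOSS_LINE.finditer(log_text):
        values = [float(v) for v in match.group(2).split()]
        if any(not math.isfinite(v) for v in values):
            return int(match.group(1))
    return None


# Sample taken from the log in this issue.
sample_log = """
2022-11-22 10:30:15,995-INFO: Total iter: 10, epoch: 0, batch: 10, loss: [12.573825 12.857247]
2022-11-22 10:30:20,154-INFO: Total iter: 20, epoch: 0, batch: 20, loss: [12.493932 12.663242]
2022-11-22 10:30:24,307-INFO: Total iter: 30, epoch: 0, batch: 30, loss: [nan nan]
"""
print(first_nonfinite_iter(sample_log))
```

Because losses are only logged every 10 iterations here, the value returned is an upper bound on when the divergence actually began.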

yghstill commented 1 year ago

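@futureflsl Your learning rate is already quite small, so this is odd. Please try the following changes and run it again:

  1. Set use_pact to False in the config file.
  2. Switch to single-GPU training.
  3. Confirm that the ppyoloe_crn_s_300e_coco model does not include NMS. Did you set exclude_post_process=True when exporting it? Does the official demo run through as-is?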
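The first suggestion can be sketched as a small text edit on the config. This is only an illustration that assumes the config contains a single `use_pact:` line, as in the Running Arguments dump above; the helper name `set_use_pact` is hypothetical, and editing the YAML file by hand works just as well.

```python
import re


def set_use_pact(config_text, enabled):
    """Rewrite the value on the `use_pact:` line of a YAML config, keeping its indentation."""
    pattern = re.compile(r"^([ \t]*use_pact:[ \t]*)\S+", re.MULTILINE)
    new_text, count = pattern.subn(
        lambda m: m.group(1) + str(bool(enabled)), config_text
    )
    if count == 0:
        raise ValueError("no 'use_pact:' entry found in config text")
    return new_text


# Fragment mirroring the Quantization section from this issue.
quant_section = (
    "Quantization:\n"
    "  activation_quantize_type: moving_average_abs_max\n"
    "  onnx_format: True\n"
    "  use_pact: True\n"
)
print(set_use_pact(quant_section, False))
```

To apply it to a real file, read the text of `configs/ppyoloe_s_qat_dis.yaml`, pass it through `set_use_pact`, and write the result back before rerunning `run.py`.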

futureflsl commented 1 year ago

I'll give it a try. The official demo does run through without problems.