PaddlePaddle / PaddleSlim

PaddleSlim is an open-source library for deep model compression and architecture search.
https://paddleslim.readthedocs.io/zh_CN/latest/
Apache License 2.0
1.56k stars, 345 forks

SOLOv2 auto-compression fails #1422

Closed: hf62580 closed this issue 9 months ago

hf62580 commented 2 years ago

Environment: PaddleDetection 2.5, PaddleSlim 2.3.4, Ubuntu 20.04, CUDA 11.1.

I trained SOLOv2 with PaddleDetection and then exported the model with the following command:

python tools/export_model.py -c configs/solov2/solov2_r50_enhance_symhua_coco.yml --output_dir=./inference_model -o weights=resume/best_model

I then set up the config files as described in the auto_compression deployment documentation and ran:

python run.py --config_path=./configs/solov2_qat_dis.yaml --save_dir='./output/'

The run fails with the following output:

(PPSolov2) root@hof-System-Product-Name:/mnt/myDisk/deepLearn/PaddleDetection/PaddleDetection/deploy/auto_compression# python run.py --config_path=./configs/solov2_qat_dis.yaml --save_dir='./output/'
----------- Running Arguments -----------
Distillation:
  alpha: 1.0
  loss: soft_label
Global:
  Evaluation: True
  input_list: ['image', 'scale_factor']
  model_dir: ./solov2_r50_enhance_symhua_coco
  model_filename: model.pdmodel
  params_filename: model.pdiparams
  reader_config: configs/solov2_reader.yml
Quantization:
  activation_quantize_type: moving_average_abs_max
  quantize_op_types: ['conv2d', 'depthwise_conv2d']
  use_pact: True
TrainConfig:
  eval_iter: 1000
  learning_rate:
    T_max: 6000
    learning_rate: 0.00125
    type: CosineAnnealingDecay
  optimizer_builder:
    optimizer:
      type: SGD
    weight_decay: 4e-05
  train_iter: 5000

loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
W0917 10:08:10.872013 3000434 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.7, Runtime API Version: 11.1
W0917 10:08:10.875550 3000434 gpu_resources.cc:91] device: 0, cuDNN Version: 8.0.
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
2022-09-17 10:08:14,268-INFO: devices: gpu
2022-09-17 10:08:30,781-INFO: Detect model type: None
2022-09-17 10:08:31,491-INFO: Selected strategies: ['qat_dis']
Traceback (most recent call last):
  File "run.py", line 183, in <module>
    main()
  File "run.py", line 173, in main
    ac.compress()
  File "/root/anaconda3/envs/PPSolov2/lib/python3.7/site-packages/paddleslim/auto_compression/compressor.py", line 569, in compress
    train_config)
  File "/root/anaconda3/envs/PPSolov2/lib/python3.7/site-packages/paddleslim/auto_compression/compressor.py", line 704, in single_strategy_compress
    default_distill_node_pair, strategy, config, train_config)
  File "/root/anaconda3/envs/PPSolov2/lib/python3.7/site-packages/paddleslim/auto_compression/compressor.py", line 493, in _prepare_program
    default_distill_node_pair=default_distill_node_pair)
  File "/root/anaconda3/envs/PPSolov2/lib/python3.7/site-packages/paddleslim/auto_compression/create_compressed_program.py", line 273, in build_distill_program
    feed_target_names=feed_target_names)
  File "/root/anaconda3/envs/PPSolov2/lib/python3.7/site-packages/paddleslim/auto_compression/create_compressed_program.py", line 197, in _load_program_and_merge
    merge_feed=merge_feed)
  File "/root/anaconda3/envs/PPSolov2/lib/python3.7/site-packages/paddleslim/dist/single_distiller.py", line 73, in merge
    teacher_var.name, new_name)
  File "/root/anaconda3/envs/PPSolov2/lib/python3.7/site-packages/paddle/fluid/framework.py", line 3456, in _rename_var
    raise ValueError("var %s is not in current block" % name)
ValueError: var top_k_v2_0.tmp_1 is not in current block

The solov2_qat_dis.yaml config is as follows:

Global:
  reader_config: configs/solov2_reader.yml
  input_list: ['image', 'scale_factor']
  Evaluation: True
  model_dir: ./solov2_r50_enhance_symhua_coco
  model_filename: model.pdmodel
  params_filename: model.pdiparams

Distillation:
  alpha: 1.0
  loss: soft_label

Quantization:
  use_pact: true
  activation_quantize_type: 'moving_average_abs_max'
  quantize_op_types:
  - conv2d
  - depthwise_conv2d

TrainConfig:
  train_iter: 5000
  eval_iter: 1000
  learning_rate:
    type: CosineAnnealingDecay
    learning_rate: 0.00125
    T_max: 6000
  optimizer_builder:
    optimizer:
      type: SGD
    weight_decay: 4.0e-05

The solov2_reader.yml config is as follows:

metric: COCO
num_classes: 7

# Dataset configuration
TrainDataset:
  !COCODataSet
    image_dir: train2017
    anno_path: annotations/instances_train2017.json
    dataset_dir: data/coco/

EvalDataset:
  !COCODataSet
    image_dir: val2017
    anno_path: annotations/instances_val2017.json
    dataset_dir: data/coco/

worker_num: 2

# preprocess reader in test
EvalReader:
  sample_transforms:

ceci3 commented 2 years ago

Hi, does the exported model include NMS? You can drop NMS when exporting the model; see: https://github.com/PaddlePaddle/PaddleDetection/blob/ede22043927a944bb4cbea0e9455dd9c91b295f0/deploy/third_engine/demo_openvino/python/README.md#%E7%9C%9F%E5%AE%9E%E5%9B%BE%E7%89%87%E6%B5%8B%E8%AF%95%E7%BD%91%E7%BB%9C%E5%8C%85%E5%90%AB%E5%90%8E%E5%A4%84%E7%90%86%E4%BD%86%E4%B8%8D%E5%8C%85%E5%90%ABnms

hf62580 commented 2 years ago

I re-exported the model with export.nms=False:

python tools/export_model.py -c configs/solov2/solov2_r50_enhance_symhua_coco.yml --output_dir=./inference_model -o weights=resume/best_model export.nms=False

I still get the same error:

----------- Running Arguments -----------
Distillation:
  alpha: 1.0
  loss: soft_label
Global:
  Evaluation: True
  input_list: ['image', 'scale_factor']
  model_dir: ./solov2_r50_enhance_symhua_coco
  model_filename: model.pdmodel
  params_filename: model.pdiparams
  reader_config: configs/solov2_reader.yml
Quantization:
  activation_quantize_type: moving_average_abs_max
  quantize_op_types: ['conv2d', 'depthwise_conv2d']
  use_pact: True
TrainConfig:
  eval_iter: 1000
  learning_rate:
    T_max: 6000
    learning_rate: 0.00125
    type: CosineAnnealingDecay
  optimizer_builder:
    optimizer:
      momentum: 0.9
      type: Momentum
    regularizer:
      factor: 0.0001
      type: L2
  train_iter: 5000

loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
W0919 15:46:25.759570 3261306 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.7, Runtime API Version: 11.1
W0919 15:46:25.762647 3261306 gpu_resources.cc:91] device: 0, cuDNN Version: 8.0.
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
2022-09-19 15:46:26,786-INFO: devices: gpu
2022-09-19 15:46:40,404-INFO: Detect model type: None
2022-09-19 15:46:40,914-INFO: Selected strategies: ['qat_dis']
Traceback (most recent call last):
  File "run.py", line 183, in <module>
    main()
  File "run.py", line 173, in main
    ac.compress()
  File "/root/anaconda3/envs/PPSolov2/lib/python3.7/site-packages/paddleslim/auto_compression/compressor.py", line 569, in compress
    train_config)
  File "/root/anaconda3/envs/PPSolov2/lib/python3.7/site-packages/paddleslim/auto_compression/compressor.py", line 704, in single_strategy_compress
    default_distill_node_pair, strategy, config, train_config)
  File "/root/anaconda3/envs/PPSolov2/lib/python3.7/site-packages/paddleslim/auto_compression/compressor.py", line 493, in _prepare_program
    default_distill_node_pair=default_distill_node_pair)
  File "/root/anaconda3/envs/PPSolov2/lib/python3.7/site-packages/paddleslim/auto_compression/create_compressed_program.py", line 273, in build_distill_program
    feed_target_names=feed_target_names)
  File "/root/anaconda3/envs/PPSolov2/lib/python3.7/site-packages/paddleslim/auto_compression/create_compressed_program.py", line 197, in _load_program_and_merge
    merge_feed=merge_feed)
  File "/root/anaconda3/envs/PPSolov2/lib/python3.7/site-packages/paddleslim/dist/single_distiller.py", line 73, in merge
    teacher_var.name, new_name)
  File "/root/anaconda3/envs/PPSolov2/lib/python3.7/site-packages/paddle/fluid/framework.py", line 3456, in _rename_var
    raise ValueError("var %s is not in current block" % name)
ValueError: var top_k_v2_0.tmp_1 is not in current block

ceci3 commented 2 years ago

Could you send the .pdmodel file? I'd like to take a look at the model structure.

hf62580 commented 2 years ago

Sure, here is the download link.
Link: https://pan.baidu.com/s/19Me_oadFXnArQYhBT0EWKA
Extraction code: ei3y

ceci3 commented 2 years ago

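I checked with the colleagues who work on this: the post-processing cannot be removed from SOLOv2. For now, you can manually set the distillation nodes to the outputs of all conv2d ops. We will improve auto-compression support for this model later.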

hf62580 commented 2 years ago

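Quoting ceci3: "I checked with the colleagues who work on this: the post-processing cannot be removed from SOLOv2. For now, you can manually set the distillation nodes to the outputs of all conv2d ops. We will improve auto-compression support for this model later."

How do I manually set the distillation nodes to the outputs of all conv2d ops? Is there a reference I can follow? I'm not familiar with this part, sorry!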
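One way to collect candidate distillation nodes is to walk the ops of the exported program and gather the output variable names of every conv2d op. The sketch below assumes that ops expose `type` and `output_arg_names` the way Paddle's static-graph `Operator` does, and that the resulting names would then be listed under a `node` key in the `Distillation` section of the config. `conv_output_names` and `load_ops` are hypothetical helper names, and none of this has been verified against SOLOv2.

```python
from types import SimpleNamespace


def conv_output_names(ops, op_types=("conv2d", "depthwise_conv2d")):
    """Return the output variable names of all ops whose type is in op_types."""
    names = []
    for op in ops:
        if op.type in op_types:
            names.extend(op.output_arg_names)
    return names


def load_ops(path_prefix):
    """Load an exported inference model and return the ops of its global block.

    Assumes Paddle 2.x static-graph APIs. paddle is imported lazily so the
    rest of this sketch runs without it installed.
    """
    import paddle

    paddle.enable_static()
    exe = paddle.static.Executor(paddle.CPUPlace())
    program, _feed_names, _fetch_targets = paddle.static.load_inference_model(
        path_prefix, exe)
    return program.global_block().ops


# Stand-in ops (hypothetical names) so the helper can run without a model.
demo_ops = [
    SimpleNamespace(type="conv2d", output_arg_names=["conv2d_0.tmp_0"]),
    SimpleNamespace(type="relu", output_arg_names=["relu_0.tmp_0"]),
    SimpleNamespace(type="conv2d", output_arg_names=["conv2d_1.tmp_0"]),
]
demo_nodes = conv_output_names(demo_ops)
print(demo_nodes)
```

With a real model, the call would look like `conv_output_names(load_ops("./solov2_r50_enhance_symhua_coco/model"))`, where the path prefix is the model directory plus the file name without its extension.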

yghstill commented 2 years ago

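SOLOv2 does not currently support quantization-aware distillation training. The model contains control flow, so merging the distillation program and adding the backward nodes both fail. For now, try the post-training quantization hyperparameter search strategy in auto-compression instead.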
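A quick way to check whether a loaded program contains control flow is to count its blocks, since ops such as `while` or `conditional_block` keep their bodies in sub-blocks. The sketch below also reports which blocks hold a given variable. That `top_k_v2_0.tmp_1` sits in a sub-block is an assumption consistent with the "not in current block" error, not something confirmed in this thread. The helpers rely only on `Program.blocks` and `Block.has_var` from Paddle's static-graph API; `FakeBlock` and `FakeProgram` are hypothetical stand-ins so the snippet runs without Paddle.

```python
class FakeBlock:
    """Minimal stand-in for a Paddle Block (hypothetical, demo only)."""

    def __init__(self, var_names):
        self._vars = set(var_names)

    def has_var(self, name):
        return name in self._vars


class FakeProgram:
    """Minimal stand-in for a Paddle Program (hypothetical, demo only)."""

    def __init__(self, blocks):
        self.blocks = blocks


def blocks_containing(program, var_name):
    """Return the indices of all blocks that define var_name."""
    return [i for i, block in enumerate(program.blocks)
            if block.has_var(var_name)]


def has_control_flow(program):
    """More than one block usually means the program has control-flow ops."""
    return len(program.blocks) > 1


demo = FakeProgram([
    FakeBlock(["image", "conv2d_0.tmp_0"]),  # block 0: the global block
    FakeBlock(["top_k_v2_0.tmp_1"]),         # block 1: body of a control-flow op
])
print(has_control_flow(demo))
print(blocks_containing(demo, "top_k_v2_0.tmp_1"))
```

Both helpers accept a real `Program` returned by `paddle.static.load_inference_model` in place of the stand-in.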

dijiupianhai9 commented 1 year ago

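Running SOLOv2 through auto-compression with the post-training quantization hyperparameter search strategy produces the following error: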

/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/paddle/fluid/framework.py:594: UserWarning: You are using GPU version Paddle, but your CUDA device is not set properly. CPU device will be used by default.
  warnings.warn(
/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
Fri Dec 16 15:52:12-WARNING: The old way to load inference model is deprecated. model path: /home/jumper/cjl/paddledetection/PaddleSlim-develop/inference_model/solov2_r50_enhance_coco/model.pdmodel, params path: /home/jumper/cjl/paddledetection/PaddleSlim-develop/inference_model/solov2_r50_enhance_coco/model.pdiparams
WARNING: Detect dataset only contains single fileds, return format changed since Paddle 2.1. In Paddle <= 2.0, DataLoader add a list surround output data(e.g. return [data]), and in Paddle >= 2.1, DataLoader return the single filed directly (e.g. return data). For example, in following code:

import numpy as np
from paddle.io import DataLoader, Dataset

class RandomDataset(Dataset):
    def __getitem__(self, idx):
        data = np.random.random((2, 3)).astype('float32')

        return data

    def __len__(self):
        return 10

dataset = RandomDataset()
loader = DataLoader(dataset, batch_size=1)
data = next(loader())

In Paddle <= 2.0, data is in format '[Tensor(shape=(1, 2, 3), dtype=float32)]', and in Paddle >= 2.1, data is in format 'Tensor(shape=(1, 2, 3), dtype=float32)'

2022-12-16 15:52:18,473-INFO: devices: cpu
Fri Dec 16 15:52:18-WARNING: The old way to load inference model is deprecated. model path: /home/jumper/cjl/paddledetection/PaddleSlim-develop/inference_model/solov2_r50_enhance_coco/model.pdmodel, params path: /home/jumper/cjl/paddledetection/PaddleSlim-develop/inference_model/solov2_r50_enhance_coco/model.pdiparams
2022-12-16 15:53:01,524-INFO: Detect model type: None
Fri Dec 16 15:53:01-WARNING: The old way to load inference model is deprecated. model path: /home/jumper/cjl/paddledetection/PaddleSlim-develop/inference_model/solov2_r50_enhance_coco/model.pdmodel, params path: /home/jumper/cjl/paddledetection/PaddleSlim-develop/inference_model/solov2_r50_enhance_coco/model.pdiparams
Fri Dec 16 15:53:07-WARNING: The old way to load inference model is deprecated. model path: /home/jumper/cjl/paddledetection/PaddleSlim-develop/inference_model/solov2_r50_enhance_coco/model.pdmodel, params path: /home/jumper/cjl/paddledetection/PaddleSlim-develop/inference_model/solov2_r50_enhance_coco/model.pdiparams
2022-12-16 15:53:12,320-INFO: Selected strategies: ['ptq_hpo']
122222strategy ['ptq_hpo']
Fri Dec 16 15:53:12-INFO: Load model and set data loader ...
Fri Dec 16 15:53:12-INFO: Collect quantized variable names ...
Sampling stage, Run batch:| | 0/5
/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/paddle/fluid/executor.py:1145: UserWarning: The variable inputs is not found in program. It is not declared or is pruned.
Sampling stage, Run batch:| | 0/5
'feed_targets' does not have scale_factor variable
Traceback (most recent call last):
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/smac/tae/execute_func.py", line 217, in run
    rval = self._call_ta(self._ta, config, obj_kwargs)
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/smac/tae/execute_func.py", line 314, in _call_ta
    return obj(config, **obj_kwargs)
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/paddleslim/quant/post_quant_hpo.py", line 283, in quantize
    quant_post( \
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/paddleslim/quant/quanter.py", line 506, in quant_post_static
    post_training_quantization.quantize()
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/paddle/fluid/contrib/slim/quantization/post_training_quantization.py", line 415, in quantize
    self._executor.run(program=self._program,
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/paddle/fluid/executor.py", line 1463, in run
    six.reraise(*sys.exc_info())
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/six.py", line 719, in reraise
    raise value
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/paddle/fluid/executor.py", line 1450, in run
    res = self._run_impl(program=program,
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/paddle/fluid/executor.py", line 1639, in _run_impl
    program, new_exe = self._executor_cache.get_program_and_executor(
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/paddle/fluid/executor.py", line 743, in get_program_and_executor
    return self._get_cached_program_and_executor(
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/paddle/fluid/executor.py", line 782, in _get_program_and_executor
    program = _add_feed_fetch_ops(program=inner_program,
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/paddle/fluid/executor.py", line 367, in _add_feed_fetch_ops
    if not has_feed_operators(global_block, feed, feed_var_name):
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/paddle/fluid/executor.py", line 284, in has_feed_operators
    raise Exception(
Exception: 'feed_targets' does not have scale_factor variable
2022-12-16 15:53:13,023-INFO: Value for default configuration: 2147483647.00000000
Fri Dec 16 15:53:13-INFO: Load model and set data loader ...
Fri Dec 16 15:53:13-INFO: Collect quantized variable names ...
Sampling stage, Run batch:| | 0/5
/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/paddle/fluid/executor.py:1145: UserWarning: The variable inputs is not found in program. It is not declared or is pruned.
Sampling stage, Run batch:| | 0/5
'feed_targets' does not have scale_factor variable
Traceback (most recent call last):
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/smac/tae/execute_func.py", line 217, in run
    rval = self._call_ta(self._ta, config, obj_kwargs)
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/smac/tae/execute_func.py", line 314, in _call_ta
    return obj(config, **obj_kwargs)
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/paddleslim/quant/post_quant_hpo.py", line 283, in quantize
    quant_post( \
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/paddleslim/quant/quanter.py", line 506, in quant_post_static
    post_training_quantization.quantize()
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/paddle/fluid/contrib/slim/quantization/post_training_quantization.py", line 415, in quantize
    self._executor.run(program=self._program,
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/paddle/fluid/executor.py", line 1463, in run
    six.reraise(*sys.exc_info())
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/six.py", line 719, in reraise
    raise value
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/paddle/fluid/executor.py", line 1450, in run
    res = self._run_impl(program=program,
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/paddle/fluid/executor.py", line 1639, in _run_impl
    program, new_exe = self._executor_cache.get_program_and_executor(
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/paddle/fluid/executor.py", line 743, in get_program_and_executor
    return self._get_cached_program_and_executor(
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/paddle/fluid/executor.py", line 782, in _get_program_and_executor
    program = _add_feed_fetch_ops(program=inner_program,
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/paddle/fluid/executor.py", line 367, in _add_feed_fetch_ops
    if not has_feed_operators(global_block, feed, feed_var_name):
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/paddle/fluid/executor.py", line 284, in has_feed_operators
    raise Exception(
Exception: 'feed_targets' does not have scale_factor variable
Traceback (most recent call last):
  File "run.py", line 67, in <module>
    ac.compress()
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/paddleslim/auto_compression/compressor.py", line 575, in compress
    self.single_strategy_compress(strategy, config, strategy_idx,
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/paddleslim/auto_compression/compressor.py", line 638, in single_strategy_compress
    post_quant_hpo.quant_post_hpo(
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/paddleslim/quant/post_quant_hpo.py", line 526, in quant_post_hpo
    incumbent = smac.optimize()
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/smac/facade/smac_ac_facade.py", line 723, in optimize
    incumbent = self.solver.run()
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/smac/optimizer/smbo.py", line 307, in run
    self._incorporate_run_results(run_info, result, time_left)
  File "/home/jumper/anaconda3/envs/paddet/lib/python3.8/site-packages/smac/optimizer/smbo.py", line 510, in _incorporate_run_results
    raise FirstRunCrashedException(
smac.tae.FirstRunCrashedException: First run crashed, abort. Please check your setup -- we assume that your default configuration does not crashes. (To deactivate this exception, use the SMAC scenario option 'abort_on_first_run_crash'). Additional run info: {}

Could you please take a look at what is causing this?

dijiupianhai9 commented 1 year ago

'feed_targets' does not have scale_factor variable
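Judging from the traceback, the `'feed_targets' does not have scale_factor variable` exception comes from `has_feed_operators`, which checks that every feed op in the program has a matching key in the feed dict passed to the executor. The earlier warning that the variable `inputs` is not found in the program suggests the calibration loader is yielding its data under a different name. A minimal check, assuming you can obtain the model's feed names and one batch from the loader as a dict, might look like this. `diff_feed` and the sample values are hypothetical: the feed names are borrowed from the `input_list` in the first post and the batch key from the warning.

```python
def diff_feed(expected_names, batch):
    """Compare the names a model expects with the keys a loader provides.

    Returns (missing, unexpected): names the model needs but the batch lacks,
    and batch keys the model does not know about.
    """
    expected = set(expected_names)
    provided = set(batch)
    missing = sorted(expected - provided)
    unexpected = sorted(provided - expected)
    return missing, unexpected


# Hypothetical values for illustration only.
model_feed_names = ["image", "scale_factor"]
loader_batch = {"inputs": None}

missing, unexpected = diff_feed(model_feed_names, loader_batch)
print("missing:", missing)
print("unexpected:", unexpected)
```

If `missing` is non-empty, the loader (or the `input_list` in the config) would need to supply those names before the sampling stage can run.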