Auto-compression fails with a cuDNN version mismatch error

truthsun22 commented 1 year ago

System: Ubuntu 20.01. Both Paddle and PaddleSlim are dev builds, with CUDA 11.6 and cuDNN 8.4. According to the official documentation these versions are compatible, but running auto-compression still reports a version mismatch. What is going on?

2023-06-01 15:00:54,113-INFO: devices: gpu
2023-06-01 15:01:03,250-INFO: Selected strategies: ['qat_dis']
2023-06-01 15:01:11,724-INFO: train config.distill_node_pair: ['teacher_conv2d_305.tmp_1', 'conv2d_305.tmp_1', 'teacher_conv2d_309.tmp_0', 'conv2d_309.tmp_0', 'teacher_conv2d_312.tmp_1', 'conv2d_312.tmp_1', 'teacher_conv2d_316.tmp_0', 'conv2d_316.tmp_0', 'teacher_conv2d_319.tmp_1', 'conv2d_319.tmp_1', 'teacher_conv2d_323.tmp_0', 'conv2d_323.tmp_0']
2023-06-01 15:01:12,023-INFO: quant_aware config {'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max', 'weight_bits': 8, 'activation_bits': 8, 'not_quant_pattern': ['skip_quant'], 'quantize_op_types': ['conv2d', 'depthwise_conv2d'], 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9, 'for_tensorrt': False, 'is_full_quantize': False, 'onnx_format': True, 'quant_post_first': False, 'scale_trainable': True, 'deploy_backend': None, 'name': 'Distillation', 'loss': 'soft_label', 'node': [], 'alpha': 1.0, 'teacher_model_dir': '/home/deepiot/ai/models/ppyoloe_plus_crn_l_80e_coco_animal8', 'teacher_model_filename': 'model.pdmodel', 'teacher_params_filename': 'model.pdiparams', 'quant_config': <paddle.static.quantization.quant_config.BaseQuantizer object at 0x7f7a74170a60>}
Adding quant op with weight:|██████████████████████████████████████████| 468/468
Adding OutScale op:|███████████████████████████████████████████████████| 464/464
2023-06-01 15:01:17,119-INFO: quant_aware config {'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max', 'weight_bits': 8, 'activation_bits': 8, 'not_quant_pattern': ['skip_quant'], 'quantize_op_types': ['conv2d', 'depthwise_conv2d'], 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9, 'for_tensorrt': False, 'is_full_quantize': False, 'onnx_format': True, 'quant_post_first': False, 'scale_trainable': True, 'deploy_backend': None, 'name': 'Distillation', 'loss': 'soft_label', 'node': [], 'alpha': 1.0, 'teacher_model_dir': '/home/deepiot/ai/models/ppyoloe_plus_crn_l_80e_coco_animal8', 'teacher_model_filename': 'model.pdmodel', 'teacher_params_filename': 'model.pdiparams', 'quant_config': <paddle.static.quantization.quant_config.BaseQuantizer object at 0x7f7a4c427460>}
Adding quant op with weight:|████████████████████████████████████████| 2476/2476
Adding OutScale op:|█████████████████████████████████████████████████| 2230/2230
I0601 15:05:05.469944 35715 interpreter_util.cc:518] Standalone Executor is Used.
Traceback (most recent call last):
  File "/home/deepiot/ai/PaddleSlim/example/auto_compression/detection/run.py", line 198, in <module>
    main()
  File "/home/deepiot/ai/PaddleSlim/example/auto_compression/detection/run.py", line 188, in main
    ac.compress()
  File "/home/deepiot/anaconda3/envs/paddle_slim/lib/python3.9/site-packages/paddleslim-0.0.0.dev0-py3.9.egg/paddleslim/auto_compression/compressor.py", line 586, in compress
  File "/home/deepiot/anaconda3/envs/paddle_slim/lib/python3.9/site-packages/paddleslim-0.0.0.dev0-py3.9.egg/paddleslim/auto_compression/compressor.py", line 780, in single_strategy_compress
  File "/home/deepiot/anaconda3/envs/paddle_slim/lib/python3.9/site-packages/paddleslim-0.0.0.dev0-py3.9.egg/paddleslim/auto_compression/compressor.py", line 799, in _start_train
  File "/home/deepiot/anaconda3/envs/paddle_slim/lib/python3.9/site-packages/paddle/fluid/executor.py", line 1426, in run
    res = self._run_impl(
  File "/home/deepiot/anaconda3/envs/paddle_slim/lib/python3.9/site-packages/paddle/fluid/executor.py", line 1658, in _run_impl
    ret = new_exe.run(
  File "/home/deepiot/anaconda3/envs/paddle_slim/lib/python3.9/site-packages/paddle/fluid/executor.py", line 654, in run
    tensors = self._new_exe.run(
OSError: (External) CUDNN error(14), CUDNN_STATUS_VERSION_MISMATCH.
  [Hint: Please search for the error code(14) on website (https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnStatus_t) to get Nvidia's official solution and advice about CUDNN Error.] (at ../paddle/phi/kernels/gpudnn/pool_grad_kernel.cu:284)
  [operator < pool2d_grad > error]

mufeng12399 commented 9 months ago

I ran into a similar problem and solved it by downgrading cuDNN to 7.x, together with CUDA 10.2 and Python 3.8.
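Before downgrading, it may be worth checking which cuDNN library the dynamic loader actually picks up, since a second copy on the library path can differ from the one you think is installed. The snippet below is only a rough sketch: `decode_cudnn_version` and `runtime_cudnn_version` are hypothetical helper names, the soname `libcudnn.so.8` is an assumption for a cuDNN 8.x install on Linux, and Paddle may resolve cuDNN through its own search path rather than the default one used here. `cudnnGetVersion()` is the cuDNN C API call that returns the version as a single integer.

```python
import ctypes


def decode_cudnn_version(v):
    """Split the integer from cudnnGetVersion() into (major, minor, patch).

    cuDNN 7.x and 8.x encode it as major*1000 + minor*100 + patch
    (8.4.0 becomes 8400); cuDNN 9+ uses major*10000 + minor*100 + patch.
    """
    if v >= 90000:
        return v // 10000, (v % 10000) // 100, v % 100
    return v // 1000, (v % 1000) // 100, v % 100


def runtime_cudnn_version(soname="libcudnn.so.8"):
    """Return the version of the cuDNN library the loader finds, or None."""
    try:
        lib = ctypes.CDLL(soname)
    except OSError:
        return None  # library is not on the loader path
    lib.cudnnGetVersion.restype = ctypes.c_size_t
    return decode_cudnn_version(lib.cudnnGetVersion())


if __name__ == "__main__":
    print("cuDNN found by the loader:", runtime_cudnn_version())
```

If the version printed here is not the 8.4 you expect, the environment (for example `LD_LIBRARY_PATH` or a conda-installed cuDNN) is probably shadowing the system install.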

ceci3 commented 7 months ago

Could you first try running import paddle; paddle.utils.run_check() and see whether it completes successfully?
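A slightly expanded version of that check, as a minimal sketch: it also prints the CUDA and cuDNN versions the installed Paddle wheel was compiled against, which can then be compared with what is installed on the machine. `paddle_env_report` is a hypothetical helper name; `paddle.version.cuda()` and `paddle.version.cudnn()` are assumed to be available in the installed Paddle build.

```python
import importlib.util


def paddle_env_report():
    """Collect Paddle's build-time CUDA/cuDNN versions and run its self-check."""
    if importlib.util.find_spec("paddle") is None:
        return {"installed": False}

    import paddle

    report = {
        "installed": True,
        "paddle": paddle.__version__,
        "cuda": paddle.version.cuda(),    # CUDA version the wheel was built with
        "cudnn": paddle.version.cudnn(),  # cuDNN version the wheel was built with
    }
    try:
        paddle.utils.run_check()  # prints its own diagnostics
        report["run_check_ok"] = True
    except Exception as exc:  # keep the report even if the check fails
        report["run_check_ok"] = False
        report["run_check_error"] = str(exc)
    return report


if __name__ == "__main__":
    print(paddle_env_report())
```

If the reported cuDNN version differs from the cuDNN 8.4 on the system, that difference is a likely candidate for the CUDNN_STATUS_VERSION_MISMATCH error.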

PaddlePaddle / PaddleSlim

Auto-compression fails with a cuDNN version mismatch error #1757