Cityscapes SOTA model export fails with 'Config' object has no attribute 'model'

Siiiiiigma commented 12 months ago

Search before asking

[X] I have searched the open and closed issues and found no similar bug report.

Describe the Bug

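python export.py --config configs/mscale_ocr_cityscapes_autolabel_mapillary.yml --save_dir ./output --input_shape 1 3 2048 1024

Following the README, I downloaded the model parameters and the pretrained parameters to the specified locations. When I use the command above to export the pretrained model network, I get the error below. I tried several fixes mentioned in earlier issues, such as installing the development version of PaddleSeg from source, but the problem persists. It also happens in a PaddlePaddle AI Studio notebook, so it should not be related to my environment.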

Error output:

d:\deeplearning\paddleseg\paddleseg\cvlibs\manager.py:113: UserWarning: MscaleOCRNet exists already! It is now updated to <class 'models.mscale_ocrnet.MscaleOCRNet'> !!!
  warnings.warn("{} exists already! It is now updated to {} !!!".
Traceback (most recent call last):
  File "D:\DeepLearning\PaddleSeg\contrib\CityscapesSOTA\export.py", line 140, in <module>
    main(args)
  File "D:\DeepLearning\PaddleSeg\contrib\CityscapesSOTA\export.py", line 84, in main
    net = cfg.model
AttributeError: 'Config' object has no attribute 'model'

Environment

paddlepaddle-gpu 2.4.2.post117
paddleseg 2.8.0 (d:\deeplearning\paddleseg)

Bug description confirmation

[X] I confirm that the bug replication steps, code change instructions, and environment information have been provided, and the problem can be reproduced.

Are you willing to submit a PR?

[ ] I'd like to help by submitting a PR!

Asthestarsfalll commented 12 months ago

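@Siiiiiigma Hi, this looks like a bug. CityscapesSOTA uses modules from paddleseg, and when paddleseg was later updated, CityscapesSOTA was not updated to match. You could try an earlier version for now; I will fix this shortly.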
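A minimal, hedged sketch of what "not updated to match" means for line 84 of export.py. It assumes that newer PaddleSeg releases (2.8) no longer expose the model as a Config.model property and instead build it with paddleseg.cvlibs.SegBuilder(cfg).model, which is how PaddleSeg 2.8's own tools/export.py appears to obtain the model. The contents of the maintainer's actual fix are not shown in this thread, and the helper name build_model is hypothetical.

```python
from typing import Any, Callable, Optional


def build_model(cfg: Any, builder_cls: Optional[Callable[[Any], Any]] = None) -> Any:
    """Hypothetical helper: return the network for a PaddleSeg-style config.

    Older PaddleSeg releases exposed the model as the `cfg.model` property.
    Newer releases (assumed here: 2.8+) drop that attribute, and the model
    is built by `SegBuilder(cfg).model` instead.
    """
    try:
        return cfg.model  # old API: Config builds the model itself
    except AttributeError:
        pass  # new API: Config no longer has a `model` attribute
    if builder_cls is None:
        # Assumption: SegBuilder is importable from paddleseg.cvlibs (2.8+).
        from paddleseg.cvlibs import SegBuilder
        builder_cls = SegBuilder
    return builder_cls(cfg).model  # the builder owns model creation
```

With such a helper, net = cfg.model would become net = build_model(cfg). One caveat of this design: catching AttributeError can also hide an AttributeError raised inside an old-style model property, so a real fix would more likely just switch to the new API.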

Asthestarsfalll commented 12 months ago

@Siiiiiigma I have submitted a PR; you can try cloning my changes.

Siiiiiigma commented 12 months ago

@Asthestarsfalll Thanks for the fix. When I export the first config (mscale_ocr_cityscapes_autolabel_mapillary.yml), I get the warnings below. Is this normal?

(Paddle) D:\DeepLearning\PaddleSeg\contrib\CityscapesSOTA>python export.py --config configs/mscale_ocr_cityscapes_autolabel_mapillary.yml --save_dir ./output --input_shape 1 3 2048 1024
d:\deeplearning\paddleseg\paddleseg\cvlibs\manager.py:113: UserWarning: MscaleOCRNet exists already! It is now updated to <class 'models.mscale_ocrnet.MscaleOCRNet'> !!!
  warnings.warn("{} exists already! It is now updated to {} !!!".
2023-07-10 16:43:06 [WARNING] Add the in_channels in train_dataset class to model config. We suggest you manually set in_channels in model config.
2023-07-10 16:43:06 [INFO] Use the following config to build model
model:
  backbone:
    in_channels: 3
    type: HRNet_W48_NV
  backbone_indices:
  - 0
  n_scales:
  - 0.5
  - 1.0
  - 2.0
  num_classes: 19
  pretrained: pretrain/pretrained.pdparams
  type: MscaleOCRNet
W0710 16:43:06.020490 7732 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 12.1, Runtime API Version: 11.7
W0710 16:43:06.046422 7732 gpu_resources.cc:91] device: 0, cuDNN Version: 8.4.
2023-07-10 16:43:10 [INFO] Loading pretrained model from pretrain/pretrained.pdparams
2023-07-10 16:43:13 [WARNING] [SKIP] Shape of pretrained params ocrnet.head.cls_head.weight doesn't match.(Pretrained: (65, 512, 1, 1), Actual: [19, 512, 1, 1])
2023-07-10 16:43:13 [WARNING] [SKIP] Shape of pretrained params ocrnet.head.cls_head.bias doesn't match.(Pretrained: (65,), Actual: [19])
2023-07-10 16:43:13 [WARNING] [SKIP] Shape of pretrained params ocrnet.head.aux_head.1.weight doesn't match.(Pretrained: (65, 720, 1, 1), Actual: [19, 720, 1, 1])
2023-07-10 16:43:13 [WARNING] [SKIP] Shape of pretrained params ocrnet.head.aux_head.1.bias doesn't match.(Pretrained: (65,), Actual: [19])
2023-07-10 16:43:13 [WARNING] scale_attn.atten_head.0._conv.weight is not in pretrained model
2023-07-10 16:43:13 [WARNING] scale_attn.atten_head.0._batch_norm.weight is not in pretrained model
2023-07-10 16:43:13 [WARNING] scale_attn.atten_head.0._batch_norm.bias is not in pretrained model
2023-07-10 16:43:13 [WARNING] scale_attn.atten_head.0._batch_norm._mean is not in pretrained model
2023-07-10 16:43:13 [WARNING] scale_attn.atten_head.0._batch_norm._variance is not in pretrained model
2023-07-10 16:43:13 [WARNING] scale_attn.atten_head.1._conv.weight is not in pretrained model
2023-07-10 16:43:13 [WARNING] scale_attn.atten_head.1._batch_norm.weight is not in pretrained model
2023-07-10 16:43:13 [WARNING] scale_attn.atten_head.1._batch_norm.bias is not in pretrained model
2023-07-10 16:43:13 [WARNING] scale_attn.atten_head.1._batch_norm._mean is not in pretrained model
2023-07-10 16:43:13 [WARNING] scale_attn.atten_head.1._batch_norm._variance is not in pretrained model
2023-07-10 16:43:13 [WARNING] scale_attn.atten_head.2.weight is not in pretrained model
2023-07-10 16:43:14 [INFO] There are 1572/1587 variables loaded into MscaleOCRNet.
2023-07-10 16:43:48 [INFO] The inference model is saved in ./output

Asthestarsfalll commented 12 months ago

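@Siiiiiigma The first warning appears because MscaleOCRNet is already registered in paddleseg.models and CityscapesSOTA registers it a second time; this has no effect. The second set of warnings (pretrained params skipped) comes from a shape mismatch in the final classification layers: it is normal for the pretrained head to have a different number of output channels than the fine-tuned head, and this has no effect either. The third set (scale_attn) appears because you loaded the pretrained weights, which do not contain a scale_attn module; for deployment you should load the weights trained on the downstream task.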
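The explanation of the second and third warnings can be made concrete with a small sketch of the filtering rule the log reflects: a parameter is loaded only if the checkpoint contains it under the same name and with the same shape. This is an illustration in plain Python, not PaddleSeg's actual loading code; filter_pretrained, the backbone.placeholder.weight entry, and the shape of scale_attn.atten_head.2.weight are invented for the example, while the cls_head shapes come from the log.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def filter_pretrained(
    model_shapes: Dict[str, Shape],
    pretrained_shapes: Dict[str, Shape],
) -> Tuple[List[str], List[str], List[str]]:
    """Hypothetical helper: split model parameters into (loaded, mismatched, missing).

    Simplified illustration of the rule behind the log, not PaddleSeg's own
    loader: a parameter is copied only when the checkpoint has the same name
    with the same shape; every other parameter keeps its fresh initialization.
    """
    loaded: List[str] = []
    mismatched: List[str] = []
    missing: List[str] = []
    for name, shape in model_shapes.items():
        if name not in pretrained_shapes:
            missing.append(name)  # logged as "... is not in pretrained model"
        elif tuple(pretrained_shapes[name]) != tuple(shape):
            mismatched.append(name)  # logged as "[SKIP] Shape of pretrained params ..."
        else:
            loaded.append(name)
    return loaded, mismatched, missing


# Head shapes are taken from the log; the other two entries are placeholders.
model = {
    "backbone.placeholder.weight": (64, 3, 3, 3),  # hypothetical backbone param
    "ocrnet.head.cls_head.weight": (19, 512, 1, 1),
    "ocrnet.head.cls_head.bias": (19,),
    "scale_attn.atten_head.2.weight": (1,),  # placeholder shape
}
pretrained = {
    "backbone.placeholder.weight": (64, 3, 3, 3),
    "ocrnet.head.cls_head.weight": (65, 512, 1, 1),
    "ocrnet.head.cls_head.bias": (65,),
}
loaded, mismatched, missing = filter_pretrained(model, pretrained)
print(f"{len(loaded)}/{len(model)} loaded")  # 1/4 loaded
```

Applied to the full log, 4 head parameters were skipped for a shape mismatch and 11 scale_attn.* parameters were missing, and 1587 - 4 - 11 = 1572 matches "There are 1572/1587 variables loaded into MscaleOCRNet". A checkpoint trained on the downstream task would presumably leave both lists empty.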

Siiiiiigma commented 12 months ago

Thanks, understood. After switching to the previously downloaded saved_model/model.pdparams, the warnings are gone.

Siiiiiigma commented 12 months ago

@Asthestarsfalll Hi, I want to test the model on arbitrary street-scene images. I prepared a 2048*1024 JPG image and put it in the image folder. When I run the following command in a PaddlePaddle AI Studio notebook:

python deploy/python/infer.py \
  --config /home/aistudio/PaddleSeg-2.6.0/output/deploy.yaml \
  --image_path /home/aistudio/PaddleSeg-2.6.0/image \
  --save_dir /home/aistudio/PaddleSeg-2.6.0/result

I get the following error:

2023-07-10 18:47:58 [INFO] Use GPU
--- Running analysis [ir_graph_build_pass]
I0710 18:48:00.875998 2513 executor.cc:187] Old Executor is Running.
--- Running analysis [ir_analysis_pass]
--- Running IR pass [map_op_to_another_pass]
--- Running IR pass [identity_scale_op_clean_pass]
--- Running IR pass [is_test_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [delete_quant_dequant_linear_op_pass]
--- Running IR pass [delete_weight_dequant_linear_op_pass]
--- Running IR pass [constant_folding_pass]
--- Running IR pass [silu_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [vit_attention_fuse_pass]
--- Running IR pass [fused_multi_transformer_encoder_pass]
--- Running IR pass [fused_multi_transformer_decoder_pass]
--- Running IR pass [fused_multi_transformer_encoder_fuse_qkv_pass]
--- Running IR pass [fused_multi_transformer_decoder_fuse_qkv_pass]
--- Running IR pass [multi_devices_fused_multi_transformer_encoder_pass]
--- Running IR pass [multi_devices_fused_multi_transformer_encoder_fuse_qkv_pass]
--- Running IR pass [multi_devices_fused_multi_transformer_decoder_fuse_qkv_pass]
--- Running IR pass [fuse_multi_transformer_layer_pass]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
--- Running IR pass [matmul_scale_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v3]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [fc_elementwise_layernorm_fuse_pass]
--- Running IR pass [conv_elementwise_add_act_fuse_pass]
--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
--- Running IR pass [conv_elementwise_add_fuse_pass]
I0710 18:48:47.483732 2513 fuse_pass_base.cc:59] --- detected 12 subgraphs
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running IR pass [conv2d_fusion_layout_transfer_pass]
--- Running IR pass [transfer_layout_elim_pass]
--- Running IR pass [auto_mixed_precision_pass]
--- Running IR pass [inplace_op_var_pass]
I0710 18:48:47.669679 2513 fuse_pass_base.cc:59] --- detected 3 subgraphs
--- Running analysis [save_optimized_model_pass]
W0710 18:48:47.685402 2513 save_optimized_model_pass.cc:28] save_optim_cache_model is turned off, skip save_optimized_model_pass
--- Running analysis [ir_params_sync_among_devices_pass]
I0710 18:48:47.685453 2513 ir_params_sync_among_devices_pass.cc:51] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0710 18:48:50.664584 2513 memory_optimize_pass.cc:222] Cluster name : shape_28.tmp_0_slice_0 size: 8
I0710 18:48:50.664654 2513 memory_optimize_pass.cc:222] Cluster name : shape_0.tmp_0_slice_0 size: 8
I0710 18:48:50.664659 2513 memory_optimize_pass.cc:222] Cluster name : concat_1.tmp_0 size: -2147483648
I0710 18:48:50.664661 2513 memory_optimize_pass.cc:222] Cluster name : transpose_0.tmp_0 size: 1073741824
I0710 18:48:50.664664 2513 memory_optimize_pass.cc:222] Cluster name : relu_78.tmp_0 size: 50331648
I0710 18:48:50.664673 2513 memory_optimize_pass.cc:222] Cluster name : batch_norm_305.tmp_2 size: 1509949440
I0710 18:48:50.664676 2513 memory_optimize_pass.cc:222] Cluster name : batch_norm_196.tmp_2 size: 50331648
I0710 18:48:50.664680 2513 memory_optimize_pass.cc:222] Cluster name : relu_227.tmp_0 size: 12582912
I0710 18:48:50.664685 2513 memory_optimize_pass.cc:222] Cluster name : batch_norm_200.tmp_2 size: 25165824
I0710 18:48:50.664688 2513 memory_optimize_pass.cc:222] Cluster name : x size: 25165824
I0710 18:48:50.664702 2513 memory_optimize_pass.cc:222] Cluster name : relu_171.tmp_0 size: 25165824
I0710 18:48:50.664711 2513 memory_optimize_pass.cc:222] Cluster name : batch_norm_930.tmp_1 size: 768
I0710 18:48:50.664716 2513 memory_optimize_pass.cc:222] Cluster name : concat_0.tmp_0 size: 1509949440
I0710 18:48:50.664718 2513 memory_optimize_pass.cc:222] Cluster name : tmp_310 size: 3145728
I0710 18:48:50.664721 2513 memory_optimize_pass.cc:222] Cluster name : bilinear_interp_v2_35.tmp_0 size: 76
--- Running analysis [ir_graph_to_program_pass]
I0710 18:48:51.751169 2513 analysis_predictor.cc:1660] ======= optimize end =======
I0710 18:48:51.776242 2513 naive_executor.cc:164] --- skip [feed], feed -> x
I0710 18:48:51.808507 2513 naive_executor.cc:164] --- skip [argmax_0.tmp_0], fetch -> fetch
W0710 18:48:51.966293 2513 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.6
W0710 18:48:51.974512 2513 gpu_resources.cc:149] device: 0, cuDNN Version: 8.4.
Traceback (most recent call last):
  File "/home/aistudio/PaddleSeg-2.6.0/deploy/python/infer.py", line 430, in <module>
    main(args)
  File "/home/aistudio/PaddleSeg-2.6.0/deploy/python/infer.py", line 418, in main
    predictor.run(imgs_list)
  File "/home/aistudio/PaddleSeg-2.6.0/deploy/python/infer.py", line 375, in run
    self.predictor.run()
ValueError: (InvalidArgument) The 2-th dimension of input[0] and input[1] is expected to be equal.But received input[0]'s shape = [1, 512, 1024, 512], input[1]'s shape = [1, 512, 512, 1024].

[operator < concat > error]

Is this a problem with the shape of my input data, or with the model?

Asthestarsfalll commented 12 months ago

@Siiiiiigma It should be a problem with the shape of the input data.

Siiiiiigma commented 12 months ago

I still get the same error when using the cityscapes_demo.png provided at https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.8/docs/deployment/inference/python_inference_cn.md, so it doesn't seem to be a shape problem. Am I missing a preprocessing step?

Asthestarsfalll commented 12 months ago

Judging from the error, the tensor shapes differ at a concat inside the model. Could you try the develop branch?

Siiiiiigma commented 11 months ago

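@Asthestarsfalll I tested both a development version (2.8.0) installed from source on my local machine and the 2.6.0 version provided in the AI Studio notebook, using cityscapes_demo.png in both cases. The problem persists and the error occurs at the same place. Could you check whether there is a bug inside the model?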
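One observation about the concat error, offered as a hypothesis rather than a confirmed diagnosis: the two inputs, [1, 512, 1024, 512] and [1, 512, 512, 1024], are exact spatial transposes of each other. Paddle image tensors are laid out as N, C, H, W, so --input_shape 1 3 2048 1024 in the export command requests H = 2048 and W = 1024, while a 2048*1024 Cityscapes image (width × height) has H = 1024 and W = 2048. The numbers would also fit a 2.0 scale followed by a stride-4 feature map (2048 × 2 / 4 = 1024 and 1024 × 2 / 4 = 512), assuming that is where the concat sits. If part of the exported graph kept the export-time shape, a landscape image would produce exactly this kind of transposed mismatch, so re-exporting with --input_shape 1 3 1024 2048 may be worth trying. The helper names nchw_shape and is_hw_swapped are hypothetical.

```python
from typing import List, Sequence


def nchw_shape(width: int, height: int, batch: int = 1, channels: int = 3) -> List[int]:
    """Hypothetical helper: NCHW shape for an image given as width x height."""
    return [batch, channels, height, width]


def is_hw_swapped(a: Sequence[int], b: Sequence[int]) -> bool:
    """Hypothetical helper: True if two NCHW shapes share N and C but have H and W exchanged."""
    return (
        len(a) == len(b) == 4
        and list(a[:2]) == list(b[:2])
        and a[2] == b[3]
        and a[3] == b[2]
        and a[2] != a[3]  # a square map would not reveal a swap
    )


# Shapes reported by the concat error in infer.py
export_branch = [1, 512, 1024, 512]
image_branch = [1, 512, 512, 1024]
print(is_hw_swapped(export_branch, image_branch))  # True
print(nchw_shape(2048, 1024))  # [1, 3, 1024, 2048]
```

The second print gives the N, C, H, W shape that a 2048-wide, 1024-tall image would have, which is the order --input_shape expects under the NCHW assumption above.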
