bubbliiiing / mask-rcnn-tf2

This is a mask-rcnn-tf2 repository that can be used to train your own models.
MIT License
64 stars 9 forks

What is causing this error? #30

Open yasaorder opened 1 year ago

yasaorder commented 1 year ago

Layer #464 (named "mrcnn_bbox_fc"), weight <tf.Variable 'mrcnn_bbox_fc/kernel:0' shape=(1024, 12) dtype=float32, numpy=
array([[-0.05428203, -0.02379247, 0.05667973, ..., -0.06447191, -0.02742182, 0.00107304],
[ 0.01232199, -0.02978487, -0.02579215, ..., 0.02193074, -0.068812 , -0.04887809],
[ 0.02010278, 0.01346438, -0.04218667, ..., -0.02126509, -0.06186843, -0.0097234 ],
...,
[ 0.0622025 , -0.03266411, -0.01797799, ..., -0.07131057, 0.01170394, 0.00496624],
[ 0.00119929, -0.02760401, 0.07242227, ..., 0.04025252, -0.0588662 , -0.06650373],
[-0.07064345, 0.02222669, 0.00726445, ..., -0.06496349, 0.04810742, -0.03629954]], dtype=float32)> has shape (1024, 12), but the saved weight has shape (1024, 16).

My environment: RTX 3050, CUDA 11.1 and cuDNN 11.1, TensorFlow 2.5.0, Python 3.7.

bubbliiiing commented 1 year ago

0 0 I haven't seen this before. How did you end up with it?

yasaorder commented 1 year ago

2023-03-08 17:33:13.790670: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2023-03-08 17:33:15.860437: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2023-03-08 17:33:15.877565: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3050 Laptop GPU computeCapability: 8.6 coreClock: 1.5GHz coreCount: 16 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 178.84GiB/s
2023-03-08 17:33:15.877910: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2023-03-08 17:33:15.931477: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2023-03-08 17:33:15.931604: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2023-03-08 17:33:15.935170: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
2023-03-08 17:33:15.936344: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
2023-03-08 17:33:15.944361: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll
2023-03-08 17:33:15.947159: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll
2023-03-08 17:33:15.947904: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2023-03-08 17:33:15.948061: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0

Configurations:
BACKBONE_STRIDES [4, 8, 16, 32, 64]
BATCH_SIZE 1
BBOX_STD_DEV [0.1 0.1 0.2 0.2]
DETECTION_MAX_INSTANCES 100
DETECTION_MIN_CONFIDENCE 0.8
DETECTION_NMS_THRESHOLD 0.9
FPN_CLASSIF_FC_LAYERS_SIZE 1024
GPU_COUNT 1
IMAGES_PER_GPU 1
IMAGE_MAX_DIM 256
IMAGE_META_SIZE 14
IMAGE_SHAPE [256 256 3]
LOSS_WEIGHTS {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE 14
MASK_SHAPE [28, 28]
MAX_GT_INSTANCES 100
MINI_MASK_SHAPE (56, 56)
NUM_CLASSES 2
POOL_SIZE 7
POST_NMS_ROIS_INFERENCE 1000
POST_NMS_ROIS_TRAINING 2000
PRE_NMS_LIMIT 6000
ROI_POSITIVE_RATIO 0.33
RPN_ANCHOR_RATIOS [0.5, 1, 2]
RPN_ANCHOR_SCALES [32, 64, 128, 256, 512]
RPN_ANCHOR_STRIDE 1
RPN_BBOX_STD_DEV [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD 0.7
RPN_TRAIN_ANCHORS_PER_IMAGE 256
TOP_DOWN_PYRAMID_SIZE 256
TRAIN_BN False
TRAIN_ROIS_PER_IMAGE 10
USE_MINI_MASK True
WEIGHT_DECAY 0

2023-03-08 17:33:15.969169: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-08 17:33:15.970197: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3050 Laptop GPU computeCapability: 8.6 coreClock: 1.5GHz coreCount: 16 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 178.84GiB/s
2023-03-08 17:33:15.970371: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2023-03-08 17:33:16.472129: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-03-08 17:33:16.472265: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2023-03-08 17:33:16.472332: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2023-03-08 17:33:16.472620: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1667 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3050 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6)
WARNING:tensorflow:From D:\pycharmprogram\mask-rcnn-tf2\venv\lib\site-packages\tensorflow\python\ops\array_ops.py:5049: calling gather (from tensorflow.python.ops.array_ops) with validate_indices is deprecated and will be removed in a future version. Instructions for updating: The validate_indices argument has no effect. Indices are always validated on CPU and never validated on GPU.
WARNING:tensorflow:From D:\pycharmprogram\mask-rcnn-tf2\venv\lib\site-packages\tensorflow\python\util\deprecation.py:602: calling map_fn_v2 (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version. Instructions for updating: Use fn_output_signature instead
Traceback (most recent call last):
  File "D:/pycharmprogram/mask-rcnn-tf2/predict.py", line 18, in <module>
    mask_rcnn = MASK_RCNN()
  File "D:\pycharmprogram\mask-rcnn-tf2\mask_rcnn.py", line 106, in __init__
    self.generate()
  File "D:\pycharmprogram\mask-rcnn-tf2\mask_rcnn.py", line 119, in generate
    self.model.load_weights(self.model_path, by_name=True)
  File "D:\pycharmprogram\mask-rcnn-tf2\venv\lib\site-packages\tensorflow\python\keras\engine\training.py", line 2324, in load_weights
    f, self.layers, skip_mismatch=skip_mismatch)
  File "D:\pycharmprogram\mask-rcnn-tf2\venv\lib\site-packages\tensorflow\python\keras\saving\hdf5_format.py", line 795, in load_weights_from_hdf5_group_by_name
    str(weight_values[i].shape) + '.')
ValueError: Layer #389 (named "mrcnn_bbox_fc"), weight <tf.Variable 'mrcnn_bbox_fc/kernel:0' shape=(1024, 8) dtype=float32, numpy=
array([[ 0.01597914, -0.05456199, -0.07526528, ..., 0.07273479, 0.00868265, 0.02716637],
[ 0.07172772, -0.04769863, -0.00772104, ..., 0.02604937, -0.03637744, 0.07384741],
[-0.0544251 , -0.06597053, -0.03896903, ..., -0.0512475 , -0.0349538 , 0.01046421],
...,
[ 0.06541412, -0.02186713, 0.03231685, ..., 0.06121826, -0.03497282, 0.0584868 ],
[-0.03125934, -0.06917947, -0.02298298, ..., -0.05074164, 0.04947589, -0.05265208],
[ 0.01530369, -0.02414635, -0.04038272, ..., -0.06305762, 0.06060681, -0.03200496]], dtype=float32)> has shape (1024, 8), but the saved weight has shape (1024, 324).

Process finished with exit code 1

I'm using PyCharm. Sorry, I mistyped the details yesterday: TensorFlow is 2.2.0. I tried 2.5.0 earlier and it didn't work.

Sokkafan commented 1 year ago

Configurations:
BACKBONE_STRIDES [4, 8, 16, 32, 64]
BATCH_SIZE 1
BBOX_STD_DEV [0.1 0.1 0.2 0.2]
DETECTION_MAX_INSTANCES 100
DETECTION_MIN_CONFIDENCE 0.5
DETECTION_NMS_THRESHOLD 0.3
FPN_CLASSIF_FC_LAYERS_SIZE 1024
GPU_COUNT 1
IMAGES_PER_GPU 1
IMAGE_MAX_DIM 256
IMAGE_META_SIZE 93
IMAGE_SHAPE [256 256 3]
LOSS_WEIGHTS {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE 14
MASK_SHAPE [28, 28]
MAX_GT_INSTANCES 100
MINI_MASK_SHAPE (56, 56)
NUM_CLASSES 81
POOL_SIZE 7
POST_NMS_ROIS_INFERENCE 1000
POST_NMS_ROIS_TRAINING 2000
PRE_NMS_LIMIT 6000
ROI_POSITIVE_RATIO 0.33
RPN_ANCHOR_RATIOS [0.5, 1, 2]
RPN_ANCHOR_SCALES [16, 32, 64, 128, 256]
RPN_ANCHOR_STRIDE 1
RPN_BBOX_STD_DEV [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD 0.7
RPN_TRAIN_ANCHORS_PER_IMAGE 256
TOP_DOWN_PYRAMID_SIZE 256
TRAIN_BN False
TRAIN_ROIS_PER_IMAGE 200
USE_MINI_MASK True
WEIGHT_DECAY 0

2023-03-14 21:31:52.307184: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-14 21:31:52.621770: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3491 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
WARNING:tensorflow:From C:\Users\10230\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\util\deprecation.py:629: calling map_fn_v2 (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version. Instructions for updating: Use fn_output_signature instead
Traceback (most recent call last):
  File "D:\Study\mask-rcnn-tf2-master\predict.py", line 18, in <module>
    mask_rcnn = MASK_RCNN()
  File "D:\Study\mask-rcnn-tf2-master\mask_rcnn.py", line 107, in __init__
    self.generate()
  File "D:\Study\mask-rcnn-tf2-master\mask_rcnn.py", line 120, in generate
    self.model.load_weights(self.model_path, by_name=True)
  File "C:\Users\10230\AppData\Local\Programs\Python\Python38\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\Users\10230\AppData\Local\Programs\Python\Python38\lib\site-packages\keras\saving\hdf5_format.py", line 944, in load_weights_from_hdf5_group_by_name
    raise ValueError(
ValueError: Shape mismatch in layer #392 (named mrcnn_bbox_fc) for weight mrcnn_bbox_fc/kernel:0. Weight expects shape (1024, 324). Received saved weight with shape (1024, 12)

Process finished with exit code 1

I ran into a similar error. Training works fine, but loading the weights in predict fails.
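The two tracebacks in this thread word the same failure differently, and it is easy to mix up which shape belongs to the model being built and which belongs to the weights file. The sketch below pulls both shapes out of either wording. The regular expressions only cover the two message formats quoted in this thread, and the function name `parse_shape_mismatch` is made up for illustration.

```python
import re

# A 2-D shape as printed in the error messages, e.g. "(1024, 324)".
_SHAPE = r"\((\d+), (\d+)\)"

# Only the two wordings that appear in this thread are covered (assumption:
# other Keras versions may phrase the message differently).
_PATTERNS = [
    # Older wording: "... has shape (A, B), but the saved weight has shape (C, D)."
    re.compile(rf"has shape {_SHAPE}, but the saved weight has shape {_SHAPE}"),
    # Newer wording: "Weight expects shape (A, B). Received saved weight with shape (C, D)"
    re.compile(rf"Weight expects shape {_SHAPE}\. Received saved weight with shape {_SHAPE}"),
]


def parse_shape_mismatch(message):
    """Return (model_shape, saved_shape) from a load_weights error, or None."""
    for pattern in _PATTERNS:
        match = pattern.search(message)
        if match:
            a, b, c, d = map(int, match.groups())
            return (a, b), (c, d)
    return None


old_style = "... dtype=float32)> has shape (1024, 8), but the saved weight has shape (1024, 324)."
new_style = ("Shape mismatch in layer #392 (named mrcnn_bbox_fc) for weight mrcnn_bbox_fc/kernel:0. "
             "Weight expects shape (1024, 324). Received saved weight with shape (1024, 12)")

for text in (old_style, new_style):
    model_shape, saved_shape = parse_shape_mismatch(text)
    print("model layer:", model_shape, "| weights file:", saved_shape)
```

In both wordings the first shape is the layer the current configuration built and the second is what the weights file contains.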

Sokkafan commented 1 year ago

My fix was simply to delete the COCO weights file. In the COCO weights the shape is (1024, 324), whereas (1024, 8) or (1024, 12) is the shape found in weights trained on other datasets.
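These shapes line up with the class counts in the printed configurations: NUM_CLASSES 81 goes with (1024, 324) and NUM_CLASSES 2 goes with (1024, 8). A minimal sketch of that arithmetic follows, assuming the second dimension of `mrcnn_bbox_fc` is NUM_CLASSES × 4 (four box offsets per class). That assumption fits every shape reported in this thread but is not taken from the repository code, and the helper names are hypothetical.

```python
FC_SIZE = 1024   # FPN_CLASSIF_FC_LAYERS_SIZE in the printed configurations
BOX_COORDS = 4   # assumption: four box offsets are predicted per class


def bbox_fc_kernel_shape(num_classes):
    """Kernel shape the bbox head would have for a given NUM_CLASSES."""
    return (FC_SIZE, num_classes * BOX_COORDS)


def classes_from_kernel_shape(shape):
    """Invert the relation: how many classes does a kernel shape imply?"""
    units = shape[1]
    if units % BOX_COORDS != 0:
        raise ValueError(f"{units} is not a multiple of {BOX_COORDS}")
    return units // BOX_COORDS


# Every shape mentioned in this thread.
for shape in [(1024, 8), (1024, 12), (1024, 16), (1024, 324)]:
    print(shape, "->", classes_from_kernel_shape(shape), "classes")
# (1024, 8) -> 2 classes
# (1024, 12) -> 3 classes
# (1024, 16) -> 4 classes
# (1024, 324) -> 81 classes
```

Read this way, a mismatch between 324 and 8 or 12 means one side was set up for the 81-class COCO configuration and the other for a small custom dataset.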

bubbliiiing commented 1 year ago

The number of classes probably wasn't changed for prediction.

bubbliiiing commented 1 year ago

Or model_path wasn't changed.
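Both suggestions come down to the same check: the class list used at prediction time and the weights file named by `model_path` have to describe the same number of classes. Below is a rough pre-flight sketch of that check. It assumes a plain-text class list with one name per line, one extra background class, and four box offsets per class; none of these are confirmed from the repository here. The function names are hypothetical, and the saved shape is simply copied from the error message.

```python
import os
import tempfile

BOX_COORDS = 4  # assumption: four box offsets per class, as the shapes in this thread suggest


def count_classes(classes_path):
    """Count non-empty lines in a one-class-per-line text file."""
    with open(classes_path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def check_bbox_head(classes_path, saved_kernel_shape):
    """Compare the class list against the saved mrcnn_bbox_fc kernel shape.

    Returns (ok, message). The +1 is an assumed background class.
    """
    num_classes = count_classes(classes_path) + 1
    expected_units = num_classes * BOX_COORDS
    saved_units = saved_kernel_shape[1]
    if saved_units == expected_units:
        return True, f"OK: class list and weights both imply {num_classes} classes."
    return False, (
        f"Mismatch: class list implies {num_classes} classes ({expected_units} units), "
        f"but the weights hold {saved_units} units ({saved_units // BOX_COORDS} classes). "
        "Check the class list and model_path."
    )


# Demo with a throwaway two-class list (hypothetical class names).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_classes.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("cat\ndog\n")
    print(check_bbox_head(path, (1024, 324)))  # COCO-sized weights: mismatch
    print(check_bbox_head(path, (1024, 12)))   # weights trained on 2 classes + background: OK
```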

yasaorder commented 1 year ago

Thanks.