SOLOv2 training keeps hanging

hehu56 commented 3 years ago

(base) PS C:\Users\何虎> conda activate paddle
(paddle) PS C:\Users\何虎> cd  D:\Anconda3\envs\paddle_decet\PaddleDetection
(paddle) PS D:\Anconda3\envs\paddle_decet\PaddleDetection> python tools\train.py -c D:\Anconda3\envs\paddle_decet\PaddleDetection\configs\solov2\solov2_r50_fpn_1x.yml -o use_gpu=false
D:\Anconda3\envs\paddle_decet\PaddleDetection\ppdet\core\workspace.py:118: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.9 it will stop working
  isinstance(merge_dct[k], collections.Mapping)):
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\framework.py:2383: DeprecationWarning: an integer is required (got type paddle.fluid.core_avx.op_proto_and_checker_maker.OpRole).  Implicit conversion to integers using __int__ is deprecated, and may be removed in a future version of Python.
  self.desc._set_attr(name, val)
OrderedDict([('image', var image : fluid.VarType.LOD_TENSOR.shape(-1, 3, -1, -1).astype(VarType.FP32)), ('im_id', var im_id : fluid.VarType.LOD_TENSOR.shape(-1, 1).astype(VarType.INT64)), ('fg_num', var fg_num : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32)), ('ins_label0', var ins_label0 : fluid.VarType.LOD_TENSOR.shape(-1, -1, -1).astype(VarType.INT32)), ('cate_label0', var cate_label0 : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32)), ('grid_order0', var grid_order0 : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32)), ('ins_label1', var ins_label1 : fluid.VarType.LOD_TENSOR.shape(-1, -1, -1).astype(VarType.INT32)), ('cate_label1', var cate_label1 : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32)), ('grid_order1', var grid_order1 : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32)), ('ins_label2', var ins_label2 : fluid.VarType.LOD_TENSOR.shape(-1, -1, -1).astype(VarType.INT32)), ('cate_label2', var cate_label2 : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32)), ('grid_order2', var grid_order2 : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32)), ('ins_label3', var ins_label3 : fluid.VarType.LOD_TENSOR.shape(-1, -1, -1).astype(VarType.INT32)), ('cate_label3', var cate_label3 : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32)), ('grid_order3', var grid_order3 : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32)), ('ins_label4', var ins_label4 : fluid.VarType.LOD_TENSOR.shape(-1, -1, -1).astype(VarType.INT32)), ('cate_label4', var cate_label4 : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32)), ('grid_order4', var grid_order4 : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32))])
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\framework.py:2383: DeprecationWarning: an integer is required (got type paddle.fluid.core_avx.VarType).  Implicit conversion to integers using __int__ is deprecated, and may be removed in a future version of Python.
  self.desc._set_attr(name, val)
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\layers\math_op_patch.py:273: UserWarning: D:\Anconda3\envs\paddle_decet\PaddleDetection\ppdet\modeling\backbones\fpn.py:108
The behavior of expression A + B has been unified with elementwise_add(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_add(X, Y, axis=0) instead of A + B. This transitional warning will be dropped in the future.
  warnings.warn(
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\layers\math_op_patch.py:273: UserWarning: D:\Anconda3\envs\paddle_decet\PaddleDetection\ppdet\modeling\losses\solov2_loss.py:51
The behavior of expression A * B has been unified with elementwise_mul(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_mul(X, Y, axis=0) instead of A * B. This transitional warning will be dropped in the future.
  warnings.warn(
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\layers\math_op_patch.py:273: UserWarning: D:\Anconda3\envs\paddle_decet\PaddleDetection\ppdet\modeling\losses\solov2_loss.py:52
The behavior of expression A * B has been unified with elementwise_mul(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_mul(X, Y, axis=0) instead of A * B. This transitional warning will be dropped in the future.
  warnings.warn(
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\layers\math_op_patch.py:273: UserWarning: D:\Anconda3\envs\paddle_decet\PaddleDetection\ppdet\modeling\losses\solov2_loss.py:53
The behavior of expression A * B has been unified with elementwise_mul(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_mul(X, Y, axis=0) instead of A * B. This transitional warning will be dropped in the future.
  warnings.warn(
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\layers\math_op_patch.py:273: UserWarning: D:\Anconda3\envs\paddle_decet\PaddleDetection\ppdet\modeling\losses\solov2_loss.py:54
The behavior of expression A + B has been unified with elementwise_add(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_add(X, Y, axis=0) instead of A + B. This transitional warning will be dropped in the future.
  warnings.warn(
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\layers\math_op_patch.py:273: UserWarning: D:\Anconda3\envs\paddle_decet\PaddleDetection\ppdet\modeling\losses\solov2_loss.py:54
The behavior of expression A / B has been unified with elementwise_div(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_div(X, Y, axis=0) instead of A / B. This transitional warning will be dropped in the future.
  warnings.warn(
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\backward.py:300: DeprecationWarning: an integer is required (got type paddle.fluid.core_avx.VarType).  Implicit conversion to integers using __int__ is deprecated, and may be removed in a future version of Python.
  op_desc._set_attr(name, val)
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\backward.py:300: DeprecationWarning: an integer is required (got type paddle.fluid.core_avx.op_proto_and_checker_maker.OpRole).  Implicit conversion to integers using __int__ is deprecated, and may be removed in a future version of Python.
  op_desc._set_attr(name, val)
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\backward.py:1070: DeprecationWarning: an integer is required (got type paddle.fluid.core_avx.op_proto_and_checker_maker.OpRole).  Implicit conversion to integers using __int__ is deprecated, and may be removed in a future version of Python.
  new_op_desc._set_attr(op_role_attr_name, backward)
2020-12-21 11:39:23,956-INFO: If regularizer of a Parameter has been set by 'fluid.ParamAttr' or 'fluid.WeightNormParamAttr' already. The Regularization[L2Decay, regularization_coeff=0.000100] in Optimizer will not take effect, and it will only be applied to other Parameters!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
!!! The CPU_NUM is not specified, you should set CPU_NUM in the environment variable list.
CPU_NUM indicates that how many CPUPlace are used in the current task.
And if this parameter are set as N (equal to the number of physical CPU core) the program may be faster.

export CPU_NUM=12 # for example, set CPU_NUM as number of physical CPU core which is 12.

!!! The default number of CPU_NUM=1.
W1221 11:39:26.767999  1312 build_strategy.cc:170] fusion_group is not enabled for Windows/MacOS now, and only effective when running with CUDA GPU.
D:\Anconda3\envs\paddle_decet\PaddleDetection\ppdet\data\reader.py:89: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.9 it will stop working
  if isinstance(item, collections.Sequence) and len(item) == 0:
2020-12-21 11:39:49,684-INFO: iter: 0, lr: 0.000000, 'loss_ins': '2.309299', 'loss_cate': '0.852402', 'loss': '3.161701', eta: 0:00:00, batch_cost: 0.00000 sec, ips: 20000.00000 images/sec
2020-12-21 11:46:48,224-INFO: iter: 20, lr: 0.000200, 'loss_ins': '2.290069', 'loss_cate': '0.513891', 'loss': '2.873197', eta: 21 days, 21:19:55, batch_cost: 21.01796 sec, ips: 0.09516 images/sec
2020-12-21 11:48:02,812-INFO: KeyboardInterrupt: main proc 8420 exit, kill subprocess []

(paddle) PS D:\Anconda3\envs\paddle_decet\PaddleDetection> python tools\train.py -c D:\Anconda3\envs\paddle_decet\PaddleDetection\configs\solov2\solov2_r50_fpn_1x.yml -o use_cpu=true 12
Traceback (most recent call last):
  File "tools\train.py", line 378, in <module>
    FLAGS = parser.parse_args()
  File "D:\Anconda3\envs\paddle_decet\PaddleDetection\ppdet\utils\cli.py", line 58, in parse_args
    args.opt = self._parse_opt(args.opt)
  File "D:\Anconda3\envs\paddle_decet\PaddleDetection\ppdet\utils\cli.py", line 67, in _parse_opt
    k, v = s.split('=', 1)
ValueError: not enough values to unpack (expected 2, got 1)
(paddle) PS D:\Anconda3\envs\paddle_decet\PaddleDetection> python tools\train.py -c D:\Anconda3\envs\paddle_decet\PaddleDetection\configs\solov2\solov2_r50_fpn_1x.yml -o use_cpu=true
D:\Anconda3\envs\paddle_decet\PaddleDetection\ppdet\core\workspace.py:118: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.9 it will stop working
  isinstance(merge_dct[k], collections.Mapping)):
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\framework.py:2383: DeprecationWarning: an integer is required (got type paddle.fluid.core_avx.op_proto_and_checker_maker.OpRole).  Implicit conversion to integers using __int__ is deprecated, and may be removed in a future version of Python.
  self.desc._set_attr(name, val)
OrderedDict([('image', var image : fluid.VarType.LOD_TENSOR.shape(-1, 3, -1, -1).astype(VarType.FP32)), ('im_id', var im_id : fluid.VarType.LOD_TENSOR.shape(-1, 1).astype(VarType.INT64)), ('fg_num', var fg_num : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32)), ('ins_label0', var ins_label0 : fluid.VarType.LOD_TENSOR.shape(-1, -1, -1).astype(VarType.INT32)), ('cate_label0', var cate_label0 : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32)), ('grid_order0', var grid_order0 : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32)), ('ins_label1', var ins_label1 : fluid.VarType.LOD_TENSOR.shape(-1, -1, -1).astype(VarType.INT32)), ('cate_label1', var cate_label1 : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32)), ('grid_order1', var grid_order1 : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32)), ('ins_label2', var ins_label2 : fluid.VarType.LOD_TENSOR.shape(-1, -1, -1).astype(VarType.INT32)), ('cate_label2', var cate_label2 : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32)), ('grid_order2', var grid_order2 : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32)), ('ins_label3', var ins_label3 : fluid.VarType.LOD_TENSOR.shape(-1, -1, -1).astype(VarType.INT32)), ('cate_label3', var cate_label3 : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32)), ('grid_order3', var grid_order3 : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32)), ('ins_label4', var ins_label4 : fluid.VarType.LOD_TENSOR.shape(-1, -1, -1).astype(VarType.INT32)), ('cate_label4', var cate_label4 : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32)), ('grid_order4', var grid_order4 : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32))])
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\framework.py:2383: DeprecationWarning: an integer is required (got type paddle.fluid.core_avx.VarType).  Implicit conversion to integers using __int__ is deprecated, and may be removed in a future version of Python.
  self.desc._set_attr(name, val)
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\layers\math_op_patch.py:273: UserWarning: D:\Anconda3\envs\paddle_decet\PaddleDetection\ppdet\modeling\backbones\fpn.py:108
The behavior of expression A + B has been unified with elementwise_add(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_add(X, Y, axis=0) instead of A + B. This transitional warning will be dropped in the future.
  warnings.warn(
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\layers\math_op_patch.py:273: UserWarning: D:\Anconda3\envs\paddle_decet\PaddleDetection\ppdet\modeling\losses\solov2_loss.py:51
The behavior of expression A * B has been unified with elementwise_mul(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_mul(X, Y, axis=0) instead of A * B. This transitional warning will be dropped in the future.
  warnings.warn(
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\layers\math_op_patch.py:273: UserWarning: D:\Anconda3\envs\paddle_decet\PaddleDetection\ppdet\modeling\losses\solov2_loss.py:52
The behavior of expression A * B has been unified with elementwise_mul(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_mul(X, Y, axis=0) instead of A * B. This transitional warning will be dropped in the future.
  warnings.warn(
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\layers\math_op_patch.py:273: UserWarning: D:\Anconda3\envs\paddle_decet\PaddleDetection\ppdet\modeling\losses\solov2_loss.py:53
The behavior of expression A * B has been unified with elementwise_mul(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_mul(X, Y, axis=0) instead of A * B. This transitional warning will be dropped in the future.
  warnings.warn(
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\layers\math_op_patch.py:273: UserWarning: D:\Anconda3\envs\paddle_decet\PaddleDetection\ppdet\modeling\losses\solov2_loss.py:54
The behavior of expression A + B has been unified with elementwise_add(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_add(X, Y, axis=0) instead of A + B. This transitional warning will be dropped in the future.
  warnings.warn(
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\layers\math_op_patch.py:273: UserWarning: D:\Anconda3\envs\paddle_decet\PaddleDetection\ppdet\modeling\losses\solov2_loss.py:54
The behavior of expression A / B has been unified with elementwise_div(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_div(X, Y, axis=0) instead of A / B. This transitional warning will be dropped in the future.
  warnings.warn(
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\backward.py:300: DeprecationWarning: an integer is required (got type paddle.fluid.core_avx.VarType).  Implicit conversion to integers using __int__ is deprecated, and may be removed in a future version of Python.
  op_desc._set_attr(name, val)
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\backward.py:300: DeprecationWarning: an integer is required (got type paddle.fluid.core_avx.op_proto_and_checker_maker.OpRole).  Implicit conversion to integers using __int__ is deprecated, and may be removed in a future version of Python.
  op_desc._set_attr(name, val)
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\backward.py:1070: DeprecationWarning: an integer is required (got type paddle.fluid.core_avx.op_proto_and_checker_maker.OpRole).  Implicit conversion to integers using __int__ is deprecated, and may be removed in a future version of Python.
  new_op_desc._set_attr(op_role_attr_name, backward)
2020-12-21 11:48:37,114-INFO: If regularizer of a Parameter has been set by 'fluid.ParamAttr' or 'fluid.WeightNormParamAttr' already. The Regularization[L2Decay, regularization_coeff=0.000100] in Optimizer will not take effect, and it will only be applied to other Parameters!
W1221 11:48:37.301550  4516 device_context.cc:338] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.2, Runtime API Version: 10.2
W1221 11:48:37.317164  4516 device_context.cc:346] device: 0, cuDNN Version: 8.0.
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
W1221 11:48:38.860647  4516 build_strategy.cc:170] fusion_group is not enabled for Windows/MacOS now, and only effective when running with CUDA GPU.
D:\Anconda3\envs\paddle_decet\PaddleDetection\ppdet\data\reader.py:89: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.9 it will stop working
  if isinstance(item, collections.Sequence) and len(item) == 0:
2020-12-21 11:48:40,360-INFO: iter: 0, lr: 0.000000, 'loss_ins': '2.592853', 'loss_cate': '1.018936', 'loss': '3.611789', eta: 0:00:00, batch_cost: 0.00000 sec, ips: 20000.00000 images/sec
2020-12-21 11:48:47,118-INFO: KeyboardInterrupt: main proc 2140 exit, kill subprocess []

(paddle) PS D:\Anconda3\envs\paddle_decet\PaddleDetection> python tools\train.py -c D:\Anconda3\envs\paddle_decet\PaddleDetection\configs\solov2\solov2_r50_fpn_1x.yml -o use_gpu=true
D:\Anconda3\envs\paddle_decet\PaddleDetection\ppdet\core\workspace.py:118: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.9 it will stop working
  isinstance(merge_dct[k], collections.Mapping)):
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\framework.py:2383: DeprecationWarning: an integer is required (got type paddle.fluid.core_avx.op_proto_and_checker_maker.OpRole).  Implicit conversion to integers using __int__ is deprecated, and may be removed in a future version of Python.
  self.desc._set_attr(name, val)
OrderedDict([('image', var image : fluid.VarType.LOD_TENSOR.shape(-1, 3, -1, -1).astype(VarType.FP32)), ('im_id', var im_id : fluid.VarType.LOD_TENSOR.shape(-1, 1).astype(VarType.INT64)), ('fg_num', var fg_num : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32)), ('ins_label0', var ins_label0 : fluid.VarType.LOD_TENSOR.shape(-1, -1, -1).astype(VarType.INT32)), ('cate_label0', var cate_label0 : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32)), ('grid_order0', var grid_order0 : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32)), ('ins_label1', var ins_label1 : fluid.VarType.LOD_TENSOR.shape(-1, -1, -1).astype(VarType.INT32)), ('cate_label1', var cate_label1 : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32)), ('grid_order1', var grid_order1 : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32)), ('ins_label2', var ins_label2 : fluid.VarType.LOD_TENSOR.shape(-1, -1, -1).astype(VarType.INT32)), ('cate_label2', var cate_label2 : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32)), ('grid_order2', var grid_order2 : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32)), ('ins_label3', var ins_label3 : fluid.VarType.LOD_TENSOR.shape(-1, -1, -1).astype(VarType.INT32)), ('cate_label3', var cate_label3 : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32)), ('grid_order3', var grid_order3 : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32)), ('ins_label4', var ins_label4 : fluid.VarType.LOD_TENSOR.shape(-1, -1, -1).astype(VarType.INT32)), ('cate_label4', var cate_label4 : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32)), ('grid_order4', var grid_order4 : fluid.VarType.LOD_TENSOR.shape(-1,).astype(VarType.INT32))])
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\framework.py:2383: DeprecationWarning: an integer is required (got type paddle.fluid.core_avx.VarType).  Implicit conversion to integers using __int__ is deprecated, and may be removed in a future version of Python.
  self.desc._set_attr(name, val)
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\layers\math_op_patch.py:273: UserWarning: D:\Anconda3\envs\paddle_decet\PaddleDetection\ppdet\modeling\backbones\fpn.py:108
The behavior of expression A + B has been unified with elementwise_add(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_add(X, Y, axis=0) instead of A + B. This transitional warning will be dropped in the future.
  warnings.warn(
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\layers\math_op_patch.py:273: UserWarning: D:\Anconda3\envs\paddle_decet\PaddleDetection\ppdet\modeling\losses\solov2_loss.py:51
The behavior of expression A * B has been unified with elementwise_mul(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_mul(X, Y, axis=0) instead of A * B. This transitional warning will be dropped in the future.
  warnings.warn(
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\layers\math_op_patch.py:273: UserWarning: D:\Anconda3\envs\paddle_decet\PaddleDetection\ppdet\modeling\losses\solov2_loss.py:52
The behavior of expression A * B has been unified with elementwise_mul(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_mul(X, Y, axis=0) instead of A * B. This transitional warning will be dropped in the future.
  warnings.warn(
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\layers\math_op_patch.py:273: UserWarning: D:\Anconda3\envs\paddle_decet\PaddleDetection\ppdet\modeling\losses\solov2_loss.py:53
The behavior of expression A * B has been unified with elementwise_mul(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_mul(X, Y, axis=0) instead of A * B. This transitional warning will be dropped in the future.
  warnings.warn(
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\layers\math_op_patch.py:273: UserWarning: D:\Anconda3\envs\paddle_decet\PaddleDetection\ppdet\modeling\losses\solov2_loss.py:54
The behavior of expression A + B has been unified with elementwise_add(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_add(X, Y, axis=0) instead of A + B. This transitional warning will be dropped in the future.
  warnings.warn(
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\layers\math_op_patch.py:273: UserWarning: D:\Anconda3\envs\paddle_decet\PaddleDetection\ppdet\modeling\losses\solov2_loss.py:54
The behavior of expression A / B has been unified with elementwise_div(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_div(X, Y, axis=0) instead of A / B. This transitional warning will be dropped in the future.
  warnings.warn(
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\backward.py:300: DeprecationWarning: an integer is required (got type paddle.fluid.core_avx.VarType).  Implicit conversion to integers using __int__ is deprecated, and may be removed in a future version of Python.
  op_desc._set_attr(name, val)
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\backward.py:300: DeprecationWarning: an integer is required (got type paddle.fluid.core_avx.op_proto_and_checker_maker.OpRole).  Implicit conversion to integers using __int__ is deprecated, and may be removed in a future version of Python.
  op_desc._set_attr(name, val)
D:\Anconda3\envs\paddle\lib\site-packages\paddle\fluid\backward.py:1070: DeprecationWarning: an integer is required (got type paddle.fluid.core_avx.op_proto_and_checker_maker.OpRole).  Implicit conversion to integers using __int__ is deprecated, and may be removed in a future version of Python.
  new_op_desc._set_attr(op_role_attr_name, backward)
2020-12-21 11:50:08,468-INFO: If regularizer of a Parameter has been set by 'fluid.ParamAttr' or 'fluid.WeightNormParamAttr' already. The Regularization[L2Decay, regularization_coeff=0.000100] in Optimizer will not take effect, and it will only be applied to other Parameters!
W1221 11:50:08.656023  2480 device_context.cc:338] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.2, Runtime API Version: 10.2
W1221 11:50:08.671609  2480 device_context.cc:346] device: 0, cuDNN Version: 8.0.
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
W1221 11:50:10.218189  2480 build_strategy.cc:170] fusion_group is not enabled for Windows/MacOS now, and only effective when running with CUDA GPU.
D:\Anconda3\envs\paddle_decet\PaddleDetection\ppdet\data\reader.py:89: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.9 it will stop working
  if isinstance(item, collections.Sequence) and len(item) == 0:
2020-12-21 11:50:11,702-INFO: iter: 0, lr: 0.000000, 'loss_ins': '2.244600', 'loss_cate': '1.251393', 'loss': '3.495992', eta: 0:00:00, batch_cost: 0.00000 sec, ips: 20000.00000 images/sec
2020-12-21 11:50:21,439-INFO: iter: 20, lr: 0.000200, 'loss_ins': '2.312218', 'loss_cate': '0.610663', 'loss': '2.957036', eta: 13:28:34, batch_cost: 0.53917 sec, ips: 3.70938 images/sec

Training on my own dataset stalls at this point. Occasionally it gets as far as iter 60, but most attempts end up as shown above.
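A side note on the `ValueError` in the second run above: the traceback shows `_parse_opt` unpacking `s.split('=', 1)` into `k, v`, and the trailing `12` on the command line contains no `=`, so the split yields only one element. The following is a minimal standalone sketch of that behaviour, not the actual `ppdet/utils/cli.py` code; the function name `parse_opt_sketch` is hypothetical.

```python
def parse_opt_sketch(opts):
    """Split each 'key=value' override into a dict entry (simplified sketch)."""
    config = {}
    for s in opts:
        # Same unpacking pattern as the line shown in the traceback.
        k, v = s.strip().split("=", 1)
        config[k] = v
    return config


# A well-formed override parses into one key/value pair.
print(parse_opt_sketch(["use_gpu=false"]))

# A bare token such as "12" has no "=", so the unpacking fails.
try:
    parse_opt_sketch(["use_cpu=true", "12"])
except ValueError as err:
    print("ValueError:", err)
```

The log output itself points to the `CPU_NUM` environment variable as the place to set the CPU count, so a bare number after `-o` is probably not what was intended.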

qingqing01 commented 3 years ago

@hehu56 How many images do you have in total, and how many GPUs are you training on?

hehu56 commented 3 years ago

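@qingqing01 Single-GPU training. I just want to get it running first, so the dataset is very small, only 30 images. I have tried both the r_101 and r_50 models and the result is the same. The training config is as follows: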

TrainReader:
  batch_size: 4
  worker_num: 2
  inputs_def:
    fields: ['image', 'im_id', 'gt_segm']
  dataset:
    !COCODataSet
    dataset_dir: E:\work\anniu\out_labelme
    anno_path: annotations/instance_train.json
    image_dir: train
  sample_transforms:
  - !DecodeImage
    to_rgb: true
  - !Poly2Mask {}
  - !ResizeImage
    target_size: 200
    max_size: 210
    interp: 1
    use_cv2: true
    resize_box: true
  - !RandomFlipImage
    prob: 0.5
  - !NormalizeImage
    is_channel_first: false
    is_scale: true
    mean: [0.485,0.456,0.406]
    std: [0.229, 0.224,0.225]
  - !Permute
    to_bgr: false
    channel_first: true
  batch_transforms:
  - !PadBatch
    pad_to_stride: 32
  - !Gt2Solov2Target
    num_grids: [40, 36, 24, 16, 12]
    scale_ranges: [[1, 96], [48, 192], [96, 384], [192, 768], [384, 2048]]
    coord_sigma: 0.2
  shuffle: True

yghstill commented 3 years ago

@hehu56 Are you training on CPU? Try setting worker_num in TrainReader to -1 and run it again.

hehu56 commented 3 years ago

@yghstill I have tried both CPU and GPU.
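For a rough sense of the CPU/GPU gap, the `eta` and `batch_cost` values in the two "iter: 20" log lines from the first post can be cross-checked. The sketch below assumes that `eta` is simply the remaining iterations multiplied by `batch_cost`, which is an inference from the numbers and not something confirmed in this thread. Under that assumption both lines imply roughly 89,980 remaining iterations, and the CPU run at that point is advancing at about 21 seconds per iteration, so a log line every 20 iterations would appear only about every 7 minutes.

```python
from datetime import timedelta

# Values copied from the two "iter: 20" log lines in the first post.
cpu_eta = timedelta(days=21, hours=21, minutes=19, seconds=55)
cpu_batch_cost = 21.01796  # seconds per iteration on CPU
gpu_eta = timedelta(hours=13, minutes=28, seconds=34)
gpu_batch_cost = 0.53917   # seconds per iteration on GPU


def remaining_iters(eta, batch_cost):
    """Assumed relation: eta = remaining iterations * batch_cost."""
    return eta.total_seconds() / batch_cost


cpu_remaining = remaining_iters(cpu_eta, cpu_batch_cost)
gpu_remaining = remaining_iters(gpu_eta, gpu_batch_cost)
print("remaining iterations (CPU log):", round(cpu_remaining))
print("remaining iterations (GPU log):", round(gpu_remaining))

# Minutes between two log lines if one is printed every 20 iterations.
cpu_minutes_per_log = cpu_batch_cost * 20 / 60
print("CPU minutes per 20 iterations:", round(cpu_minutes_per_log, 1))
```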

hehu56 commented 3 years ago

@yghstill I tried -1 and it still does not work.

yghstill commented 3 years ago

@hehu56 Check the process information and memory usage, and also try setting batch_size to 1.

hehu56 commented 3 years ago

Still not working. CPU usage is 30%, memory 48%, and GPU usage is very low. @yghstill

qingqing01 commented 3 years ago

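@hehu56 If it still hangs with worker_num set to -1 and single-GPU training, the data itself may be the problem. Could you disable shuffle so the order is fixed on every run, set batch_size to 1 and log_iter to 1, and check whether it always hangs on the same image?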
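One way to follow up on the suspicion that the data is at fault is to scan the COCO-style annotation file before training. The sketch below is a generic sanity check written for illustration, not a PaddleDetection tool. The function name `find_suspicious_annotations` and the specific checks (dangling `image_id`, non-positive box size, missing or too-short polygons) are assumptions about what might go wrong, and a clean result does not prove the data is fine.

```python
import json


def find_suspicious_annotations(coco):
    """Return (annotation id, reason) pairs for entries that look malformed."""
    image_ids = {img["id"] for img in coco.get("images", [])}
    problems = []
    for ann in coco.get("annotations", []):
        ann_id = ann.get("id")
        if ann.get("image_id") not in image_ids:
            problems.append((ann_id, "image_id not found in images"))
        bbox = ann.get("bbox", [])
        if len(bbox) != 4 or bbox[2] <= 0 or bbox[3] <= 0:
            problems.append((ann_id, "bbox missing or has non-positive size"))
        segm = ann.get("segmentation")
        if not segm:
            problems.append((ann_id, "segmentation missing or empty"))
        elif isinstance(segm, list):
            # A polygon needs at least 3 points, i.e. 6 coordinates.
            if any(not isinstance(p, list) or len(p) < 6 for p in segm):
                problems.append((ann_id, "polygon malformed or under 3 points"))
    return problems


def check_file(path):
    """Load an annotation file (path is supplied by the caller) and check it."""
    with open(path, encoding="utf-8") as f:
        return find_suspicious_annotations(json.load(f))


# Tiny made-up sample: annotation 2 points at a missing image and has no mask.
sample = {
    "images": [{"id": 1, "file_name": "a.jpg", "height": 100, "width": 220}],
    "annotations": [
        {"id": 1, "image_id": 1, "bbox": [13.0, 15.0, 50.0, 64.0],
         "segmentation": [[15.0, 15.0, 43.0, 15.0, 42.0, 39.0]]},
        {"id": 2, "image_id": 7, "bbox": [5.0, 5.0, 10.0, 10.0]},
    ],
}
sample_problems = find_suspicious_annotations(sample)
for ann_id, reason in sample_problems:
    print(ann_id, reason)
```

For a real dataset, `check_file` would be called with the path of the training annotation JSON.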

hehu56 commented 3 years ago

The data was annotated with labelme and then converted to COCO format with x2coco.py. I have not been able to solve this yet.

hehu56 commented 3 years ago

After changing max_iter from 200 to 10 it runs through, but I don't know why. @yghstill

yghstill commented 3 years ago

@hehu56 Is max_iters the setting you changed? If you set it to 10, training will stop after 10 iterations.

hehu56 commented 3 years ago

Yes, it stops after 10 iterations, and then the hang does not occur. If max_iter is too large, training simply stalls partway through. @yghstill

curryJ commented 3 years ago

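Hello, could you show a few lines of the JSON you use for training? I am also using SOLOv2, but I get an error when reading the data. The COCO dataset works without errors, but with my own dataset, after converting the annotated XML files to COCO JSON, it fails. This is my JSON file. Thank you very much!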

hehu56 commented 3 years ago

@curryJ { "height": 100, "width": 220, "id": 10, "file_name": "9.jpg" } ], "categories": [ { "supercategory": "component", "id": 1, "name": "xi" } ], "annotations": [ { "segmentation": [ [ 15.481171548117155, 15.21757322175732, 43.375174337517436, 15.775453277545331, 42.53835425383543, 39.76429567642957, 61.366806136680616, 40.46164574616458, 63.04044630404463, 43.390516039051604, 63.17991631799163, 59.15062761506276, 44.63040446304045, 59.290097629009765, 42.25941422594142, 79.37377963737796, 13.668061366806137, 78.67642956764296 ] ], "iscrowd": 0, "image_id": 1, "bbox": [ 13.0, 15.0, 50.0, 64.0 ], "area": 3200.0, "category_id": 1, "id": 1 },

hehu56 commented 3 years ago

@curryJ Annotate your data with labelme, arrange the file names and directory structure as required, and then run tools/x2coco.py.

curryJ commented 3 years ago

Thanks. I did use tools/x2coco.py and it ran without errors. The generated JSON looks like the one I posted above, except that the segmentation field is missing.

curryJ commented 3 years ago

I get it now: my XML files contain no segmentation information, only bounding boxes. Thanks!
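A quick way to confirm this before converting is to look at what each object in the XML actually contains. The sketch below assumes Pascal VOC style files (an `object` element holding a `bndbox` child), which is an assumption since the XML from this thread is not shown. The `polygon` tag used for mask data is likewise hypothetical and should be replaced with whatever the annotation tool really writes, and the sample XML is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample in Pascal VOC style: one object with a box and no mask.
SAMPLE_XML = """
<annotation>
  <filename>example.jpg</filename>
  <object>
    <name>xi</name>
    <bndbox><xmin>13</xmin><ymin>15</ymin><xmax>63</xmax><ymax>79</ymax></bndbox>
  </object>
</annotation>
"""


def summarize_objects(xml_text, mask_tag="polygon"):
    """Report, per object, whether a box and a mask element are present.

    mask_tag is a hypothetical tag name, not a standard VOC field.
    """
    root = ET.fromstring(xml_text.strip())
    summary = []
    for obj in root.iter("object"):
        summary.append({
            "name": obj.findtext("name"),
            "has_box": obj.find("bndbox") is not None,
            "has_mask": obj.find(mask_tag) is not None,
        })
    return summary


objects = summarize_objects(SAMPLE_XML)
for item in objects:
    print(item)
```

Objects that report a box but no mask carry nothing a converter could turn into a `segmentation` entry, which matches the symptom described here.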

NewBeeLee commented 3 years ago

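I ran into the same problem. My initial environment was a 1660 Ti with CUDA 10.2. With batch_size=2, training would stall partway through and the GPU sat idle. With batch_size=1 it ran to completion, but the predictions were strange: the model simply classified the entire image. The same COCO dataset trains without any problem with mask_rcnn, and the predictions are correct. Later I switched to a 3080 with CUDA 11.0, and with batch_size=2 training still stalls partway through with the GPU idle.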

hehu56 commented 3 years ago

@NewBeeLee Did you manage to solve it afterwards, or did you switch to a different approach?

NewBeeLee commented 3 years ago

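Was it the resnet101 model that hung for you? I later got it running with resnet50 without touching the dataset, and I have been validating the results over the past couple of days.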

hehu56 commented 3 years ago

Oh, OK, thanks. I'll give it a try. @NewBeeLee

heavengate commented 3 years ago

Since the problem has been resolved, this issue is being closed. Feel free to reopen it if you still have questions.

PaddlePaddle / PaddleDetection

SOLOv2 training keeps hanging #1942