DetectionTeamUCAS / RetinaNet_Tensorflow_Rotation

Focal Loss for Dense Rotation Object Detection
MIT License

Training speed is too slow #83

Open bonnie-cbw opened 3 years ago

bonnie-cbw commented 3 years ago

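Help, please! Hello, I am using the DOTA dataset cropped to 800×800 with ResNet50. I followed your tutorial step by step and the environment is configured without any problems, but training takes about 2.8 seconds per image. Isn't that too slow? Does anyone know what the cause might be? Thank you!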

yangxue0827 commented 3 years ago

You are probably not using the GPU. It needs to be set in cfgs.

bonnie-cbw commented 3 years ago

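Thank you very much for your answer! Here is part of my cfgs settings. I believe I have set gpu_group; is there something wrong with the setting? The server I am using has two 2080Ti GPUs.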

# ---------------------------------------- System_config

ROOT_PATH = os.path.abspath('../')
print(3*"++--")
print(ROOT_PATH)
GPU_GROUP = "0,1"
NUM_GPU = len(GPU_GROUP.strip().split(','))
SHOW_TRAIN_INFO_INTE = 20
SMRY_ITER = 2000
SAVE_WEIGHTS_INTE = 4000

SUMMARY_PATH = ROOT_PATH + '/output/summary'
TEST_SAVE_PATH = ROOT_PATH + '/tools/test_result'

if NET_NAME.startswith("resnet"):
    weights_name = NET_NAME
elif NET_NAME.startswith("MobilenetV2"):
    weights_name = "mobilenet/mobilenet_v2_1.0_224"
else:
    raise Exception('net name must in [resnet_v1_101, resnet_v1_50, MobilenetV2]')

PRETRAINED_CKPT = ROOT_PATH + '/data/pretrained_weights/' + weights_name + '.ckpt'
TRAINED_CKPT = os.path.join(ROOT_PATH, 'output/trained_weights')

EVALUATE_DIR = ROOT_PATH + '/output/evaluate_result_pickle/'
# /home/xianyun/cbw/RetinaNet_Tensorflow_Rotation-master/tools/test_dota

EVALUATE_DIR = ROOT_PATH + '/tools/test_dota/'
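
For reference, here is a minimal sketch of how a comma-separated GPU_GROUP string like the one in these settings is commonly turned into a device count and exported through the CUDA_VISIBLE_DEVICES environment variable. Whether this repository's training script does exactly this is an assumption that the thread does not confirm, and `parse_gpu_group` is a hypothetical helper introduced only for illustration.

```python
import os


def parse_gpu_group(gpu_group):
    """Split a string such as "0,1" into a list of GPU id strings."""
    return [g.strip() for g in gpu_group.strip().split(",") if g.strip()]


GPU_GROUP = "0,1"
gpu_ids = parse_gpu_group(GPU_GROUP)
NUM_GPU = len(gpu_ids)

# Assumption: the training script exports the ids like this. The variable
# must be set before TensorFlow initializes CUDA to have any effect.
os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(gpu_ids)

print(NUM_GPU, os.environ["CUDA_VISIBLE_DEVICES"])
```

Note that a correct GPU_GROUP only selects devices. It cannot make a CPU-only TensorFlow build use them.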

yangxue0827 commented 3 years ago

Then you may have installed the CPU version of tensorflow; you should install tensorflow-gpu. Check whether the GPU is actually being used, and check your tensorflow version.
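
One way to run that check is sketched below. It prints the TensorFlow version and the device types TensorFlow can see, using `device_lib.list_local_devices()` from `tensorflow.python.client`. The helper names `has_gpu` and `report` are hypothetical and not part of this repository.

```python
def has_gpu(device_types):
    """Return True if a plain 'GPU' device is among the given device types."""
    return "GPU" in device_types


def report():
    """Print the TensorFlow version and visible devices; return GPU visibility."""
    try:
        import tensorflow as tf
        from tensorflow.python.client import device_lib
    except ImportError:
        print("TensorFlow is not installed in this environment")
        return None
    types = [d.device_type for d in device_lib.list_local_devices()]
    print("TensorFlow version:", tf.__version__)
    print("Devices:", types)
    print("GPU visible:", has_gpu(types))
    return has_gpu(types)


if __name__ == "__main__":
    report()
```

If only CPU devices are listed while `nvidia-smi` shows the cards, a CPU-only build or a CUDA/cuDNN version mismatch are the usual suspects.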

bonnie-cbw commented 3 years ago

OK! I'll check. Thank you very much!

bonnie-cbw commented 3 years ago

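It was indeed because I had installed the CPU version of tf! That problem is solved now. Thank you very much for your answer! I have one more question. During training I frequently get the following error: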

2021-06-28 15:16:43.519534: W tensorflow/core/framework/op_kernel.cc:1389] Unknown: IndexError: index -9223372036853286991 is out of bounds for axis 0 with size 1681218
Traceback (most recent call last):

  File "/home/xianyun/anaconda3/envs/rretinanet/lib/python3.5/site-packages/tensorflow/python/ops/script_ops.py", line 207, in __call__
    ret = func(*args)

  File "../libs/detection_oprations/anchor_target_layer_without_boxweight.py", line 38, in anchor_target_layer
    max_overlaps = overlaps[np.arange(overlaps.shape[0]), argmax_overlaps_inds]

IndexError: index -9223372036853286991 is out of bounds for axis 0 with size 1681218

2021-06-28 15:16:43.684349: W tensorflow/core/kernels/queue_base.cc:277] _0_get_batch/input_producer: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684390: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684484: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684499: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684528: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684546: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684563: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684580: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684598: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684607: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684612: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684632: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684638: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684655: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684664: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684671: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684688: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
  File "/home/xianyun/anaconda3/envs/rretinanet/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/xianyun/anaconda3/envs/rretinanet/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/xianyun/anaconda3/envs/rretinanet/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: IndexError: index -9223372036853286991 is out of bounds for axis 0 with size 1681218
Traceback (most recent call last):

  File "/home/xianyun/anaconda3/envs/rretinanet/lib/python3.5/site-packages/tensorflow/python/ops/script_ops.py", line 207, in __call__
    ret = func(*args)

  File "../libs/detection_oprations/anchor_target_layer_without_boxweight.py", line 38, in anchor_target_layer
    max_overlaps = overlaps[np.arange(overlaps.shape[0]), argmax_overlaps_inds]

IndexError: index -9223372036853286991 is out of bounds for axis 0 with size 1681218

 [[{{node tower_1/build_loss/PyFunc}}]]
 [[{{node tower_0/add_3}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "multi_gpu_train.py", line 354, in <module>
    train()
  File "multi_gpu_train.py", line 317, in train
    sess.run([train_op, global_step, total_loss_dict])
  File "/home/xianyun/anaconda3/envs/rretinanet/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/xianyun/anaconda3/envs/rretinanet/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/xianyun/anaconda3/envs/rretinanet/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/xianyun/anaconda3/envs/rretinanet/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: IndexError: index -9223372036853286991 is out of bounds for axis 0 with size 1681218
Traceback (most recent call last):

  File "/home/xianyun/anaconda3/envs/rretinanet/lib/python3.5/site-packages/tensorflow/python/ops/script_ops.py", line 207, in __call__
    ret = func(*args)

  File "../libs/detection_oprations/anchor_target_layer_without_boxweight.py", line 38, in anchor_target_layer
    max_overlaps = overlaps[np.arange(overlaps.shape[0]), argmax_overlaps_inds]

IndexError: index -9223372036853286991 is out of bounds for axis 0 with size 1681218

 [[node tower_1/build_loss/PyFunc (defined at ../libs/networks/build_whole_network.py:233) ]]
 [[node tower_0/add_3 (defined at multi_gpu_train.py:232) ]]

Caused by op 'tower_1/build_loss/PyFunc', defined at:
  File "multi_gpu_train.py", line 354, in <module>
    train()
  File "multi_gpu_train.py", line 206, in train
    gpu_id=i)
  File "../libs/networks/build_whole_network.py", line 233, in build_whole_detection_network
    tf.float32])
  File "/home/xianyun/anaconda3/envs/rretinanet/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/home/xianyun/anaconda3/envs/rretinanet/lib/python3.5/site-packages/tensorflow/python/ops/script_ops.py", line 468, in py_func
    func=func, inp=inp, Tout=Tout, stateful=stateful, eager=False, name=name)
  File "/home/xianyun/anaconda3/envs/rretinanet/lib/python3.5/site-packages/tensorflow/python/ops/script_ops.py", line 282, in _internal_py_func
    input=inp, token=token, Tout=Tout, name=name)
  File "/home/xianyun/anaconda3/envs/rretinanet/lib/python3.5/site-packages/tensorflow/python/ops/gen_script_ops.py", line 151, in py_func
    "PyFunc", input=input, token=token, Tout=Tout, name=name)
  File "/home/xianyun/anaconda3/envs/rretinanet/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/xianyun/anaconda3/envs/rretinanet/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/xianyun/anaconda3/envs/rretinanet/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/home/xianyun/anaconda3/envs/rretinanet/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

UnknownError (see above for traceback): IndexError: index -9223372036853286991 is out of bounds for axis 0 with size 1681218
Traceback (most recent call last):

  File "/home/xianyun/anaconda3/envs/rretinanet/lib/python3.5/site-packages/tensorflow/python/ops/script_ops.py", line 207, in __call__
    ret = func(*args)

  File "../libs/detection_oprations/anchor_target_layer_without_boxweight.py", line 38, in anchor_target_layer
    max_overlaps = overlaps[np.arange(overlaps.shape[0]), argmax_overlaps_inds]

IndexError: index -9223372036853286991 is out of bounds for axis 0 with size 1681218

 [[node tower_1/build_loss/PyFunc (defined at ../libs/networks/build_whole_network.py:233) ]]
 [[node tower_0/add_3 (defined at multi_gpu_train.py:232) ]]

Have you seen this error before? Do you know of any way to fix it? Thanks again!
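
The offending index is very close to the minimum int64 value, so the index array may contain garbage, for example from NaN or otherwise invalid values upstream. That is only one possibility and is not confirmed in this thread. The sketch below wraps the indexing pattern from line 38 of the traceback with sanity checks to help locate the bad input. `checked_max_overlaps` is a hypothetical helper, and it assumes `argmax_overlaps_inds` comes from `np.argmax(overlaps, axis=1)`, which the traceback does not show.

```python
import numpy as np


def checked_max_overlaps(overlaps):
    """Return (max_overlaps, argmax_overlaps_inds) after validating the IoU matrix."""
    overlaps = np.asarray(overlaps)
    if overlaps.ndim != 2 or overlaps.shape[1] == 0:
        raise ValueError(
            "expected a non-empty (num_anchors, num_gt) matrix, got shape %s"
            % (overlaps.shape,))
    if not np.isfinite(overlaps).all():
        bad = np.argwhere(~np.isfinite(overlaps))
        raise ValueError(
            "non-finite IoU values at %d positions, first at %s"
            % (len(bad), bad[0].tolist()))
    # Assumption: the index array is the per-anchor argmax over gt boxes.
    argmax_overlaps_inds = np.argmax(overlaps, axis=1)
    # Same indexing pattern as the line reported in the traceback.
    max_overlaps = overlaps[np.arange(overlaps.shape[0]), argmax_overlaps_inds]
    return max_overlaps, argmax_overlaps_inds


max_ov, inds = checked_max_overlaps([[0.1, 0.7], [0.4, 0.2]])
print(max_ov, inds)
```

If a check fires, the ground-truth boxes of the image being processed (for example degenerate or empty annotations) would be the first thing to inspect.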