Open bonnie-cbw opened 3 years ago
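It looks like the GPU isn't being used. You need to set it in cfgs.
Thank you very much for your answer! Here is part of my cfgs settings. I believe I have set GPU_GROUP; is there something wrong with how it is set? The server I'm using has two 2080 Ti GPUs.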
ROOT_PATH = os.path.abspath('../')
print(3*"++--")
print(ROOT_PATH)
GPU_GROUP = "0,1"
NUM_GPU = len(GPU_GROUP.strip().split(','))
SHOW_TRAIN_INFO_INTE = 20
SMRY_ITER = 2000
SAVE_WEIGHTS_INTE = 4000

SUMMARY_PATH = ROOT_PATH + '/output/summary'
TEST_SAVE_PATH = ROOT_PATH + '/tools/test_result'

if NET_NAME.startswith("resnet"):
    weights_name = NET_NAME
elif NET_NAME.startswith("MobilenetV2"):
    weights_name = "mobilenet/mobilenet_v2_1.0_224"
else:
    raise Exception('net name must in [resnet_v1_101, resnet_v1_50, MobilenetV2]')

PRETRAINED_CKPT = ROOT_PATH + '/data/pretrained_weights/' + weights_name + '.ckpt'
TRAINED_CKPT = os.path.join(ROOT_PATH, 'output/trained_weights')
EVALUATE_DIR = ROOT_PATH + '/tools/test_dota/'
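Then you have probably installed the CPU build of TensorFlow; you should install tensorflow-gpu. Check whether the GPU is actually being used and which TensorFlow version you have.
OK, I'll check! Thank you very much!
It was indeed because I had installed the CPU build of TensorFlow! That problem is solved now. Thank you very much for your help! I have one more question. During training I frequently get the following error: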
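One way to run that check is sketched below. It assumes a TensorFlow 1.x environment like the one in this thread and uses `device_lib.list_local_devices()` to list the devices TensorFlow can see. The helper names `visible_gpus` and `report` are made up for illustration and are not part of the repository.

```python
def visible_gpus(devices):
    """Return the names of all devices whose type is GPU."""
    return [d.name for d in devices if d.device_type == "GPU"]


def report():
    """Print the TensorFlow version and the GPUs it can see."""
    try:
        # Imported lazily so the script still runs where TensorFlow is missing.
        import tensorflow as tf
        from tensorflow.python.client import device_lib
    except ImportError:
        print("TensorFlow is not installed in this environment")
        return None
    print("TensorFlow version:", tf.__version__)
    gpus = visible_gpus(device_lib.list_local_devices())
    if gpus:
        print("GPUs visible to TensorFlow:", gpus)
    else:
        print("No GPU visible (CPU-only build, or a driver/CUDA problem)")
    return gpus


if __name__ == "__main__":
    report()
```

If the list comes back empty on a machine that does have GPUs, a CPU-only TensorFlow package is one likely explanation, which `pip list` would confirm.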
2021-06-28 15:16:43.519534: W tensorflow/core/framework/op_kernel.cc:1389] Unknown: IndexError: index -9223372036853286991 is out of bounds for axis 0 with size 1681218
Traceback (most recent call last):
  File "/home/xianyun/anaconda3/envs/rretinanet/lib/python3.5/site-packages/tensorflow/python/ops/script_ops.py", line 207, in __call__
    ret = func(*args)
  File "../libs/detection_oprations/anchor_target_layer_without_boxweight.py", line 38, in anchor_target_layer
    max_overlaps = overlaps[np.arange(overlaps.shape[0]), argmax_overlaps_inds]
IndexError: index -9223372036853286991 is out of bounds for axis 0 with size 1681218
2021-06-28 15:16:43.684349: W tensorflow/core/kernels/queue_base.cc:277] _0_get_batch/input_producer: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684390: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684484: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684499: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684528: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684546: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684563: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684580: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684598: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684607: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684612: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684632: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684638: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684655: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684664: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684671: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-06-28 15:16:43.684688: W tensorflow/core/kernels/queue_base.cc:277] _1_get_batch/batch/padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
  File "/home/xianyun/anaconda3/envs/rretinanet/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/xianyun/anaconda3/envs/rretinanet/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/xianyun/anaconda3/envs/rretinanet/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: IndexError: index -9223372036853286991 is out of bounds for axis 0 with size 1681218
Traceback (most recent call last):
  File "/home/xianyun/anaconda3/envs/rretinanet/lib/python3.5/site-packages/tensorflow/python/ops/script_ops.py", line 207, in __call__
    ret = func(*args)
  File "../libs/detection_oprations/anchor_target_layer_without_boxweight.py", line 38, in anchor_target_layer
    max_overlaps = overlaps[np.arange(overlaps.shape[0]), argmax_overlaps_inds]
IndexError: index -9223372036853286991 is out of bounds for axis 0 with size 1681218
[[{{node tower_1/build_loss/PyFunc}}]]
[[{{node tower_0/add_3}}]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "multi_gpu_train.py", line 354, in
  File "/home/xianyun/anaconda3/envs/rretinanet/lib/python3.5/site-packages/tensorflow/python/ops/script_ops.py", line 207, in __call__
    ret = func(*args)
  File "../libs/detection_oprations/anchor_target_layer_without_boxweight.py", line 38, in anchor_target_layer
    max_overlaps = overlaps[np.arange(overlaps.shape[0]), argmax_overlaps_inds]
IndexError: index -9223372036853286991 is out of bounds for axis 0 with size 1681218
[[node tower_1/build_loss/PyFunc (defined at ../libs/networks/build_whole_network.py:233) ]]
[[node tower_0/add_3 (defined at multi_gpu_train.py:232) ]]
Caused by op 'tower_1/build_loss/PyFunc', defined at:
File "multi_gpu_train.py", line 354, in
UnknownError (see above for traceback): IndexError: index -9223372036853286991 is out of bounds for axis 0 with size 1681218
Traceback (most recent call last):
  File "/home/xianyun/anaconda3/envs/rretinanet/lib/python3.5/site-packages/tensorflow/python/ops/script_ops.py", line 207, in __call__
    ret = func(*args)
  File "../libs/detection_oprations/anchor_target_layer_without_boxweight.py", line 38, in anchor_target_layer
    max_overlaps = overlaps[np.arange(overlaps.shape[0]), argmax_overlaps_inds]
IndexError: index -9223372036853286991 is out of bounds for axis 0 with size 1681218
[[node tower_1/build_loss/PyFunc (defined at ../libs/networks/build_whole_network.py:233) ]]
[[node tower_0/add_3 (defined at multi_gpu_train.py:232) ]]
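Have you seen this error before? Do you know of a way to fix it? Thanks again!
Help, please! Hello, I'm using the DOTA dataset cropped to 800×800 with ResNet50. I followed your tutorial step by step and the environment is configured without problems, but training runs at roughly 2.8 seconds per image. Isn't that rather slow? Does anyone know what might be causing it? Thanks!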
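For anyone trying to narrow this down: the failing line picks each anchor's best overlap with NumPy integer array indexing. The sketch below reproduces that pattern on its own and adds a few input checks. It assumes `overlaps` is a 2-D array of shape (num_anchors, num_gt_boxes); the function name `max_overlap_per_anchor` and the checks are hypothetical additions, not code from the repository, and they do not identify the root cause of the error above.

```python
import numpy as np


def max_overlap_per_anchor(overlaps):
    """Return (argmax index, max overlap) for each row of a 2-D matrix."""
    overlaps = np.asarray(overlaps, dtype=np.float64)
    # Hypothetical sanity checks: fail early with a readable message
    # instead of an out-of-bounds index deep inside a py_func.
    if overlaps.ndim != 2 or overlaps.shape[1] == 0:
        raise ValueError(
            "expected a (num_anchors, num_gt) matrix with num_gt > 0, "
            "got shape %s" % (overlaps.shape,))
    if not np.isfinite(overlaps).all():
        raise ValueError("overlaps contains NaN or inf values")
    # Same indexing pattern as line 38 of the traceback.
    argmax_inds = overlaps.argmax(axis=1)
    max_overlaps = overlaps[np.arange(overlaps.shape[0]), argmax_inds]
    return argmax_inds, max_overlaps


if __name__ == "__main__":
    inds, best = max_overlap_per_anchor([[0.1, 0.7], [0.4, 0.2]])
    print(inds, best)
```

Logging the shape and dtype of `overlaps` and of the ground-truth boxes just before that line, for the batch that crashes, may show whether a malformed input is involved.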