Thinklab-SJTU / R3Det_Tensorflow

Code for AAAI 2021 paper: R3Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object

Error when running the program: unable to use the GPU #103

Closed zzybj closed 3 years ago

zzybj commented 3 years ago

2020-11-13 02:23:11.384991: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200000000 Hz
2020-11-13 02:23:11.385217: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x30c3100 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-11-13 02:23:11.385248: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-11-13 02:23:11.387063: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-11-13 02:23:11.481223: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-13 02:23:11.481876: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x30c32c0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-11-13 02:23:11.481917: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2020-11-13 02:23:11.482084: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-13 02:23:11.482636: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
2020-11-13 02:23:11.482720: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-11-13 02:23:11.484130: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-11-13 02:23:11.485643: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-11-13 02:23:11.485956: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-11-13 02:23:11.487339: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-11-13 02:23:11.488005: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-11-13 02:23:11.490719: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-11-13 02:23:11.490831: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-13 02:23:11.491423: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-13 02:23:11.491921: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2020-11-13 02:23:11.491982: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-11-13 02:23:11.494064: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-13 02:23:11.494092: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0
2020-11-13 02:23:11.494103: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N
2020-11-13 02:23:11.494232: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-13 02:23:11.494778: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-13 02:23:11.495338: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14121 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
2020-11-13 02:23:11.802288: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [ /job:localhost/replica:0/task:0/device:CPU:0]. See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
Identity: GPU CPU XLA_CPU XLA_GPU
VariableV2: CPU
Assign: CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  get_batch/matching_filenames (VariableV2) /device:GPU:0
  get_batch/matching_filenames/Assign (Assign) /device:GPU:0
  get_batch/matching_filenames/read (Identity) /device:GPU:0

2020-11-13 02:23:11.802745: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [ /job:localhost/replica:0/task:0/device:CPU:0]. See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
ReaderReadV2: CPU
TFRecordReaderV2: CPU
QueueSizeV2: GPU CPU XLA_CPU XLA_GPU
QueueCloseV2: GPU CPU XLA_CPU XLA_GPU
FIFOQueueV2: CPU XLA_CPU XLA_GPU
QueueEnqueueManyV2: CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  get_batch/input_producer (FIFOQueueV2) /device:GPU:0
  get_batch/input_producer/input_producer_EnqueueMany (QueueEnqueueManyV2) /device:GPU:0
  get_batch/input_producer/input_producer_Close (QueueCloseV2) /device:GPU:0
  get_batch/input_producer/input_producer_Close_1 (QueueCloseV2) /device:GPU:0
  get_batch/input_producer/input_producer_Size (QueueSizeV2) /device:GPU:0
  get_batch/TFRecordReaderV2 (TFRecordReaderV2) /device:GPU:0
  get_batch/ReaderReadV2 (ReaderReadV2) /device:GPU:0

2020-11-13 02:23:11.803188: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [ /job:localhost/replica:0/task:0/device:CPU:0]. See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
QueueDequeueManyV2: CPU
QueueCloseV2: GPU CPU XLA_CPU XLA_GPU
PaddingFIFOQueueV2: CPU
QueueSizeV2: GPU CPU XLA_CPU XLA_GPU
QueueEnqueueV2: GPU CPU XLA_CPU XLA_GPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  get_batch/batch/padding_fifo_queue (PaddingFIFOQueueV2) /device:GPU:0
  get_batch/batch/padding_fifo_queue_enqueue (QueueEnqueueV2) /device:GPU:0
  get_batch/batch/padding_fifo_queue_Close (QueueCloseV2) /device:GPU:0
  get_batch/batch/padding_fifo_queue_Close_1 (QueueCloseV2) /device:GPU:0
  get_batch/batch/padding_fifo_queue_Size (QueueSizeV2) /device:GPU:0
  get_batch/batch (QueueDequeueManyV2) /device:GPU:0

WARNING:tensorflow:From multi_gpu_train_r3det.py:319: start_queue_runners (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the tf.data module.
WARNING:tensorflow:From multi_gpu_train_r3det.py:323: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.
restore model
2020-11-13 02:23:25.462669: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-11-13 02:23:27.224448: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10


2020-11-13 02:24:39: global_step:19 current_step:20 per_cost_time:12.169s cls_loss:0.127 reg_loss:0.000 refine_cls_loss:0.006 refine_reg_loss:0.000 refine_cls_loss_stage3:0.000 refine_reg_loss_stage3:0.000 total_losses:0.133


2020-11-13 02:25:21: global_step:39 current_step:40 per_cost_time:1.232s cls_loss:0.163 reg_loss:0.000 refine_cls_loss:0.008 refine_reg_loss:0.000 refine_cls_loss_stage3:0.000 refine_reg_loss_stage3:0.000 total_losses:0.171


2020-11-13 02:25:50: global_step:59 current_step:60 per_cost_time:1.187s cls_loss:0.148 reg_loss:0.000 refine_cls_loss:0.007 refine_reg_loss:0.000 refine_cls_loss_stage3:0.000 refine_reg_loss_stage3:0.000 total_losses:0.155
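Note that the colocation warnings above only concern the queue-based input pipeline: ops such as VariableV2, TFRecordReaderV2, FIFOQueueV2, and PaddingFIFOQueueV2 have CPU-only kernels, so TensorFlow has to move those get_batch/* ops off the requested /device:GPU:0. A minimal sketch of a session configuration that tolerates this fallback, assuming a standard TF 1.x setup (illustrative only, not copied from multi_gpu_train_r3det.py):

    import tensorflow as tf

    # allow_soft_placement lets ops without a GPU kernel (the get_batch/*
    # queue and reader ops in the warnings above) fall back to the CPU
    # instead of raising a placement error; the heavy conv/loss ops can
    # still run on the GPU.
    config = tf.ConfigProto(allow_soft_placement=True)
    config.gpu_options.allow_growth = True  # allocate GPU memory on demand

    with tf.Session(config=config) as sess:
        sess.run(tf.global_variables_initializer())
        # ... training loop ...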

yangxue0827 commented 3 years ago

Install tensorflow-gpu.
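For example, assuming a pip-managed TF 1.x environment, the installed build can be checked like this (`pip show tensorflow tensorflow-gpu` works too):

    import tensorflow as tf

    # False here means the CPU-only `tensorflow` package is installed and
    # `pip install tensorflow-gpu` (matching the CUDA version) is needed.
    print(tf.test.is_built_with_cuda())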

zzybj commented 3 years ago

Install tensorflow-gpu.

Running this does produce a result:

    import tensorflow as tf

    with tf.device('/gpu:0'):
        v1 = tf.constant([1.0, 2.0, 3.0], shape=[3], name='v1')
        v2 = tf.constant([1.0, 2.0, 3.0], shape=[3], name='v2')
        sumV12 = v1 + v2

    with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
        print(sess.run(sumV12))

[2. 4. 6.]
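If the ops really landed on the GPU, log_device_placement=True should also print each op's assigned device to the console, along the lines of:

    v1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
    v2: (Const): /job:localhost/replica:0/task:0/device:GPU:0
    add: (Add): /job:localhost/replica:0/task:0/device:GPU:0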

yangxue0827 commented 3 years ago

Getting a result does not mean you are actually using the GPU; you need to watch nvidia-smi to check. Just confirm whether what you installed is tensorflow-gpu.
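A minimal runtime check along those lines, assuming a TF 1.x environment; run `nvidia-smi` (or `watch -n 1 nvidia-smi`) in a second terminal during training to confirm GPU utilization and memory usage:

    import tensorflow as tf

    # An empty string means TensorFlow cannot see a usable GPU from this
    # process, regardless of what a toy session appears to compute.
    print(tf.test.gpu_device_name())   # expect '/device:GPU:0'
    print(tf.test.is_gpu_available())  # expect True on the Tesla T4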