Using the Dockerfile in the scripts folder, I managed to start train1.py:
[1010 11:14:00 @base.py:158] Setup callbacks graph ...
[1010 11:14:00 @summary.py:34] Maintain moving average summary of 0 tensors.
[1010 11:14:02 @base.py:174] Creating the session ...
2019-10-10 11:14:02.528831: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-10 11:14:02.660072: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-10 11:14:02.661434: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: TITAN X (Pascal) major: 6 minor: 1 memoryClockRate(GHz): 1.531
pciBusID: 0000:01:00.0
totalMemory: 11.91GiB freeMemory: 11.29GiB
2019-10-10 11:14:02.661929: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2019-10-10 11:14:03.584727: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-10 11:14:03.584761: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2019-10-10 11:14:03.584779: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2019-10-10 11:14:03.585635: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 12072 MB memory) -> physical GPU (device: 0, name: TITAN X (Pascal), pci bus id: 0000:01:00.0, compute capability: 6.1)
[1010 11:14:04 @base.py:182] Initializing the session ...
[1010 11:14:04 @base.py:189] Graph Finalized.
2019-10-10 11:14:06.635234: W tensorflow/core/kernels/queue_base.cc:285] _0_QueueInput/input_queue: Skipping cancelled dequeue attempt with queue not closed
[1010 11:14:06 @concurrency.py:36] Starting EnqueueThread QueueInput/input_queue ...
[1010 11:14:06 @graph.py:70] Running Op sync_variables_from_main_tower ...
[1010 11:14:07 @base.py:209] Start Epoch 1 ...
12%|########2 |12/100[00:32<01:59, 0.73it/s]
However, as you can see, it's extremely slow (about 0.73 it/s). Even though my GPU is recognised and memory is allocated, this is the actual usage reported by nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.14 Driver Version: 430.14 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN X (Pascal) Off | 00000000:01:00.0 On | N/A |
| 26% 49C P2 55W / 250W | 1884MiB / 12194MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
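A single nvidia-smi snapshot can miss short bursts of GPU activity, so one reading of 0% does not show whether the GPU is idle for the whole run. Below is a minimal sketch for sampling utilization over time. It assumes nvidia-smi is on the PATH and supports the --query-gpu and --format=csv,noheader,nounits options. The helper names parse_sample and sample_gpu are hypothetical and are not part of the training scripts.

```python
from __future__ import annotations

import shutil
import subprocess
import time

# Fields requested from nvidia-smi (standard --query-gpu properties).
QUERY = "utilization.gpu,memory.used"


def parse_sample(line: str) -> tuple[int, int]:
    """Parse one CSV line such as '0, 1884' into (util_percent, mem_used_mib)."""
    util, mem = [field.strip() for field in line.split(",")]
    return int(util), int(mem)


def sample_gpu(seconds: int = 30, gpu_index: int = 0) -> list[tuple[int, int]]:
    """Poll nvidia-smi about once per second; return (util %, memory MiB) samples."""
    samples = []
    for _ in range(seconds):
        out = subprocess.run(
            ["nvidia-smi", f"--id={gpu_index}", f"--query-gpu={QUERY}",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        # With --id set, nvidia-smi prints one line for the selected GPU.
        samples.append(parse_sample(out.strip().splitlines()[0]))
        time.sleep(1)
    return samples


if __name__ == "__main__":
    # Only poll when nvidia-smi is actually installed on this machine.
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
    else:
        utils = [u for u, _ in sample_gpu(seconds=30)]
        print(f"mean util {sum(utils) / len(utils):.1f}%, max {max(utils)}%")
```

Running this in a second shell while train1.py is in epoch 1 gives a mean and peak utilization instead of a single reading.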
Any ideas?