vade closed this issue 6 years ago.
This is not a known issue. It's unexpected. We will look into this.
@srikris Thanks for the very prompt reply. Please let me know if you need any diagnostics. Happy to help.
Hi. We are seeing the same issue.
Thanks @vade and @skercher. We'll keep you posted on this.
@vade Are you using Object detection, Image Classification, or Style transfer?
We are using Image Classifier. :)
@vade @skercher - I suspect this is related to batch size. In the latest release we decreased the default batch size from 512 to 64. Please try increasing the batch_size parameter of image_classifier.create(...) to 512 and let us know if this solves the issue.
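Roughly, something like the following is what I mean - a minimal sketch in which the image directory and the way the label column is derived are made-up placeholders; batch_size is the parameter in question, and tc.config.set_num_gpus(-1) is taken from the original report:

import turicreate as tc

# Use every visible GPU (-1), as in the original report.
tc.config.set_num_gpus(-1)

# Hypothetical data loading: the folder layout and label derivation are
# placeholders, not part of this issue.
data = tc.image_analysis.load_images('training_images/', with_path=True)
data['label'] = data['path'].apply(lambda p: p.split('/')[-2])

# Raise batch_size back to 512 (the new default is 64).
model = tc.image_classifier.create(data, target='label', batch_size=512)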
Thanks @TobyRoseman - will give this a shot. I've mentioned this to our team and will let you know.
Batch size of 512 didn't have any effect. Is the idea that the create function will split batches of > 512 across GPUs? Are weight updates shared across GPUs, or are the weights also divided per GPU and updated separately, à la AlexNet?
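For context on what I mean by shared weight updates: in a generic MXNet data-parallel setup (not necessarily what Turi Create does internally), there is a single set of weights, each batch is split across devices, and the gradients are aggregated before one update. A rough Gluon sketch of that pattern, assuming two visible GPUs and toy data:

import mxnet as mx
from mxnet import autograd, gluon

ctxs = [mx.gpu(0), mx.gpu(1)]                    # assumption: two visible GPUs
net = gluon.nn.Dense(10)
net.initialize(ctx=ctxs)                         # one set of weights, replicated per device
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

data = mx.nd.random.uniform(shape=(512, 2048))   # one toy batch of 512 examples
label = mx.nd.floor(mx.nd.random.uniform(0, 10, shape=(512,)))

# Split the batch evenly across devices and compute a loss per device...
data_parts = gluon.utils.split_and_load(data, ctxs)
label_parts = gluon.utils.split_and_load(label, ctxs)
with autograd.record():
    losses = [loss_fn(net(x), y) for x, y in zip(data_parts, label_parts)]
for l in losses:
    l.backward()

# ...then apply a single update to the shared weights from the aggregated gradients.
trainer.step(batch_size=data.shape[0])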
Hi @TobyRoseman. Increasing the batch size seemed to make training a bit faster, but it still didn't utilize all our GPUs.
I am not able to reproduce this problem on Ubuntu/CUDA. All my GPUs are getting used to some degree.
Could someone who is having this issue please let me know two things: 1 - How many GPUs they have. 2 - The output of the following code:
import turicreate as tc
print(tc.toolkits._mxnet_utils.get_mxnet_context())
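A related optional check, sketched here on the assumption that mxnet can be imported directly: allocate a tiny array on each context that helper reports, so a device that cannot actually be used fails loudly instead of sitting idle.

import mxnet as mx
import turicreate as tc

# Try a trivial allocation on every context Turi Create reports.
for ctx in tc.toolkits._mxnet_utils.get_mxnet_context():
    x = mx.nd.ones((2, 2), ctx=ctx)
    print(ctx, 'usable, sum =', x.sum().asscalar())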
oscar@MayallsObject ~/Oscar % python
Python 2.7.12 (default, Dec 4 2017, 14:50:18)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import turicreate as tc
>>> print(tc.toolkits._mxnet_utils.get_mxnet_context())
[gpu(0), gpu(1), gpu(2), gpu(3), gpu(4)]
>>>
@vade - and you have five GPUs?
@TobyRoseman - thanks for the test. In your successful environment, can you please note whether you are using the Turi Create 5.0b1 release, and which CUDA version, GPU driver version, and MXNet version you are using, just for comparison's sake?
@tbartelmess Correct - two 1080s and three 1070s.
My nvidia-smi output from the machine I ran the print(tc.toolkits._mxnet_utils.get_mxnet_context()) code on:
oscar@MayallsObject ~/Oscar % nvidia-smi
Wed Jun 27 16:47:14 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.67 Driver Version: 390.67 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1080 Off | 00000000:01:00.0 On | N/A |
|100% 40C P2 58W / 180W | 6844MiB / 8116MiB | 52% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 1080 Off | 00000000:02:00.0 Off | N/A |
|100% 37C P8 17W / 180W | 7230MiB / 8119MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX 1070 Off | 00000000:04:00.0 Off | N/A |
|100% 33C P8 20W / 230W | 4768MiB / 8119MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX 1070 Off | 00000000:05:00.0 Off | N/A |
|100% 34C P8 20W / 230W | 4768MiB / 8119MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 GeForce GTX 1070 Off | 00000000:09:00.0 Off | N/A |
|100% 32C P8 20W / 230W | 7260MiB / 8119MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
Additional info if pertinent:
oscar@MayallsObject ~/Oscar % nvidia-smi topo -m
GPU0 GPU1 GPU2 GPU3 GPU4 CPU Affinity
GPU0 X PHB PHB PHB PHB 0-3
GPU1 PHB X PHB PHB PHB 0-3
GPU2 PHB PHB X PHB PHB 0-3
GPU3 PHB PHB PHB X PHB 0-3
GPU4 PHB PHB PHB PHB X 0-3
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe switches (without traversing the PCIe Host Bridge)
PIX = Connection traversing a single PCIe switch
NV# = Connection traversing a bonded set of # NVLinks
@vade, my version info:
turicreate: 5.0b1
MXNet: mxnet-cu80==1.1.0
Ubuntu: 16.04
CUDA: release 8.0, V8.0.44
Driver Version: 375.66
How many examples are in your dataset?
For the GPU which is doing the work, what is the typical utilization? For the other GPUs, is there ever any non-zero utilization?
Thanks. Perhaps the issue is that we are running a different build of MXNet 1.1.0 against CUDA 9.0, with a newer driver. We spent a lot of time trying to get a variety of toolchains to 'agree' on a driver / CUDA version, since Turi Create isn't the only ML framework we use.
Examples in our datasets range from thousands to hundreds of thousands.
Typical GPU load for a single run with batch size 512 is roughly 20-30%. Running multiple concurrent training jobs gets that up to 60% at most from what I've seen while monitoring, with essentially no utilization on any GPU other than the zeroth.
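For reference, a rough way to watch per-GPU utilization like this during a run (a sketch assuming nvidia-smi is on the PATH; the one-second poll interval and one-minute duration are arbitrary):

import subprocess
import time

# Poll nvidia-smi once per second for a minute, printing per-GPU utilization
# and memory use.
for _ in range(60):
    out = subprocess.check_output([
        'nvidia-smi',
        '--query-gpu=index,utilization.gpu,memory.used',
        '--format=csv,noheader'
    ])
    print(out.decode().strip())
    print('---')
    time.sleep(1)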
@TobyRoseman Let me know if you need anything else.
Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 18:10:19)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import turicreate as tc
>>> print(tc.toolkits._mxnet_utils.get_mxnet_context())
[gpu(0), gpu(1), gpu(2), gpu(3)]

nvidia-smi topo -m
GPU0 GPU1 GPU2 GPU3 CPU Affinity
GPU0 X NV1 NV1 NV2 0-39
GPU1 NV1 X NV2 NV1 0-39
GPU2 NV1 NV2 X NV1 0-39
GPU3 NV2 NV1 NV1 X 0-39
Dupe of #741
Can someone explain why this issue was closed? @afranklin ?
@vade This is a dupe of #741. I closed the more recent filing instead of the earlier one.
@afranklin: On closer inspection, these are not the same issue. @vade is not able to see multiple GPUs being used, while #741 is about saturating multiple GPUs.
Would a Turi Create engineer kindly comment on whether:
1. CUDA 9 / 9.x and newer NVIDIA drivers are expected to work with multiple GPUs under Turi Create 5.0b1 on Linux?
2. Alternate builds of MXNet 1.1.0 are expected to work across multiple GPUs under Turi Create 5.0b1 on Linux?
Thank you.
@vade - I would expect it to work with newer versions of Cuda and newer drivers. However that has not been tested.
Please clarify your second question: what do you mean by "alternate builds of MXNet 1.1.0"? Are you talking about a different release version of MXNet, or something other than mxnet-cu80?
@TobyRoseman For example, mxnet-cu90==1.1.0: the same version number, but built against CUDA 9.
@vade - thanks for clarifying. I would also expect that to work, but we have not tested it.
Could everyone else having this problem please let us know which version of CUDA you are using?
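One possible way to gather that in a single snippet (a sketch that assumes nvcc and nvidia-smi are on the PATH and that both turicreate and mxnet import cleanly):

import subprocess
import turicreate as tc
import mxnet as mx

# Package versions.
print('turicreate:', tc.__version__)
print('mxnet:', mx.__version__)

# CUDA toolkit version (from nvcc) and NVIDIA driver version (from nvidia-smi).
print(subprocess.check_output(['nvcc', '--version']).decode())
print('driver:', subprocess.check_output(
    ['nvidia-smi', '--query-gpu=driver_version', '--format=csv,noheader']
).decode().strip())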
I finally got a chance to test this on a two-GPU system with CUDA 9. Using the most recent version of Turi Create (5.0b3), I was not able to reproduce this issue; both GPUs were getting used about the same amount. I was also using mxnet-cu90==1.1.0.
Feel free to reopen if this is still an issue with the most recent version of Turi Create.
Hi Toby - thanks for testing. Can you please document the specific NVIDIA driver version and CUDA version (which CUDA 9 point release)?
Thanks! Happy to verify with your versions shortly!
cc @genp ;)
@vade -
CUDA: release 9.0, V9.0.176
NVIDIA Driver Version: 390.30
@TobyRoseman
Will the size of the example dataset affect GPU utilization?
Hello
Successfully running Turi Create 5.0b1 on Ubuntu with GPU training and outputting Core ML models. Very cool.
Our system has 5 GPUs - 2x 1080s and 3x 1070s. Our Turi Create script sets tc.config.set_num_gpus(-1), but we never see any GPU other than the first in use.
Is this a known issue? Is it subject to hardware configuration?
System: Ubuntu 16.04
NVIDIA driver 390.48, CUDA 9.0
Topology:
GPU0 GPU1 GPU2 GPU3 GPU4 CPU Affinity
GPU0 X PHB PHB PHB PHB 0-3
GPU1 PHB X PHB PHB PHB 0-3
GPU2 PHB PHB X PHB PHB 0-3
GPU3 PHB PHB PHB X PHB 0-3
GPU4 PHB PHB PHB PHB X 0-3
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48 Driver Version: 390.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1080 Off | 00000000:01:00.0 On | N/A |
|100% 39C P2 58W / 180W | 1067MiB / 8116MiB | 37% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 1080 Off | 00000000:02:00.0 Off | N/A |
|100% 36C P8 16W / 180W | 786MiB / 8119MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX 1070 Off | 00000000:04:00.0 Off | N/A |
|100% 30C P8 19W / 230W | 734MiB / 8119MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX 1070 Off | 00000000:05:00.0 Off | N/A |
|100% 32C P8 20W / 230W | 640MiB / 8119MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 GeForce GTX 1070 Off | 00000000:09:00.0 Off | N/A |
|100% 29C P8 20W / 230W | 546MiB / 8119MiB | 0% Default |
+-------------------------------+----------------------+----------------------+