ejp-zz closed this issue 10 years ago.
The GPU utilization that nvidia-smi shows is the percentage of time over the most recent sample period (up to one second, depending on the product) during which one or more GPU kernels were executing. That is not a very good measure of how fully the GPU is being used. You can read more about it at:
http://docs.nvidia.com/deploy/nvml-api/structnvmlUtilization__t.html#structnvmlUtilization__t
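As a rough illustration of that definition, the sketch below models the reported figure as the fraction of a sample window covered by at least one running kernel. The function name `gpu_utilization` and the kernel timings are hypothetical, and this is a simplified model of the documented definition, not NVML's implementation. Note that a window fully covered by a single kernel scores 100% no matter how many threads that kernel actually launched.

```python
# Simplified model of the documented nvidia-smi/NVML "GPU utilization":
# the percentage of a sample window during which at least one kernel ran.
# Hypothetical helper for illustration only; not NVML's implementation.

def gpu_utilization(kernel_intervals, window_start, window_end):
    """Percent of [window_start, window_end) covered by at least one kernel.

    kernel_intervals: list of (start, end) times, possibly overlapping.
    The result says nothing about how many cores each kernel kept busy.
    """
    window = window_end - window_start
    # Clip each interval to the window and drop intervals outside it.
    clipped = sorted(
        (max(s, window_start), min(e, window_end))
        for s, e in kernel_intervals
        if min(e, window_end) > max(s, window_start)
    )
    busy = 0.0
    cur_start = cur_end = None
    # Merge overlapping intervals so concurrent kernels are counted once.
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                busy += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)
    if cur_end is not None:
        busy += cur_end - cur_start
    return 100.0 * busy / window
```

For example, `gpu_utilization([(0.0, 0.25), (0.125, 0.5)], 0.0, 1.0)` returns `50.0`: the two overlapping kernels are merged and counted once, and the idle half of the window is what lowers the figure.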
So this low figure could very well be due to a small amount of thread divergence. Although we have largely tried to avoid thread divergence, there are still some cases where it could have an effect: some of the threads running a kernel might take longer than others because different branches of an "if" condition execute different code.
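The cost of divergence can be sketched with a simplified SIMT model, assuming that lanes of a 32-thread warp which take different branch paths are executed one path after another, so the warp pays for every distinct path any of its lanes takes. The function names, path labels and cycle costs are hypothetical; real hardware behaviour also depends on reconvergence points and instruction scheduling.

```python
# Simplified SIMT model of branch divergence within one warp.
# Assumption: different branch paths are serialized, so the warp spends the
# cycles of every distinct path taken by at least one of its lanes.

def warp_cost(path_per_lane, path_cost):
    """Cycles the warp spends: sum of the costs of each distinct path taken.

    path_per_lane: one path label per lane (e.g. 32 labels for a full warp).
    path_cost: dict mapping path label -> cycles for that path.
    """
    return sum(path_cost[p] for p in set(path_per_lane))

def lane_efficiency(path_per_lane, path_cost):
    """Fraction of lane-cycles doing useful work (1.0 means no divergence waste)."""
    useful = sum(path_cost[p] for p in path_per_lane)
    return useful / (len(path_per_lane) * warp_cost(path_per_lane, path_cost))
```

With 31 lanes on a 10-cycle path and one lane on a 30-cycle path, the warp pays 40 cycles and `lane_efficiency(["a"] * 31 + ["b"], {"a": 10, "b": 30})` gives 0.265625, i.e. only about a quarter of the lane-cycles do useful work even though the kernel is "running" the whole time.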
To get a good picture of utilization you will have to do full profiling. As the people in the thread put it, "Utilization is not a 'how well you're using the resources' statistic but an 'if you're using the resources' statistic".
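One profiler quantity that speaks to "how well" rather than "if" is occupancy: the ratio of resident warps on a streaming multiprocessor (SM) to the maximum it supports. Below is a simplified theoretical-occupancy estimate using the per-SM limits of compute capability 2.x (Fermi) cards such as the Tesla C2050 and M2090, assuming shared memory is configured as 48 KB. It ignores register and shared-memory allocation granularity, so treat it as an upper-bound sketch rather than a replacement for NVIDIA's CUDA Occupancy Calculator or a profiler; the function name is hypothetical.

```python
import math

# Per-SM limits for compute capability 2.x (Fermi).
# Assumptions: 48 KB shared-memory configuration; allocation granularity
# is ignored, so results are an upper bound on theoretical occupancy.
MAX_WARPS_PER_SM = 48
MAX_BLOCKS_PER_SM = 8
REGISTERS_PER_SM = 32768
SHARED_MEM_PER_SM = 48 * 1024
WARP_SIZE = 32

def theoretical_occupancy(threads_per_block, regs_per_thread, smem_per_block):
    """Resident warps per SM divided by the maximum, for one kernel config."""
    warps_per_block = math.ceil(threads_per_block / WARP_SIZE)
    # Each resource independently caps how many blocks fit on one SM.
    limits = [MAX_BLOCKS_PER_SM, MAX_WARPS_PER_SM // warps_per_block]
    if regs_per_thread > 0:
        regs_per_block = regs_per_thread * warps_per_block * WARP_SIZE
        limits.append(REGISTERS_PER_SM // regs_per_block)
    if smem_per_block > 0:
        limits.append(SHARED_MEM_PER_SM // smem_per_block)
    blocks = min(limits)
    return blocks * warps_per_block / MAX_WARPS_PER_SM
```

For example, 256-thread blocks using 32 registers per thread are register-limited to 4 blocks per SM, giving 32 of 48 warps (about 67%), while the same blocks at 20 registers per thread fit 6 blocks and reach full theoretical occupancy.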
Let me just comment on the 2 options you suggested:
We still have work to do on the GPU optimization and hopefully we will make it faster in time. Thanks for the insightful comments. They'll help us improve the code.
Cheers Abhishek
Hi Abhishek, thanks for the clarification. Please let us know when a more optimized GPU version becomes available. Elbert
When I run the Taylor-Green vortex test case, I find that only 21% of the GPU (Tesla C2050) is utilized:
[ejeyapau@r219i0n0 ~]$ nvidia-smi
Mon Jun 23 17:34:39 2014
+------------------------------------------------------+
| NVIDIA-SMI 3.295.41   Driver Version: 295.41         |
|-------------------------------+----------------------+----------------------+
| Nb.  Name                     | Bus Id        Disp.  | Volatile ECC SB / DB |
| Fan   Temp   Power Usage /Cap | Memory Usage         | GPU Util. Compute M. |
|===============================+======================+======================|
| 0.  Tesla M2090               | 0000:02:00.0  Off    |         0         0  |
| N/A   N/A  P0   137W / 225W   |   7%  383MB / 5375MB |   21%     Default    |
|-------------------------------+----------------------+----------------------|
| Compute processes:                                               GPU Memory |
|  GPU  PID     Process name                                       Usage      |
|=============================================================================|
|  0.  78932    ../../../bin/HiFiLES                                    370MB |
+-----------------------------------------------------------------------------+
Does this mean that all of the 448 processing units are being utilized on the Tesla C2050 card? I am running this on a 12-core machine with 1 GPU card. For better utilization, here are a few options:
1. Run multiple jobs on the same GPU.
2. Run a GPU job (which uses 1 core) and a CPU job using the remaining 11 cores.
Are both good options?
Or rather, is there a way to estimate the optimal GPU requirements (memory and load) for a given problem? Sorry, the question is not directly related to the code. Thanks, Elbert
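For the memory half of the question, a first estimate can be obtained by counting stored values: elements × solution points per element × fields × arrays kept per point × bytes per value. The sketch below assumes a high-order discretization on tensor-product hexahedral elements with (p+1)^3 solution points each, stored in double precision; the number of arrays kept per point (`n_arrays`) depends on the solver's implementation and is a hypothetical input, as are the function names. The result can then be compared with the memory total that nvidia-smi reports, leaving headroom for the CUDA context and temporary buffers.

```python
# Back-of-the-envelope GPU memory estimate for a high-order solver.
# All inputs are hypothetical; n_arrays must be measured or taken from the
# solver's data structures, since it varies between implementations.

def points_per_hex(order):
    """Solution points in a tensor-product hexahedral element of order p."""
    return (order + 1) ** 3

def estimate_memory_bytes(n_elements, pts_per_element, n_fields,
                          n_arrays, bytes_per_value=8):
    """Total bytes for n_arrays arrays of n_fields values at every point."""
    return n_elements * pts_per_element * n_fields * n_arrays * bytes_per_value
```

With 32^3 elements at p = 3 (64 points each), 5 fields (the conserved variables of the 3D compressible Navier-Stokes equations) and a guessed 20 arrays per point, `estimate_memory_bytes(32 ** 3, points_per_hex(3), 5, 20)` gives 1,677,721,600 bytes, i.e. 1600 MiB.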