changing number of compute units

trinayan commented 7 years ago

Are there any plans to support features like changing the number of active compute units on a GPU, exposed to the user? I believe power gating is currently done automatically by some low-level routines. I saw a couple of papers on power from AMD Research where they varied the number of active compute units, so I was wondering whether such functionality will be made available to the user. Thanks

kentrussell commented 7 years ago

It's not currently planned for the SMI, as changing the number of Compute Units (CUs) is supported in userspace through CU masking. Is there a situation where the number of CUs would need to be changed at kernel level via the SMI/sysfs and not via userspace?

jlgreathouse commented 7 years ago

I suppose part of the "blame" for this request falls on me -- I'm one of the folks from AMD Research who has published a number of papers using AMD-internal tools to reduce the number of active compute units. @trinayan may be referring to some of my work in his request.

To partially answer the question from @kentrussell -- one potential benefit of global CU control (rather than user-level CU masking) is that it allows fast changes without needing to modify the application. We used this, for instance, in this paper, where we quickly explored the performance change caused by CU differences across ~100 applications, all without changing any source code. We were able to do this because we used AMD-internal tools to reach in and disable a certain number of CUs, and then ran all the benchmarks. Repeat for any particular CU count.

That being said, @trinayan , I think it's worth noting that even if this capability were made available, it may not do what you want. For instance, in this paper we showed per-CU power gating. This used a different mechanism (firmware configuration changes), because the above-mentioned software-based CU disabling mechanism does not automatically cause the CUs to be power gated on that generation of GPUs. This will also be true on many other generations of our GPUs.

What you should take away from this is that, while global CU disabling may make it easier to run many programs with a set number of compute units, it is unlikely to enable power gating studies. If all you care about is the possible performance differences caused by fewer/more CUs, you should explore the CU masking techniques that @kentrussell mentions. This is AMD's supported mechanism for using fewer CUs than are available in the hardware.

See the HSA queue function hsa_amd_queue_cu_set_mask() for more information.
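As a rough illustration of what CU masking looks like from userspace, here is a minimal sketch in C that builds the bit-vector such a call would take. The helper name build_cu_mask is hypothetical and not part of any AMD library. The sketch assumes the mask is an array of 32-bit words with one bit per CU and a bit count that is a multiple of 32, which is how I understand the hsa_amd_queue_cu_set_mask() interface in hsa_ext_amd.h. Check the header shipped with your ROCm version before relying on it. How bit positions map to physical CUs depends on the hardware.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not part of ROCm): enable the lowest `num_cus`
 * compute units in a CU mask made of `num_words` 32-bit words.
 *
 * Returns the number of bits the mask represents (num_words * 32),
 * or 0 if the arguments are invalid.
 */
static uint32_t build_cu_mask(uint32_t *mask, uint32_t num_words,
                              uint32_t num_cus)
{
    if (mask == NULL || num_words == 0 || num_cus == 0 ||
        num_cus > num_words * 32u) {
        return 0;
    }

    /* Start with every CU disabled. */
    memset(mask, 0, num_words * sizeof(uint32_t));

    /* Set one bit per enabled CU, least significant bit first. */
    for (uint32_t cu = 0; cu < num_cus; ++cu) {
        mask[cu / 32u] |= (uint32_t)1 << (cu % 32u);
    }

    return num_words * 32u;
}

/*
 * Assumed usage with an already-created HSA queue (requires the ROCm
 * runtime and its hsa_ext_amd.h header, so it is left as a comment):
 *
 *     uint32_t mask[2];
 *     uint32_t bits = build_cu_mask(mask, 2, 16);   // use 16 CUs
 *     if (bits != 0)
 *         hsa_amd_queue_cu_set_mask(queue, bits, mask);
 */
```

Because the mask is applied per queue, the application (or a runtime layer underneath it) has to make this call itself, which is the "needs source changes" trade-off discussed above compared with a global CU control.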

trinayan commented 7 years ago

@jlgreathouse : Thanks for the information. Yes, I have read the papers you have co-authored and really like all of them. Because there was no support for changing clocks on ROCm APUs, I used a Jetson device, which offered clock-changing flexibility although CU power gating was not supported. I was studying DVFS for emerging heterogeneous workloads like Hetero-Mark and Chai, but I would love to extend my analysis to APUs if I can get access to some of the software through NDAs. Thank you for your reply.

jlgreathouse commented 7 years ago

@trinayan , you can contact me by email and we can see if there is anything AMD Research can do to help you. You can find my email address in the author block of the papers you've read. We may or may not be able to provide you with tools to help you with what you need.

For anyone else reading this who just wants to use a subset of CUs in an AMD GPU, I highly recommend looking at the CU masking mechanism listed above.

With that in mind, should this issue be closed?

ROCm / ROC-smi

changing number of compute units #5