ROCm / ROC-smi

ROC System Management Interface
https://github.com/RadeonOpenCompute/ROC-smi/blob/master/README.md

Displaying invalid GPU #40

Closed betterclever closed 6 years ago

betterclever commented 6 years ago

I have an Intel Hades Canyon NUC running Arch Linux with kernel 4.18.7. The kernel already includes the kfd modules. I installed tensorflow-rocm via the Docker images, but I am not able to run anything. I get this error:

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
terminate called after throwing an instance of 'ihipException'
  what():  std::exception
Aborted (core dumped)

This aside, when I run rocm-smi, I get the following output. I find it fishy.

====================    ROCm System Management Interface    ====================
================================================================================
 GPU  Temp    AvgPwr   SCLK     MCLK     Fan      Perf    SCLK OD    MCLK OD
  0   N/A     N/A      N/A      N/A      0%       N/A       N/A        N/A      
  1   58c     N/A      1190Mhz  800Mhz   0%       auto      0%         0%       
================================================================================
====================           End of ROCm SMI Log          ====================

My NUC has only one Vega M GH GPU. Why is it showing 2 GPUs then?

gstoner commented 6 years ago

The Intel Hades Canyon NUC also has an Intel integrated graphics GPU, which is GPU 0, so the output is correct. You have another issue.
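
One way to check which vendor sits behind each GPU index is to read the PCI vendor ID that the kernel exposes for every DRM card. The following is a minimal sketch, not part of rocm-smi: it assumes the GPU numbers in the table correspond to the `cardN` entries under `/sys/class/drm`, and the helper name `list_drm_cards` is hypothetical.

```python
from pathlib import Path

# Well-known PCI vendor IDs as they appear in sysfs.
PCI_VENDORS = {"0x8086": "Intel", "0x1002": "AMD", "0x10de": "NVIDIA"}


def list_drm_cards(drm_root="/sys/class/drm"):
    """Return {card_name: vendor_name} for each DRM card under drm_root.

    Hypothetical helper; assumes each card exposes device/vendor in sysfs.
    """
    cards = {}
    for vendor_file in sorted(Path(drm_root).glob("card[0-9]*/device/vendor")):
        card = vendor_file.parent.parent.name
        if "-" in card:  # skip connector entries such as card0-HDMI-A-1
            continue
        vendor_id = vendor_file.read_text().strip().lower()
        # Fall back to the raw ID when the vendor is not in the table.
        cards[card] = PCI_VENDORS.get(vendor_id, vendor_id)
    return cards


if __name__ == "__main__":
    for card, vendor in list_drm_cards().items():
        print(card, vendor)
```

On a machine like the one described here, this would be expected to list one Intel card and one AMD card, which would match the two rows in the table.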

jlgreathouse commented 6 years ago

Your other issue is likely that, if your AMD GPU is a Vega M, it is not currently on our list of supported GPUs in ROCm.
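
A rough way to see whether the kernel's amdkfd driver exposes the GPU as a compute device is to look at the KFD topology in sysfs. The sketch below is illustrative only: it assumes the topology lives under `/sys/class/kfd/kfd/topology/nodes`, that each node has a `properties` file of `name value` lines, and that CPU-only nodes report `simd_count 0`. The helper names `read_kfd_nodes` and `gpu_nodes` are hypothetical, and reading these files may need elevated permissions on some kernels.

```python
from pathlib import Path


def read_kfd_nodes(topology_root="/sys/class/kfd/kfd/topology/nodes"):
    """Return {node_id: {property: int_value}} from each node's properties file.

    Hypothetical helper; returns an empty dict if the topology is absent.
    """
    nodes = {}
    root = Path(topology_root)
    if not root.is_dir():
        return nodes
    for props_file in sorted(root.glob("*/properties")):
        props = {}
        for line in props_file.read_text().splitlines():
            parts = line.split()
            # Keep only well-formed "name value" lines with a numeric value.
            if len(parts) == 2 and parts[1].isdigit():
                props[parts[0]] = int(parts[1])
        nodes[props_file.parent.name] = props
    return nodes


def gpu_nodes(nodes):
    """Return IDs of nodes that report at least one SIMD (assumed to be GPUs)."""
    return [node for node, props in nodes.items() if props.get("simd_count", 0) > 0]


if __name__ == "__main__":
    found = gpu_nodes(read_kfd_nodes())
    print("KFD GPU nodes:", found if found else "none")
```

If no GPU node shows up, that would be consistent with the GPU not being supported by the driver stack, though it does not by itself explain the `ihipException`.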

betterclever commented 6 years ago

Thanks for the information. I thought rocm-smi detects only AMD GPUs. Also, I am keenly waiting for amdkfd support for Vega M. Hope to see it soon. :smile: