ROCm / ROCm-OpenCL-Driver

ROCm OpenCL Compiler Tool Driver
MIT License

clinfo hangs with two or more Vega10 cards in Ryzen system #66

Closed perestoronin closed 6 years ago

perestoronin commented 6 years ago

clinfo hangs on a Ryzen system with two or more Vega10 cards installed.

Both ROCm stack versions, 1.8.x and 1.7.x, have the same issue.

jlgreathouse commented 6 years ago

What motherboard are you using, and which PCIe slots do you have the cards installed in? My first guess, since you haven't included very much information in this issue, is that you've installed the second card in one of the PCIe gen 2 slots that hang off the chipset, rather than in one of the PCIe gen 3 slots that connect directly to the processor. Vega 10 on ROCm requires PCIe gen 3 with PCIe atomics.
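One way to get a first look at the slot question is to read the link speed that Linux reports for each AMD display device. The following is only a sketch, under these assumptions: the kernel exposes the `current_link_speed` and `max_link_speed` sysfs attributes, and `list_gpu_links` is a hypothetical helper name, not part of ROCm. The values describe the link directly above each device, which may be an on-card bridge rather than the motherboard slot, and the current speed can drop while the GPU is idle, so treat the output as a hint and confirm with `lspci -vv`.

```shell
#!/bin/sh
# Sketch: print the current and maximum PCIe link speed of each AMD display device.
# Assumes the kernel exposes current_link_speed / max_link_speed in sysfs.
# list_gpu_links is a hypothetical helper name used only for this example.
list_gpu_links() {
    sysfs_root="${1:-/sys/bus/pci/devices}"
    for dev in "$sysfs_root"/*; do
        # Skip entries without readable vendor/class files (or an unmatched glob).
        [ -r "$dev/vendor" ] && [ -r "$dev/class" ] || continue
        # 0x1002 is AMD's PCI vendor ID.
        [ "$(cat "$dev/vendor")" = "0x1002" ] || continue
        # Class 0x03xxxx is a display controller.
        case "$(cat "$dev/class")" in
            0x03*) ;;
            *) continue ;;
        esac
        cur="unknown"
        max="unknown"
        [ -r "$dev/current_link_speed" ] && cur="$(cat "$dev/current_link_speed")"
        [ -r "$dev/max_link_speed" ] && max="$(cat "$dev/max_link_speed")"
        echo "$(basename "$dev"): current=$cur max=$max"
    done
}
```

Calling `list_gpu_links` with no argument reads the real sysfs tree. PCIe gen 3 corresponds to a link speed of 8 GT/s and gen 2 to 5 GT/s. This does not check PCIe atomics support, which has to be confirmed separately.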

Could you attach the output of dmesg after you've freshly booted your system? Similarly, could you show the output of rocminfo? Note that for rocminfo to work, you will either need to run it with sudo or ensure that your user is in the 'video' group. Could you also show the output of lspci -v?
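The requests above could be gathered in one pass with something like the sketch below. The helper names `capture` and `collect_diag`, the output directory, and the file names are all assumptions made for this example. Each tool is run only if it is installed, and a failure is written to that tool's output file instead of stopping the script.

```shell
#!/bin/sh
# Sketch: collect dmesg, rocminfo and lspci -v output plus group membership.
# capture and collect_diag are hypothetical helper names for this example.

capture() {
    # $1 = output file, remaining arguments = command to run
    out_file="$1"
    shift
    if command -v "$1" >/dev/null 2>&1; then
        # Keep stdout and stderr together; note a non-zero exit status in the file.
        "$@" > "$out_file" 2>&1 || echo "command exited with status $?" >> "$out_file"
    else
        echo "$1: not installed" > "$out_file"
    fi
}

collect_diag() {
    out_dir="${1:-rocm-diag}"
    mkdir -p "$out_dir" || return 1

    # rocminfo needs root or membership in the 'video' group, so record the groups.
    id -nG > "$out_dir/groups.txt" 2>&1

    capture "$out_dir/dmesg.txt" dmesg
    capture "$out_dir/rocminfo.txt" rocminfo
    capture "$out_dir/lspci.txt" lspci -v
}
```

Running `collect_diag ./rocm-diag` shortly after a fresh boot leaves four text files in that directory, ready to attach to the issue. On systems that restrict dmesg to root, the script may need to be run with sudo.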

What operating system is this? Ubuntu 16.04.4?

Edit: I am asking for all of this information because I am unable to reproduce this problem with the level of detail you have provided. I have multiple systems with >1 GPU in them running on the ROCm software stack, including at least one system with two Vega 10 GPUs and a Ryzen CPU.