perestoronin closed this issue 6 years ago.
What motherboard are you using, and which PCIe slots do you have the cards installed in? My first guess, since you haven't included very much information in this issue, is that you've installed the second card in one of the PCIe gen 2 slots that hang off the chipset, rather than in one of the PCIe gen 3 slots that connect directly to the processor. Vega 10 on ROCm requires PCIe gen 3 with PCIe atomics.
Could you attach the output of dmesg after you've freshly booted your system? Similarly, could you show the output of rocminfo? Note that, for rocminfo to work, you will either need to run it as sudo or ensure that your user is in the 'video' group. Could you also show the output of lspci -v?
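Since the first guess above is a card sitting in a PCIe gen 2 slot, one quick check is the negotiated link speed that lspci reports. The sketch below is a hypothetical helper, not part of ROCm. It assumes you saved the output of sudo lspci -vv (the plain -v output does not include the link status lines) and that each device's status appears as a "LnkSta: Speed NGT/s" line. It maps 2.5, 5, 8 and 16 GT/s to gen 1 through 4. The sample text is made up for illustration.

```python
import re

# Per-lane transfer rate (GT/s) -> PCIe generation.
_SPEED_TO_GEN = {"2.5": 1, "5": 2, "8": 3, "16": 4}


def link_generations(lspci_text):
    """Return (device_header, generation) for every LnkSta line found.

    Assumes `lspci -vv` layout: device headers start at column 0 and
    their capability lines are indented. Unknown speeds map to None.
    """
    results = []
    device = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            device = line.strip()  # new device header
        match = re.search(r"LnkSta:\s*Speed\s+([\d.]+)GT/s", line)
        if match and device is not None:
            results.append((device, _SPEED_TO_GEN.get(match.group(1))))
    return results


if __name__ == "__main__":
    # Hypothetical excerpt, not real output from the reporter's system.
    sample = (
        "0a:00.0 VGA compatible controller: Vega 10 (card 1)\n"
        "\t\tLnkSta:\tSpeed 8GT/s, Width x16\n"
        "0b:00.0 VGA compatible controller: Vega 10 (card 2)\n"
        "\t\tLnkSta:\tSpeed 5GT/s, Width x4\n"
    )
    for dev, gen in link_generations(sample):
        print(f"gen {gen}: {dev}")
```

A card that reports gen 2 here would match the chipset-slot theory. Note that a link can also train below its slot's maximum, so treat this as a hint rather than proof.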
What operating system is this? Ubuntu 16.04.4?
Edit: I am asking for all of this information because I am unable to reproduce this problem with the level of detail you have provided. I have multiple systems with >1 GPU in them running on the ROCm software stack, including at least one system with two Vega 10 GPUs and a Ryzen CPU.
clinfo hangs on a Ryzen system with two or more Vega 10 cards installed.
Both ROCm stack versions 1.8.x and 1.7.x have the same issue.