ROCm / ROCm-docker

Dockerfiles for the various software layers defined in the ROCm software platform
MIT License

rocm/pytorch does not support gfx1030 #89

Closed by CyberShadow 2 years ago

CyberShadow commented 2 years ago

With a Radeon RX 6900 XT:

printf 'import torch\nprint(torch.cuda.is_available())' | 
    docker run --rm -i --device=/dev/kfd --device=/dev/dri \
    --security-opt seccomp=unconfined --group-add video \
    --shm-size 8G \
    rocm/pytorch:rocm5.1.1_ubuntu20.04_py3.7_pytorch_1.10.0 python
"hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"

As I understand it, this is because the binary blob passed to libamdhip64.so does not include the gfx1030 target, possibly due to a misconfigured AMDGPU_TARGETS. However, other reports of this error message elsewhere say that rebuilding with a different clang version or in a different environment made the error go away.
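One way to test the missing-target theory is to compare the GPU's architecture name against the list of targets the PyTorch build reports. The following is a minimal sketch, not something taken from the image itself: `has_code_object` and `compiled_targets_from_torch` are hypothetical helper names, and it assumes `torch.cuda.get_arch_list()` reports the gfx targets on a ROCm build. On an affected image that call may return an empty list or fail with the same error, since it depends on a GPU being usable.

```python
def has_code_object(device_target, compiled_targets):
    """Return True if the device's gfx target is among the compiled targets.

    Feature suffixes such as ':sramecc+:xnack-' are ignored, so only the
    base architecture name (e.g. 'gfx1030') is compared.
    """
    def base(name):
        return name.split(":")[0].strip().lower()

    return base(device_target) in {base(t) for t in compiled_targets}


def compiled_targets_from_torch():
    """Return the architectures this PyTorch build reports, or [] without torch.

    Assumption: on ROCm builds the returned list contains gfx names.
    """
    try:
        import torch
    except ImportError:
        return []
    return list(torch.cuda.get_arch_list())
```

For example, `has_code_object("gfx1030", compiled_targets_from_torch())` returning False on a working ROCm setup would be consistent with the build lacking a gfx1030 code object.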

CyberShadow commented 2 years ago

This does not happen with pytorch_staging. I'm not sure whether this is a random fluke or whether there is actually something special about the PyTorch version regarding GPU support. In any case, this is no longer blocking me, so I'm closing this issue.