ROCm / ROCm-OpenCL-Runtime

ROCm OpenCL Runtime

OpenCL crashes on Ubuntu 18.04 LTS with memory access fault when rendering Blender demo #123

Closed: tpkessler closed this issue 4 years ago

tpkessler commented 4 years ago

Description

When I try to render the Junk Shop demo with Blender 2.83, the GPU raises a memory access fault and Blender aborts. Output and backtrace from rocgdb:

Memory access fault by GPU node-1 (Agent handle: 0x7f942d124200) on address 0x7f6c14e3f000. Reason: Page not present or supervisor privilege.

Thread 76 "blender" received signal SIGABRT, Aborted.
[Switching to Thread 0x7f9422778700 (LWP 5694)]
0x00007f94ea133e97 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007f94ea133e97 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f94ea135801 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f94229f7f43 in core::Runtime::VMFaultHandler(long, void*) () from /opt/rocm/lib/../opencl/lib/../../lib/libhsa-runtime64.so.1
#3  0x00007f94229f6505 in core::Runtime::AsyncEventsLoop(void*) () from /opt/rocm/lib/../opencl/lib/../../lib/libhsa-runtime64.so.1
#4  0x00007f94229b51c7 in os::ThreadTrampoline(void*) () from /opt/rocm/lib/../opencl/lib/../../lib/libhsa-runtime64.so.1
#5  0x00007f94eb8796db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#6  0x00007f94ea21688f in clone () from /lib/x86_64-linux-gnu/libc.so.6

How to reproduce

System

$ uname -r
5.3.0-59-generic

$ lsmod | grep amdgpu
amdgpu               5218304  55
amd_iommu_v2           20480  1 amdgpu
amd_sched              32768  1 amdgpu
amdttm                 98304  1 amdgpu
amdkcl                 24576  2 amdttm,amdgpu
drm_kms_helper        180224  1 amdgpu
drm                   491520  22 drm_kms_helper,amd_sched,amdttm,amdgpu,amdkcl
i2c_algo_bit           16384  1 amdgpu

$ /opt/rocm/bin/rocminfo | grep Name
  Name:                    AMD Ryzen 7 2700X Eight-Core Processor
  Marketing Name:          AMD Ryzen 7 2700X Eight-Core Processor
  Vendor Name:             CPU                                
  Name:                    gfx900                             
  Marketing Name:          Vega 10 XT [Radeon RX Vega 64]     
  Vendor Name:             AMD                                
      Name:                    amdgcn-amd-amdhsa--gfx900          
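The system information above can be gathered in one go with a small script. This is a minimal sketch, not part of the original report: the function name `collect_rocm_info` is hypothetical, and it assumes ROCm is installed under `/opt/rocm` as in the issue (another `rocminfo` path can be passed as the first argument). Each step is guarded so a missing tool prints a note instead of aborting the report.

```shell
#!/bin/sh
# Hypothetical helper: reproduces the three diagnostic commands from the issue.
# Assumption: rocminfo lives at /opt/rocm/bin/rocminfo unless a path is given.

collect_rocm_info() {
  rocminfo_bin="${1:-/opt/rocm/bin/rocminfo}"

  # Kernel version
  echo '$ uname -r'
  uname -r

  # Loaded amdgpu-related kernel modules
  echo
  echo '$ lsmod | grep amdgpu'
  if command -v lsmod >/dev/null 2>&1; then
    lsmod | grep amdgpu || echo '(amdgpu module not loaded)'
  else
    echo '(lsmod not available)'
  fi

  # CPU/GPU agents as seen by the ROCm runtime
  echo
  echo "\$ $rocminfo_bin | grep Name"
  if [ -x "$rocminfo_bin" ]; then
    "$rocminfo_bin" | grep Name || echo '(no Name fields reported)'
  else
    echo "($rocminfo_bin not found)"
  fi
}

collect_rocm_info "$@"
```

Pasting the script's output into a bug report gives maintainers the same kernel, module, and agent details shown above.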
vsytch commented 4 years ago

The ROCm stack is primarily focused on ML and HPC applications. Unfortunately, Blender does not fall into either category, so ROCm releases may not be validated against it.

I'd suggest using the amdgpu-pro driver instead if you're planning to use Blender. Even though it shares the same codebase, that driver release goes through full workstation certification.

https://www.amd.com/en/support/graphics/radeon-rx-vega-series/radeon-rx-vega-series/radeon-rx-vega-64
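When switching between ROCm and amdgpu-pro, it can help to check which OpenCL implementations are registered on the system. This sketch assumes Blender loads OpenCL through the system ICD loader (`libOpenCL.so`), which on Linux reads one `*.icd` file per vendor from `/etc/OpenCL/vendors`, each naming a vendor library. The function name `list_opencl_icds` is hypothetical; the vendors directory can be overridden as the first argument.

```shell
#!/bin/sh
# Hypothetical helper: list OpenCL ICD registrations and the library each one names.
# Assumption: the ICD loader uses /etc/OpenCL/vendors (the usual Linux location).

list_opencl_icds() {
  vendors_dir="${1:-/etc/OpenCL/vendors}"
  found=0
  for icd in "$vendors_dir"/*.icd; do
    [ -e "$icd" ] || continue   # glob matched nothing; skip the literal pattern
    found=1
    # Print "<file> -> <first line of file>", i.e. the registered library
    printf '%s -> %s\n' "$(basename "$icd")" "$(head -n 1 "$icd")"
  done
  if [ "$found" -eq 0 ]; then
    echo "no .icd files in $vendors_dir"
  fi
}

list_opencl_icds "$@"
```

If more than one AMD runtime is registered, the application may pick up a different implementation than expected, so the listing is a quick way to confirm which driver's OpenCL is actually installed.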

tpkessler commented 4 years ago

Thanks @vsytch for the explanation. I'm closing this issue.