ROCm / clr

MIT License

Blender crashes with HIP 5.6.0 on AMD Ryzen 5 5625U #11

Closed HurricanePootis closed 8 months ago

HurricanePootis commented 10 months ago

Main Problem

Trying to render a project with HIP crashes Blender 3.6.2. This is on Arch Linux. I have attached a backtrace of the crash.

Backtraces

#0  0x00007fff84d58db8 in hip_impl::ihipOccupancyMaxActiveBlocksPerMultiprocessor(int*, int*, int*, amd::Device const&, ihipModuleSymbol_t*, int, unsigned long, bool) [clone .constprop.0]
    (maxBlocksPerCU=maxBlocksPerCU@entry=0x7ffe609a7b68, numBlocksPerGrid=numBlocksPerGrid@entry=0x7ffe609a7b70, bestBlockSize=bestBlockSize@entry=0x7ffe609a7b5c, device=..., func=func@entry=0x7ffe69001780, inputBlockSize=1024, inputBlockSize@entry=0, dynamicSMemSize=0, bCalcPotentialBlkSz=true)
    at /usr/src/debug/hip-runtime-amd/clr-rocm-5.6.0/hipamd/src/hip_platform.cpp:344
#1  0x00007fff84c43c35 in hipModuleOccupancyMaxPotentialBlockSize(int*, int*, hipFunction_t, size_t, int)
    (gridSize=<optimized out>, blockSize=0x7ffe8ffb9558, f=0x7ffe69001780, dynSharedMemPerBlk=<optimized out>, blockSizeLimit=<optimized out>)
    at /usr/src/debug/hip-runtime-amd/clr-rocm-5.6.0/hipamd/src/hip_platform.cpp:426
#2  0x00005555580a265b in ccl::HIPDeviceKernels::load(ccl::HIPDevice*) ()
#3  0x00005555580a2577 in ccl::HIPDevice::load_kernels(unsigned int) ()
#4  0x0000555557ef37ad in ccl::Scene::load_kernels(ccl::Progress&) ()
#5  0x0000555558043836 in ccl::Session::run_update_for_next_iteration() ()
#6  0x0000555558044e2b in ccl::Session::run_main_render_loop() ()
#7  0x000055555804593c in ccl::Session::thread_render() ()
#8  0x0000555558045b03 in ccl::Session::thread_run() ()
#9  0x000055555841e4de in ccl::thread::run(void*) ()
#10 0x00007fffe56e1943 in std::execute_native_thread_routine(void*) (__p=0x7ffe8ff0a620) at /usr/src/debug/gcc/gcc/libstdc++-v3/src/c++11/thread.cc:104
#11 0x00007fffe528c9eb in start_thread (arg=<optimized out>) at pthread_create.c:444
#12 0x00007fffe5310dfc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
Thread 74 "blender" received signal SIGFPE, Arithmetic exception.
[Switching to Thread 0x7ffe64aad000 (LWP 50176)]
0x00007fff86358db8 in hip_impl::ihipOccupancyMaxActiveBlocksPerMultiprocessor(int*, int*, int*, amd::Device const&, ihipModuleSymbol_t*, int, unsigned long, bool) [clone .constprop.0] (maxBlocksPerCU=maxBlocksPerCU@entry=0x7ffe64aa8b48, numBlocksPerGrid=numBlocksPerGrid@entry=0x7ffe64aa8b50, 
    bestBlockSize=bestBlockSize@entry=0x7ffe64aa8b3c, device=..., func=func@entry=0x7ffe6ae01780, inputBlockSize=1024, inputBlockSize@entry=0, dynamicSMemSize=0, 
    bCalcPotentialBlkSz=true) at /usr/src/debug/hip-runtime-amd/clr-rocm-5.6.0/hipamd/src/hip_platform.cpp:344
344         VgprWaves = maxVGPRs / amd::alignUp(wrkGrpInfo->usedVGPRs_, VgprGranularity);                                                                              

System Information

iassiour commented 9 months ago

I think the root cause is the same as https://github.com/ROCm-Developer-Tools/hipamd/issues/58 and the issue has been fixed internally. cc @cjatin

HurricanePootis commented 9 months ago

Well, should I ask about that problem on that issue, or should I continue here? To my understanding, that repo is deprecated now.

iassiour commented 9 months ago

Please continue here, as that repo is indeed deprecated and has been migrated to clr.

The change made for that issue is also present in the clr repo; see the check that guards against the FPE here: https://github.com/ROCm-Developer-Tools/clr/blob/develop/hipamd/src/hip_platform.cpp#L345C19-L345C29

However, I can see it did not make it into the 5.6 release. It should become available in 5.7.
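
For readers following along: the faulting statement in the backtrace is an integer division, `maxVGPRs / amd::alignUp(wrkGrpInfo->usedVGPRs_, VgprGranularity)`, and on x86-64 an integer division by zero raises SIGFPE. A plausible reading is that the kernel reported zero used VGPRs, so the aligned divisor was zero. The sketch below only illustrates that kind of guard; it is not the code from the linked fix. `align_up`, `vgpr_waves`, and the `fallback_waves` parameter are hypothetical names introduced here, and the power-of-two alignment is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for amd::alignUp: rounds value up to the next
// multiple of alignment. Assumes alignment is a non-zero power of two.
constexpr std::uint32_t align_up(std::uint32_t value, std::uint32_t alignment) {
  return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: number of waves that fit given the VGPR budget.
// When the aligned VGPR usage is zero, return fallback_waves instead of
// dividing, which avoids the division by zero behind the SIGFPE.
std::uint32_t vgpr_waves(std::uint32_t max_vgprs, std::uint32_t used_vgprs,
                         std::uint32_t granularity,
                         std::uint32_t fallback_waves) {
  assert(granularity != 0 && (granularity & (granularity - 1)) == 0);
  const std::uint32_t aligned = align_up(used_vgprs, granularity);
  if (aligned == 0) {
    return fallback_waves;  // kernel uses no VGPRs: nothing to divide by
  }
  return max_vgprs / aligned;
}
```

With illustrative numbers, `vgpr_waves(512, 20, 8, 16)` divides 512 by the aligned usage of 24, while `vgpr_waves(512, 0, 8, 16)` takes the guarded path and returns the fallback instead of trapping.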

iassiour commented 8 months ago

I am closing this issue as I see the fix above has landed in 5.7. Please re-open if the problem persists.