ethereum-mining / ethminer

Ethereum miner with OpenCL, CUDA and stratum support
GNU General Public License v3.0

[workaround] Crash on Mesa OpenCL / AMD - cl_amd_media_ops #2034

Open kallisti5 opened 3 years ago

kallisti5 commented 3 years ago

Describe the bug

ethminer :( $ ./build/ethminer/ethminer -P stratum+tcp://XXX@eu1.ethermine.org:4444

ethminer 0.19.0-5+commit.c118e833
Build: linux/release/gnu

 i 13:05:39 ethminer Configured pool eu1.ethermine.org:4444
 i 13:05:39 ethminer Selected pool eu1.ethermine.org:4444
 i 13:05:39 ethminer Stratum mode : Stratum
 i 13:05:39 ethminer Established connection to eu1.ethermine.org [172.65.207.106:4444]
 i 13:05:39 ethminer Spinning up miners...
cl 13:05:39 cl-0     no exit option enabled for non AMD opencl device
cl 13:05:39 cl-0     OpenCL OpenCL 1.1 Mesa 20.1.7 not supported, but platform Clover might work nevertheless. USE AT OWN RISK!
cl 13:05:39 cl-0     Using Device :  AMD Radeon (TM) RX 480 Graphics (POLARIS10, DRM 3.38.0, 5.8.6-201.fc32.x86_64, LLVM 10.0.1) OpenCL 1.1 Mesa 20.1.7 Memory : 8.00 GB (8589934592 B)
 i 13:05:40 ethminer Authorized worker 0xXXX
 i 13:05:40 ethminer Epoch : 361 Difficulty : 4.00 Gh
 i 13:05:40 ethminer Job: 50c734bd… eu1.ethermine.org [172.65.207.106:4444]
cl 13:05:41 cl-0     Generating split DAG + Light (total): 3.82 GB
cl 13:05:41 cl-0     OpenCL kernel
 i 13:05:42 ethminer Job: 794c27b6… eu1.ethermine.org [172.65.207.106:4444]
 X 13:05:42 cl-0     OpenCL kernel build log:
input.cl:312:9: error: implicit declaration of function 'amd_bitalign' is invalid in OpenCL
input.cl:179:9: note: expanded from macro 'KECCAK_PROCESS'
input.cl:98:60: note: expanded from macro 'KECCAKF_1600_RND'
input.cl:86:24: note: expanded from macro 'ROTL64_1'
input.cl:424:5: error: implicit declaration of function 'amd_bitalign' is invalid in OpenCL
input.cl:179:9: note: expanded from macro 'KECCAK_PROCESS'
input.cl:98:60: note: expanded from macro 'KECCAKF_1600_RND'
input.cl:86:24: note: expanded from macro 'ROTL64_1'

 X 13:05:42 cl-0     OpenCL kernel build error (-11):
clBuildProgram
SIGSEGV encountered ...
stack trace:
backtrace() returned 7 addresses
./build/ethminer/ethminer() [0x437d95]
/lib64/libc.so.6(+0x3ca70) [0x7fbb36980a70]
./build/ethminer/ethminer() [0x6df7ed]
./build/ethminer/ethminer() [0x4caed3]
./build/ethminer/ethminer() [0x7895b4]
/lib64/libpthread.so.0(+0x9432) [0x7fbb36c78432]
/lib64/libc.so.6(clone+0x43) [0x7fbb36a45913]
SIGSEGV encountered ...
stack trace:
backtrace() returned 9 addresses
./build/ethminer/ethminer() [0x437d95]
/lib64/libc.so.6(+0x3ca70) [0x7fbb36980a70]
./build/ethminer/ethminer() [0x430775]
./build/ethminer/ethminer() [0x43139f]
./build/ethminer/ethminer() [0x431861]
./build/ethminer/ethminer() [0x43ed05]
./build/ethminer/ethminer() [0x7895b4]
/lib64/libpthread.so.0(+0x9432) [0x7fbb36c78432]
/lib64/libc.so.6(clone+0x43) [0x7fbb36a45913]

Environment:

$ rpm -qa | grep -i opencl
opencl-utils-1-11.svn16.fc32.x86_64
opencl-filesystem-1.0-11.fc32.noarch
wine-opencl-5.16-1.fc32.x86_64
mesa-libOpenCL-devel-20.1.7-1.fc32.x86_64
mesa-libOpenCL-20.1.7-1.fc32.x86_64
opencl-headers-2.2-6.20190205git49f07d3.fc32.noarch
wine-opencl-5.16-1.fc32.i686

Patch

I made the following change and am now getting 24 MH/s:

$ git diff
diff --git a/libethash-cl/kernels/cl/ethash.cl b/libethash-cl/kernels/cl/ethash.cl
index ce4586b67..82499837f 100644
--- a/libethash-cl/kernels/cl/ethash.cl
+++ b/libethash-cl/kernels/cl/ethash.cl
@@ -25,6 +25,7 @@
 #pragma OPENCL EXTENSION cl_clang_storage_class_specifiers : enable
 #endif

+#undef cl_amd_media_ops
 #if defined(cl_amd_media_ops)
 #pragma OPENCL EXTENSION cl_amd_media_ops : enable
 #elif defined(cl_nv_pragma_unroll)

I'm unsure whether this is the right fix.

kallisti5 commented 3 years ago

Hmm... although it's crunching hashes, the results aren't correct.

I installed the AMD (amdgpu-pro) OpenCL drivers via this guide and am now getting 22.8 MH/s, with accepted shares: https://ask.fedoraproject.org/t/guide-install-amdgpu-pro-opencl-in-fedora-32/7929

There has to be a better way to handle the open-source Mesa / Clover drivers. They seem to crunch noticeably faster (24 MH/s vs. 22.8 MH/s).

hackmod commented 3 years ago

See also PR #2035 (not tested yet)

tuxd3v commented 3 years ago

I also have this problem on an amd64 processor running Ubuntu 18.04, with the Mesa Clover driver (Mesa 20.3.0-devel, git-fd4d0b447c):

cl 23:17:25 cl-0     Platform: Clover
cl 23:17:25 cl-0     Device:   Radeon RX 580 Series (POLARIS10, DRM 3.23.0, 4.15.0-118-generic, LLVM 9.0.0) / OpenCL 1.2 Mesa 20.3.0-devel (git-fd4d0b447c)
 i 23:17:26 stratum  Job: #f7e45bbb… eu1.ethermine.org [172.65.207.106:4444]
 i 23:17:27 stratum  Job: #a5808cb6… eu1.ethermine.org [172.65.207.106:4444]
cl 23:17:28 cl-0     OpenCL kernel
 X 23:17:28 cl-0     OpenCL kernel build log:
input.cl:316:9: error: implicit declaration of function 'amd_bitalign' is invalid in OpenCL
input.cl:176:13: note: expanded from macro 'KECCAK_PROCESS'
input.cl:94:60: note: expanded from macro 'KECCAKF_1600_RND'
input.cl:82:24: note: expanded from macro 'ROTL64_1'
input.cl:426:5: error: implicit declaration of function 'amd_bitalign' is invalid in OpenCL
input.cl:176:13: note: expanded from macro 'KECCAK_PROCESS'
input.cl:94:60: note: expanded from macro 'KECCAKF_1600_RND'
input.cl:82:24: note: expanded from macro 'ROTL64_1'

 X 23:17:28 cl-0     OpenCL kernel build error (-11):
clBuildProgram
Segmentation fault (core dumped)

Any way to solve this? I am using the official ethminer binaries.

hackmod commented 3 years ago

There are several reports about mesa-opencl: it either does not work or is not as stable as expected.

You can remove mesa-opencl-icd and try another *-opencl-icd package, such as opencl-amdgpu-pro-icd.
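The *-opencl-icd packages are installable client drivers (ICDs) for the OpenCL ICD loader, which on Linux typically discovers them through small .icd text files in /etc/OpenCL/vendors (some loaders, such as ocl-icd, also honor an OCL_ICD_VENDORS override). Below is a minimal POSIX C sketch, assuming that default directory, for checking which ICDs are registered before and after swapping packages. The helper name list_icds is hypothetical and not part of ethminer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print every "*.icd" entry in dir and return how many
   were found, or -1 if the directory cannot be opened. */
static int list_icds(const char *dir)
{
    assert(dir != NULL);
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(d)) != NULL) {
        size_t len = strlen(entry->d_name);
        /* Each registered ICD is a small text file naming the vendor library. */
        if (len > 4 && strcmp(entry->d_name + len - 4, ".icd") == 0) {
            printf("%s/%s\n", dir, entry->d_name);
            ++count;
        }
    }
    closedir(d);
    return count;
}
```

For example, call list_icds("/etc/OpenCL/vendors") from a small main(). After replacing mesa-opencl-icd with another ICD package, the listing should change accordingly.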

eathtespagheti commented 3 years ago

Same issue on Arch Linux, and the OpenCL from amdgpu-pro just segfaults.

641i130 commented 3 years ago

Same issue on Arch Linux (RX 580) as well. Any simple fix yet?

Michael-Gallo commented 3 years ago

I'm having the same issue, also on Arch Linux with an RX 580.

eathtespagheti commented 3 years ago

As a workaround on Arch with an RX 580, I'm using the OpenCL implementation from the amdgpu-pro drivers. It can be installed with the AUR package opencl-amd.

According to the Arch Wiki GPGPU article (https://wiki.archlinux.org/index.php/GPGPU#AMD/ATI), there's also the ROCm implementation, but I'm unable to install it because of a build error. Can anyone try it?

tuxd3v commented 3 years ago

Yeah, this problem still persists. It has to do with ethminer using AMD-exclusive functions (such as amd_bitalign from cl_amd_media_ops) in the kernel code instead of the generic ones that Clover supports :/
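The build logs above show amd_bitalign undeclared on Clover. To illustrate what that builtin computes and why a generic rotate can stand in for it, here is a minimal sketch in plain C. It is not OpenCL C and not ethminer's kernel code. bitalign() follows the cl_amd_media_ops definition of amd_bitalign: concatenate the first two operands into 64 bits, shift right by the third operand modulo 32, and keep the low 32 bits. The names bitalign, rotl64, rotl64_via_bitalign and rotations_agree are hypothetical. The two-halves rotation only assumes the style of a uint2-based Keccak kernel and is not ethminer's exact ROTL64_1 macro.

```c
#include <assert.h>
#include <stdint.h>

/* Plain-C model of amd_bitalign(a, b, c) as defined by cl_amd_media_ops:
   form the 64-bit value a:b, shift it right by (c & 31), keep the low 32 bits. */
static uint32_t bitalign(uint32_t a, uint32_t b, uint32_t c)
{
    uint64_t concat = ((uint64_t)a << 32) | (uint64_t)b;
    return (uint32_t)(concat >> (c & 31u));
}

/* Generic 64-bit rotate-left that any implementation can compile
   (in OpenCL C the built-in rotate() serves the same purpose). */
static uint64_t rotl64(uint64_t x, unsigned n)
{
    n &= 63u;
    return n ? (x << n) | (x >> (64u - n)) : x;
}

/* Hypothetical: 64-bit rotate-left by n (1 <= n <= 32) done on the two
   32-bit halves with bitalign, in the style of a uint2-based kernel. */
static uint64_t rotl64_via_bitalign(uint64_t x, unsigned n)
{
    assert(n >= 1u && n <= 32u);
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (uint32_t)(x >> 32);
    uint32_t new_hi = bitalign(hi, lo, 32u - n); /* n < 32: (hi << n) | (lo >> (32 - n)) */
    uint32_t new_lo = bitalign(lo, hi, 32u - n); /* n < 32: (lo << n) | (hi >> (32 - n)) */
    return ((uint64_t)new_hi << 32) | (uint64_t)new_lo;
}

/* Returns 1 if the bitalign-based rotation matches the generic one
   for every n in [1, 32], else 0. */
static int rotations_agree(uint64_t x)
{
    for (unsigned n = 1; n <= 32; ++n)
        if (rotl64_via_bitalign(x, n) != rotl64(x, n))
            return 0;
    return 1;
}
```

A successful build alone is not enough to validate a fallback. kallisti5's #undef patch compiled and hashed, but the results were not correct, so any Clover fix would need to be checked against accepted shares.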