alephium / gpu-miner

GNU Lesser General Public License v3.0

Floating point exception #53

Closed: PSLLSP closed this issue 2 years ago

PSLLSP commented 2 years ago

alephium-0.5.4-cuda-miner-linux on Ubuntu 21.04

I cannot start the miner on an older AMD CPU (socket AM3). On a PC with an Intel Xeon E3 CPU it does start, but that PC has no NVIDIA GPU... I am not sure, but maybe the miner uses some FPU instruction that was only implemented in recent CPUs. The error is Floating point exception (core dumped):

$ ./alephium-0.5.4-cuda-miner-linux
2022-02-02 12:36:38 | Running gpu-miner version : release-v0.5.4
2022-02-02 12:36:38 | GPU count: 1
2022-02-02 12:36:38 | GPU #0 has #640 cores
2022-02-02 12:36:38 | will connect to broker @127.0.0.1:10973
Floating point exception (core dumped)
$ lscpu | grep "Model name"
Model name:                      AMD FX(tm)-8300 Eight-Core Processor
$ nvidia-smi 
Wed Feb  2 12:40:04 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.106.00   Driver Version: 460.106.00   CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 750 Ti  Off  | 00000000:01:00.0 Off |                  N/A |
| 67%   72C    P0    35W /  38W |   1791MiB /  2000MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     64574      C   ./miniZ                          1775MiB |
|    0   N/A  N/A    100711      G   /usr/lib/xorg/Xorg                  7MiB |
|    0   N/A  N/A    100737      G   /usr/bin/gnome-shell                2MiB |
+-----------------------------------------------------------------------------+
$ lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   48 bits physical, 48 bits virtual
CPU(s):                          8
On-line CPU(s) list:             0-7
Thread(s) per core:              2
Core(s) per socket:              4
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      21
Model:                           2
Model name:                      AMD FX(tm)-8300 Eight-Core Processor
Stepping:                        0
Frequency boost:                 enabled
CPU MHz:                         3314.813
CPU max MHz:                     3300.0000
CPU min MHz:                     1400.0000
BogoMIPS:                        6629.63
Virtualization:                  AMD-V
L1d cache:                       64 KiB
L1i cache:                       256 KiB
L2 cache:                        8 MiB
L3 cache:                        8 MiB
NUMA node0 CPU(s):               0-7
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full AMD retpoline, IBPB conditional, STIBP disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate ssbd ibpb vmmcall bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold

polarker commented 2 years ago

This miner only works with CUDA (NVIDIA) GPUs, not with AMD GPUs.

PSLLSP commented 2 years ago

Where do you see an AMD GPU? Check the nvidia-smi output: that PC has an NVIDIA GPU, although it is only a GTX-750Ti. There is no AMD GPU in that PC. BTW, I do not think the crash is related to the GPU... The AMD FX-8300 has no on-chip GPU.

$ gdb ./alephium-0.5.4-cuda-miner-linux 
GNU gdb (Ubuntu 10.1-2ubuntu2) 10.1.90.20210411-git
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./alephium-0.5.4-cuda-miner-linux...

(gdb) run
Starting program: /home/ubuntu/alephium-miner/alephium-0.5.4-cuda-miner-linux 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
2022-02-02 20:41:40 | Running gpu-miner version : release-v0.5.4
[New Thread 0x7ffff6257000 (LWP 185432)]
2022-02-02 20:41:40 | GPU count: 1
2022-02-02 20:41:40 | GPU #0 has #640 cores
2022-02-02 20:41:40 | will connect to broker @127.0.0.1:10973
[New Thread 0x7ffff5a56000 (LWP 185433)]
[New Thread 0x7ffff5255000 (LWP 185434)]

Thread 1 "alephium-0.5.4-" received signal SIGFPE, Arithmetic exception.
0x000055555540b82e in config_cuda(int, int*, int*, bool*) ()

(gdb) layout asm

│   0x55555540b826 <_Z11config_cudaiPiS_Pb+230>     mov    %ebx,%edi                                      │
│   0x55555540b828 <_Z11config_cudaiPiS_Pb+232>     call   0x55555540b360 <_Z16get_device_coresi>         │
│   0x55555540b82d <_Z11config_cudaiPiS_Pb+237>     cltd                                                  │
│  >0x55555540b82e <_Z11config_cudaiPiS_Pb+238>     idivl  0x0(%rbp)                                      │
│   0x55555540b831 <_Z11config_cudaiPiS_Pb+241>     lea    (%rax,%rax,2),%edx                             │
│   0x55555540b834 <_Z11config_cudaiPiS_Pb+244>     mov    %edx,%eax                                      │
│   0x55555540b836 <_Z11config_cudaiPiS_Pb+246>     shr    $0x1f,%eax                                     │
│   0x55555540b839 <_Z11config_cudaiPiS_Pb+249>     add    %edx,%eax                                      │
│   0x55555540b83b <_Z11config_cudaiPiS_Pb+251>     sar    %eax                                           │
│   0x55555540b83d <_Z11config_cudaiPiS_Pb+253>     mov    %eax,(%r12)                                    │
│   0x55555540b841 <_Z11config_cudaiPiS_Pb+257>     jmp    0x55555540b7e3 <_Z11config_cudaiPiS_Pb+163>    │
│   0x55555540b843 <_Z11config_cudaiPiS_Pb+259>     nopl   0x0(%rax,%rax,1)                               │
│   0x55555540b848 <_Z11config_cudaiPiS_Pb+264>     mov    0x28(%rsp),%edx                                │
│   0x55555540b84c <_Z11config_cudaiPiS_Pb+268>     lea    0x2c(%rsp),%rdi                                │
│   0x55555540b851 <_Z11config_cudaiPiS_Pb+273>     mov    $0x27,%esi                                     │
│   0x55555540b856 <_Z11config_cudaiPiS_Pb+278>     call   0x555555460a40 <cudaDeviceGetAttribute>        │
│   0x55555540b85b <_Z11config_cudaiPiS_Pb+283>     test   %eax,%eax                                      │
│   0x55555540b85d <_Z11config_cudaiPiS_Pb+285>     jne    0x55555540b7c5 <_Z11config_cudaiPiS_Pb+133>    

Is idivl 0x0(%rbp) the troublemaker? Maybe it crashes because it divides by zero...

polarker commented 2 years ago

Could you try Bzminer or T-Rex miner instead? I think our miner does not support very old NVIDIA GPUs.

PSLLSP commented 2 years ago

Bzminer is one of the worst miners I have ever tried. I spent many hours yesterday trying to get it running, with no result at all. It was probably tested only with the newest GPUs, newest libraries, etc. It even crashes in my HiveOS... :-( It doesn't work with my hardware; it doesn't see my GPUs.

T-Rex is a great miner, but the latest public release, t-rex-0.24.8-linux.tar.gz, doesn't support the blake3 algo. I have found some rumors about a beta that supports blake3, but I do not know where to get the beta version...

PSLLSP commented 2 years ago

t-rex-0.25.2-linux.tar.gz was released; it supports the blake3 algo and even dual mining on GPU cards with LHR. T-Rex is a great miner with great compatibility, and on a GTX1070 it has a higher hashrate than bzminer. A GTX750Ti with T-Rex has a hashrate similar to a GTX1060 with alephium-0.5.4-cuda-miner-linux, about 160 MH/s. A GTX1060 with T-Rex reaches about 400 MH/s. T-Rex doesn't support solo mining the way your GPU miner does...

PSLLSP commented 2 years ago

SRBMiner 0.9.0 added support for the blake3_alephium algo, so ALPH can be mined with an AMD GPU or a CPU.

https://github.com/doktor83/SRBMiner-Multi/releases/tag/0.9.0

BTW, they note that a stratum standard for ALPH has not been defined, so pools are creative and several variants exist...