Open QuentinHK0 opened 7 years ago
Please run clinfo and post the output.
Thank you for your prompt response. clinfo was not installed; I just installed it and ran it:
Number of platforms 1
Platform Name AMD Accelerated Parallel Processing
Platform Vendor Advanced Micro Devices, Inc.
Platform Version OpenCL 2.0 AMD-APP (2348.3)
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Platform Extensions function suffix AMD
Platform Name AMD Accelerated Parallel Processing
Number of devices 2
Device Name Tonga
Device Vendor Advanced Micro Devices, Inc.
Device Vendor ID 0x1002
Device Version OpenCL 1.2 AMD-APP (2348.3)
Driver Version 2348.3
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Profile FULL_PROFILE
Device Board Name (AMD) AMD Radeon (TM) R9 380 Series
Device Topology (AMD) PCI-E, 04:00.0
Max compute units 28
SIMD per compute unit (AMD) 4
SIMD width (AMD) 16
SIMD instruction width (AMD) 1
Max clock frequency 1010MHz
Graphics IP (AMD) 8.0
Device Partition (core)
Max number of sub-devices 28
Supported partition types none specified
Max work item dimensions 3
Max work item sizes 256x256x256
Max work group size 256
Preferred work group size multiple 64
Wavefront width (AMD) 64
Preferred / native vector sizes
char 4 / 4
short 2 / 2
int 1 / 1
long 1 / 1
half 1 / 1 (cl_khr_fp16)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (cl_khr_fp16)
Denormals No
Infinity and NANs No
Round to nearest No
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Address bits 64, Little-Endian
Global memory size 4062666752 (3.784GiB)
Global free memory (AMD) 3946200 (3.763GiB)
Global memory channels (AMD) 8
Global memory banks per channel (AMD) 16
Global memory bank width (AMD) 256 bytes
Error Correction support No
Max memory allocation 2887360512 (2.689GiB)
Unified memory for Host and Device No
Minimum alignment for any data type 128 bytes
Alignment of base address 2048 bits (256 bytes)
Global Memory cache type Read/Write
Global Memory cache size 16384
Global Memory cache line 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 134217728 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 256 bytes
Pitch alignment for 2D image buffers 256 bytes
Max 2D image size 16384x16384 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 128
Max number of write image args 8
Local memory type Local
Local memory size 32768 (32KiB)
Local memory syze per CU (AMD) 65536 (64KiB)
Local memory banks (AMD) 32
Max constant buffer size 2887360512 (2.689GiB)
Max number of constant args 8
Max size of kernel argument 1024
Queue properties
Out-of-order execution No
Profiling Yes
Prefer user sync for interop Yes
Profiling timer resolution 1ns
Profiling timer offset since Epoch (AMD) 1496906205099678405ns (Thu Jun 8 09:16:45 2017)
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Thread trace supported (AMD) Yes
SPIR versions 1.2
printf() buffer size 1048576 (1024KiB)
Built-in kernels
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event
Device Name Pentium(R) Dual-Core CPU E5500 @ 2.80GHz
Device Vendor GenuineIntel
Device Vendor ID 0x1002
Device Version OpenCL 1.2 AMD-APP (2348.3)
Driver Version 2348.3 (sse2)
Device OpenCL C Version OpenCL C 1.2
Device Type CPU
Device Profile FULL_PROFILE
Device Board Name (AMD)
Device Topology (AMD) (n/a)
Max compute units 2
Max clock frequency 1203MHz
Device Partition (core, cl_ext_device_fission)
Max number of sub-devices 2
Supported partition types equally, by counts, by affinity domain
Supported affinity domains L2 cache, L1 cache, next partitionable
Supported partition types (ext) equally, by counts, by affinity domain
Supported affinity domains (ext) L2 cache, L1 cache, next fissionable
Max work item dimensions 3
Max work item sizes 1024x1024x1024
Max work group size 1024
Preferred work group size multiple 1
Preferred / native vector sizes
char 16 / 16
short 8 / 8
int 4 / 4
long 2 / 2
half 2 / 2 (n/a)
float 4 / 4
double 2 / 2 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Address bits 64, Little-Endian
Global memory size 4142968832 (3.858GiB)
Error Correction support No
Max memory allocation 2147483648 (2GiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Global Memory cache type Read/Write
Global Memory cache size 32768
Global Memory cache line 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 65536 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 8192x8192 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 128
Max number of write image args 64
Local memory type Global
Local memory size 32768 (32KiB)
Max constant buffer size 65536 (64KiB)
Max number of constant args 8
Max size of kernel argument 4096 (4KiB)
Queue properties
Out-of-order execution No
Profiling Yes
Prefer user sync for interop Yes
Profiling timer resolution 1ns
Profiling timer offset since Epoch (AMD) 1496906205099678405ns (Thu Jun 8 09:16:45 2017)
Execution capabilities
Run OpenCL kernels Yes
Run native kernels Yes
SPIR versions 1.2
printf() buffer size 65536 (64KiB)
Built-in kernels
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_khr_gl_event
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No platform
clCreateContext(NULL, ...) [default] No platform
clCreateContext(NULL, ...) [other] Success [AMD]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) No platform
Are these values normal?
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No platform
clCreateContext(NULL, ...) [default] No platform
clCreateContext(NULL, ...) [other] Success [AMD]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) No platform
Could you please start the miner by hand from a terminal, without start.sh? I am not sure whether this is an error message from the miner or from your script.
I forgot to mention it, but I tried launching it directly and the problem is the same :( Here it is:
user@MC002:~/Bureau/xmr-stak-amd-master/bin$ ./
opencl/ start.sh xmr-stak-amd
user@MC002:~/Bureau/xmr-stak-amd-master/bin$ ./xmr-stak-amd
[2017-06-08 16:46:40] : Compiling code and initializing GPUs. This will take a while...
[2017-06-08 16:46:40] : Device 0 work size 8 / 256.
-------------------------------------------------------------------
XMR-Stak-AMD mining software, AMD Version.
AMD mining code was written by wolf9466.
Brought to you by fireice_uk under GPLv3.
Configurable dev donation level is set to 1.0 %
You can use following keys to display reports:
'h' - hashrate
'r' - results
'c' - connection
-------------------------------------------------------------------
[2017-06-08 16:46:47] : Starting GPU thread, no affinity.
[2017-06-08 16:46:47] : Connecting to pool xmr.crypto-pool.fr:8888 ...
[2017-06-08 16:46:47] : Connected. Logging in...
[2017-06-08 16:46:47] : Difficulty changed. Now: 30000.
[2017-06-08 16:46:47] : New block detected.
HASHRATE REPORT
| ID | 10s | 60s | 15m |
| 0 | 530.2 | (na) | (na) |
---------------------------
Totals: 530.2 (na) (na) H/s
Highest: 530.8 H/s
[2017-06-08 16:47:51] : New block detected.
Instruction non permise (core dumped)
user@MC002:~/Bureau/xmr-stak-amd-master/bin$
I have an "illegal instruction (core dumped)" error message
@QuentinHK0 Are you familiar with Linux core dumps? If so, can you please open the generated file in gdb like so:
gdb xmr-stak-amd <core-file>
When it loads, issue the following commands: bt, list, and info locals. Then copy and paste the results.
Also, please copy and paste the output of the command cat /proc/cpuinfo
@QuentinHK0 What type of GPU do you have? Please change the language of your system to English; otherwise the error messages are hard to read.
@fireice-uk Could it be that there is an issue with the AES-NI detection and the miner crashes after the first share is found?
@fireice-uk I cannot locate the core file. I read that it should be generated in the current directory, but there is nothing there. I will do some research.
@psychocrypt I tried to change the language but the error message is still in French. For information, "Instruction non permise" means "illegal instruction".
@fireice-uk I just read that the Ubuntu kernel is configured to use apport to log core dumps, and that it writes them to /var/crash/_path_to_the_program_userid.crash, but only for applications installed from the main Ubuntu apt repository. I think there is a way to get one for xmr-stak-amd; I'll look. Do you know anything about it?
@psychocrypt That's my thinking exactly. However, another possibility is gcc emitting AVX instructions (it will do that in C++ code if -march allows). I need a core dump to confirm which one it is.
@QuentinHK0
The following commands will configure the kernel to write core dumps into the current directory:
ulimit -c unlimited
echo './core_%e.%p' | sudo tee /proc/sys/kernel/core_pattern
@fireice-uk Thank you, I had missed echo './core_%e.%p' | sudo tee /proc/sys/kernel/core_pattern!
So here is the output of gdb bt:
(gdb) bt
#0 0x0000000000415ad1 in soft_aeskeygenassist ()
#1 0x000000000041cf58 in void cn_explode_scratchpad<2097152ul, true>(long long __vector const*, long long __vector*) ()
#2 0x000000000041e84b in void cryptonight_hash<524288ul, 2097152ul, true, true>(void const*, unsigned long, void*, cryptonight_ctx*) ()
#3 0x0000000000437646 in executor::on_miner_result(unsigned long, job_result&)
()
#4 0x000000000043812e in executor::ex_main() ()
#5 0x00007fcee45cfc80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007fcee536c6ba in start_thread (arg=0x7fceda240700)
at pthread_create.c:333
#7 0x00007fcee403e82d in clone ()
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
gdb list :
(gdb) list
1 in ../sysdeps/unix/sysv/linux/x86_64/clone.S
(gdb)
and gdb info locals:
(gdb) info locals
No symbol table info available.
(gdb)
@fireice-uk From clinfo we can see that it is an E5500 without AES-NI, and from gdb we can see that soft AES is used. Everything in the crashing function soft_aeskeygenassist() looks fine to me. Do you have any ideas?
@QuentinHK0 Please post the output from the cmake command. I need some information about your compiler.
@psychocrypt
user@MC002:~/Téléchargements/xmr-stak-amd-master$ cmake .
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found OpenSSL: /usr/lib/x86_64-linux-gnu/libssl.so;/usr/lib/x86_64-linux-gnu/libcrypto.so (found version "1.0.2g")
-- Configuring done
-- Generating done
-- Build files have been written to: /home/user/Téléchargements/xmr-stak-amd-master
user@MC002:~/Téléchargements/xmr-stak-amd-master$
@QuentinHK0
Can you post the output of cat /proc/cpuinfo (so that I know the exact CPU feature set), and of layout asm (run inside gdb with the core dump loaded)?
@psychocrypt This is turning into an interesting detective story. Unless GCC up-optimised my code to SSE4, there are only plain old SSE2 instructions there.
In particular I'm thinking of these lines:
uint32_t X1 = _mm_cvtsi128_si32(_mm_shuffle_epi32(key, 0x55));
uint32_t X3 = _mm_cvtsi128_si32(_mm_shuffle_epi32(key, 0xFF));
In SSE4.1 they can be replaced by two PEXTRD instructions, which is what might be tripping up the code here.
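One way to test this hunch directly is to disassemble the built binary and count occurrences of the suspect instruction. The sketch below assumes GNU objdump (binutils) and grep are available; the helper name count_insn is illustrative and not part of xmr-stak-amd.

```sh
# count_insn MNEMONIC
# Reads disassembly text on stdin and prints how many lines contain
# MNEMONIC as a whole word (so "pextrd" does not match "vpextrd").
count_insn() {
    grep -c -w -- "$1"
}

# Typical use (the binary path is an assumption, adjust to your build):
#   objdump -d bin/xmr-stak-amd | count_insn pextrd
```

A nonzero count only shows that the compiler emitted the instruction somewhere in the binary; the core dump is still needed to confirm that it is the one that faulted.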
@fireice-uk Hello, sorry I did not have time to reply during the weekend. For information, I use exactly the same configuration (motherboard, graphics card, CPU) with Windows 10 and wolf miner and everything works fine, so I do not think it comes from the CPU features.
user@MC002:~$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Pentium(R) Dual-Core CPU E5500 @ 2.80GHz
stepping : 10
microcode : 0xa07
cpu MHz : 1203.000
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm xsave lahf_lm tpr_shadow vnmi flexpriority dtherm
bugs :
bogomips : 5585.86
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Pentium(R) Dual-Core CPU E5500 @ 2.80GHz
stepping : 10
microcode : 0xa07
cpu MHz : 1203.000
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm xsave lahf_lm tpr_shadow vnmi flexpriority dtherm
bugs :
bogomips : 5585.86
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
How can I send you the result of the layout asm command? When I run it, I get a second window in the terminal.
@QuentinHK0 Can you take a screenshot? That would be most useful, as context is sometimes relevant and this way you will get the nearby instructions too. The problem here (if my hunch is correct) is that gcc is over-optimising the code, translating two SSE2 instructions into one SSE4.1 instruction that your CPU can't handle.
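As an alternative to a screenshot, gdb can print the same information as plain text in batch mode. This is a minimal sketch, assuming gdb is installed and a core file exists; the function name dump_fault_site and the GDB override variable are hypothetical helpers introduced here for illustration.

```sh
# dump_fault_site BINARY COREFILE
# Prints the backtrace and the eight instructions starting at the
# program counter of the crashed thread, as plain text.
# GDB is a hypothetical override variable (defaults to "gdb").
dump_fault_site() {
    "${GDB:-gdb}" -batch -ex 'bt' -ex 'x/8i $pc' "$1" "$2"
}

# Typical use (the core file name is an assumption, PID stands for the
# process id that appears in the real file name):
#   dump_fault_site ./xmr-stak-amd ./core_xmr-stak-amd.PID > fault.txt
```

The first instruction printed by x/8i $pc is the one the CPU refused to execute, so the resulting text file can be pasted into the issue directly.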
@fireice-uk We are compiling with -O3, which enables auto-vectorization via -ftree-vectorize. We can disable it for gcc with -fno-tree-vectorize. I will provide a pull request to test it in one to two hours.
@QuentinHK0 Could you please try the fix in #47
mkdir xmr-stak-test
cd xmr-stak-test
git clone https://github.com/psychocrypt/xmr-stak-amd.git .
git checkout topic-noVectorization
cmake .
make -j install
# add your pool to the file config.txt
cd bin
./xmr-stak-amd
@fireice-uk Here is the screenshot of the layout asm output:
(https://img4.hostingpics.net/pics/283288sreenshotxmrstakamdgdblayoutasm.png)
@psychocrypt I followed your instructions:
user@MC002:~/xmr-stak-test$ cmake -fno--tree-vectorize .
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found OpenSSL: /usr/lib/x86_64-linux-gnu/libssl.so;/usr/lib/x86_64-linux-gnu/libcrypto.so (found version "1.0.2g")
-- Configuring done
-- Generating done
-- Build files have been written to: /home/user/xmr-stak-test
but make -j install returned an error, so I had to use make:
user@MC002:~/xmr-stak-test$ make
Scanning dependencies of target xmr-stak-amd
[ 5%] Building C object CMakeFiles/xmr-stak-amd.dir/crypto/c_skein.c.o
[ 11%] Building C object CMakeFiles/xmr-stak-amd.dir/crypto/c_keccak.c.o
[ 16%] Building C object CMakeFiles/xmr-stak-amd.dir/crypto/c_jh.c.o
[ 22%] Building C object CMakeFiles/xmr-stak-amd.dir/crypto/soft_aes.c.o
[ 27%] Building C object CMakeFiles/xmr-stak-amd.dir/crypto/c_blake256.c.o
[ 33%] Building C object CMakeFiles/xmr-stak-amd.dir/crypto/c_groestl.c.o
[ 38%] Building CXX object CMakeFiles/xmr-stak-amd.dir/crypto/cryptonight_common.cpp.o
[ 44%] Building C object CMakeFiles/xmr-stak-amd.dir/amd_gpu/gpu.c.o
[ 50%] Building CXX object CMakeFiles/xmr-stak-amd.dir/minethd.cpp.o
[ 55%] Building CXX object CMakeFiles/xmr-stak-amd.dir/jpsock.cpp.o
[ 61%] Building CXX object CMakeFiles/xmr-stak-amd.dir/console.cpp.o
[ 66%] Building CXX object CMakeFiles/xmr-stak-amd.dir/socket.cpp.o
[ 72%] Building CXX object CMakeFiles/xmr-stak-amd.dir/webdesign.cpp.o
[ 77%] Building CXX object CMakeFiles/xmr-stak-amd.dir/cli-miner.cpp.o
[ 83%] Building CXX object CMakeFiles/xmr-stak-amd.dir/jconf.cpp.o
[ 88%] Building CXX object CMakeFiles/xmr-stak-amd.dir/executor.cpp.o
[ 94%] Building CXX object CMakeFiles/xmr-stak-amd.dir/httpd.cpp.o
[100%] Linking CXX executable bin/xmr-stak-amd
[100%] Built target xmr-stak-amd
I put my config.txt and opencl folder in xmr-stak-test and tried to launch ./xmr-stak-amd:
user@MC002:~/xmr-stak-test/bin$ ./xmr-stak-amd
[2017-06-13 11:12:34] : Compiling code and initializing GPUs. This will take a while...
[2017-06-13 11:12:34] : Device 0 work size 8 / 256.
[2017-06-13 11:12:49] : Error CL_INVALID_BUFFER_SIZE when calling clCreateBuffer to create hash scratchpads buffer.
user@MC002:~/xmr-stak-test/bin$
Please run make install or copy the opencl folder from the repository to xmr-stak-test/bin. The old opencl folder is not compatible with the dev version. It would be best to start again from cloning and run make install instead of make -j install. If it breaks at any step, please post the error, to avoid mixing code versions.
@psychocrypt I did not take the old opencl folder; I used the one from
git clone https://github.com/psychocrypt/xmr-stak-amd.git
That's fine. Could you please decrease the intensity in the config to 512?
@psychocrypt Sorry, I deleted the folder to start over with the make install command.
@psychocrypt
user@MC002:~/xmr-stak-test/xmr-stak-amd$ make install
make: *** Aucune règle pour fabriquer la cible « install ». Arrêt.
It basically means: "No rule to make target 'install'. Stop."
Thanks, I will check this error tonight on my PC. In this case, please copy the opencl folder to bin and try to start the miner with intensity 512.
OK, so I use make instead of make install?
Yes.
@psychocrypt OK, I tested it and I get the same error with the compile option -fno--tree-vectorize
[2017-06-13 13:48:59] : Device 0 work size 8 / 256.
-------------------------------------------------------------------
XMR-Stak-AMD mining software, AMD Version.
AMD mining code was written by wolf9466.
Brought to you by fireice_uk under GPLv3.
Configurable dev donation level is set to 1.0 %
You can use following keys to display reports:
'h' - hashrate
'r' - results
'c' - connection
-------------------------------------------------------------------
[2017-06-13 13:49:07] : Starting GPU thread, no affinity.
[2017-06-13 13:49:07] : Connecting to pool xmr.crypto-pool.fr:8888 ...
[2017-06-13 13:49:07] : Connected. Logging in...
[2017-06-13 13:49:10] : Difficulty changed. Now: 30000.
[2017-06-13 13:49:10] : New block detected.
[2017-06-13 13:49:14] : New block detected.
[2017-06-13 13:49:26] : New block detected.
HASHRATE REPORT
| ID | 10s | 60s | 15m |
| 0 | 386.0 | (na) | (na) |
---------------------------
Totals: 386.0 (na) (na) H/s
Highest: 385.6 H/s
illegal instruction (core dumped)
user@MC002:~/xmr-stak-test/xmr-stak-amd/bin$ ^C
@QuentinHK0 Can you post a screenshot of layout asm, or at least your binary file?
@fireice-uk I sent you the link in an earlier post. When I run layout asm, a second window pops up with a ton of instruction lines.
@QuentinHK0 Sorry, I missed it. As you can see, my hunch was correct =) It is crashing on a PEXTRD instruction.
A quick fix for this problem is to pass -mno-sse4.1 on both the C and CXX lines here: https://github.com/fireice-uk/xmr-stak-amd/blob/master/CMakeLists.txt#L99
Interestingly, your CPU should support sse4.1; you can try updating your BIOS too.
@QuentinHK0 One final request: can you please post the output of gcc -### -E - -march=native 2>&1 | sed -r '/cc1/!d;s/(")|(^.* - )|( -mno-[^\ ]+)//g'
This will help us see whether we can tweak gcc to handle odd cases like this one.
@fireice-uk Hello, thank you for your answers. Here is the result:
-march=core2 -mmmx -msse -msse2 -msse3 -mssse3 -mcx16 -msahf -mfxsr --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=core2 -fstack-protector-strong -Wformat -Wformat-security
@fireice-uk Do I need to replace an existing option with -mno-sse4.1, or add it, at lines 98 and 99?
@fireice-uk So I recompiled with the -mno-sse4.1 argument:
cmake .
make install
I put my old config.txt in the bin folder and now it's OK: the program ran for 10 minutes without error. I did not exactly understand where the problem came from, but thank you and GG :)
@psychocrypt Do you think an option for passing -march=native might be advisable? His gcc is detecting the odd feature set correctly.
@fireice-uk Sorry to correct you with my first comment in this project, but the feature set isn't correctly detected by gcc. On the Pentium E5500 the SSE4.1 instructions are disabled via the hardware/microcode, and as the cat /proc/cpuinfo output in https://github.com/fireice-uk/xmr-stak-amd/issues/44#issuecomment-307709046 correctly shows, no SSE4.1 flag is enabled. The Intel data sheet for the Pentium E5500 doesn't list the SSE4.1 instruction set either. The maximum is SSSE3.
So, as this is a gcc bug, the question is rather whether such an option would increase the number of issues posted when everyone could set it via cmake, etc.
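Checking for a feature flag in the cpuinfo output can be scripted. This is a minimal sketch, assuming a Linux system where /proc/cpuinfo lists the feature flags separated by single spaces; the helper name has_cpu_flag is illustrative. Note that the kernel spells the SSE4.1 flag sse4_1.

```sh
# has_cpu_flag FLAG [FLAGS_LINE]
# Succeeds (exit status 0) if FLAG appears as a whole word in FLAGS_LINE.
# When FLAGS_LINE is omitted, the first "flags" line of /proc/cpuinfo
# is used (Linux only).
has_cpu_flag() {
    flag=$1
    flags_line=${2:-$(grep -m1 '^flags' /proc/cpuinfo)}
    case " $flags_line " in
        *" $flag "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical use:
#   has_cpu_flag sse4_1 && echo "SSE4.1 available" || echo "no SSE4.1"
```

Matching on whole words matters here, because a plain substring search for sse would also match sse2 and ssse3.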
@fireice I think there is no need for a -march=native option; the user can put it into the environment before cmake is called.
# run this on a clean xmr-stak code
export CXXFLAGS="-march=native -mtune=native"
export CFLAGS="-march=native -mtune=native"
cmake .
make install
@Panzerfather No need to be sorry, please correct me if you see that I'm wrong. Please explain your reasoning in this case though. GCC can and does generate instructions that the CPU it runs on can't handle (it can even compile ARM code). What this snippet, gcc -### -E - -march=native 2>&1 | sed -r '/cc1/!d;s/(")|(^.* - )|( -mno-[^\ ]+)//g', does, afaik, is display the flags that -march=native enables.
Compare @QuentinHK0's output with my output, generated on a Haswell core:
-march=core2 -mmmx -msse -msse2 -msse3 -mssse3 -mcx16 -msahf -mfxsr --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=core2 -fstack-protector-strong -Wformat -Wformat-security
mine:
-march=haswell -mmmx -msse -msse2 -msse3 -mssse3 -mcx16 -msahf -mmovbe -maes -mpclmul -mpopcnt -mabm -mfma -mbmi -mbmi2 -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mrdrnd -mf16c -mfsgsbase -mfxsr -mxsave -mxsaveopt --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=3072 -mtune=haswell -fstack-protector-strong -Wformat -Wformat-security
Unless my reasoning is wrong here, -march=native would have disabled the sse4.1 flag (which itself was enabled since we need to build with -maes to compile the code at all).
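The comparison of the two flag lists can also be done mechanically. This is a minimal sketch in POSIX sh; the function name only_in_second is illustrative, and it treats each space-separated token as one flag.

```sh
# only_in_second FIRST SECOND
# FIRST and SECOND are space-separated compiler flag lists.
# Prints, one per line, every flag that is in SECOND but not in FIRST.
only_in_second() {
    for flag in $2; do
        case " $1 " in
            *" $flag "*) ;;                 # present in both lists
            *) printf '%s\n' "$flag" ;;     # only in the second list
        esac
    done
}

# Typical use with the two outputs saved in shell variables:
#   only_in_second "$e5500_flags" "$haswell_flags"
```

Run on the two outputs quoted in this thread, the result would include -msse4.1 among the flags that only the Haswell list contains.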
@fireice-uk You are absolutely right: in this case -march=native will disable the SSE4.1 flag, because it checks which flags the CPU can handle and enables them all, whether or not each flag is needed. There is a good explanation of what the -march=native flag does in gcc here or here. In most cases there shouldn't be anything wrong with this compiler option.
But as this project relies heavily on correct SSE instructions, I am concerned that this option could also have a negative effect on performance and/or produce unexpected crashes or results, as it will shorten the SSE registers when AVX/AVX2 features are also activated. This is documented in the Gentoo GCC Optimization Guide.
As @QuentinHK0's processor doesn't support AVX/AVX2, there is no problem at all for him in using this flag: neither performance concerns nor untested results from SSE registers being shortened for AVX/AVX2 instructions. But this is unique to this kind of CPU, which is also out of support.
I think @psychocrypt has the right solution for users who a) have the kind of problem discussed in this issue, or b) want to try out other compiler flags to get the maximum performance out of this application.
Hello, I have a problem using xmr-stak-amd. I'm using one AMD R9 380 on Ubuntu 16.04. I installed the driver from http://support.amd.com/en-us/kb-articles/Pages/AMDGPU-PRO-Install.aspx as indicated in the documentation.
My starting script:
My config file:
When I run xmr-stak-amd, the program starts normally:
but after a few minutes I get this error:
./start.sh : ligne 7 : 3890 Instruction non permise (core dumped) ./xmr-stak-amd
Would anyone have an idea?