illegal instruction

QuentinHK0 commented 7 years ago

Hello, I have a problem using xmr-stak-amd. I'm using one AMD R9 380 on Ubuntu 16.04, and I installed the driver from http://support.amd.com/en-us/kb-articles/Pages/AMDGPU-PRO-Install.aspx as indicated in the documentation.

My starting script :

#!/bin/bash
export GPU_FORCE_64BIT_PTR=1
export GPU_USE_SYNC_OBJECTS=1
export GPU_MAX_ALLOC_PERCENT=100
export GPU_SINGLE_ALLOC_PERCENT=100
export GPU_MAX_HEAP_SIZE=100
./xmr-stak-amd

My config file :


"gpu_thread_num" : 1,

"gpu_threads_conf" : [ 
    { "index" : 0, "intensity" : 896, "worksize" : 8, "affine_to_cpu" : false },

],

"platform_index" : 0,
"use_tls" : false,
"tls_secure_algo" : true,
"tls_fingerprint" : "",
"pool_address" : "xmr.crypto-pool.fr:8888",
"wallet_address" : "49r4tSpDTgk2S5GnyM15nCNivRkimwy4KBVdYmY9Wk8dV7b3n9bioxQVGv7Mc6x8ZEbVfoVn71QmuaYNMNTEyXQZ246phFX",
"pool_password" : "x",
"output_file" : "",
"call_timeout" : 10,
"retry_time" : 10,
"giveup_limit" : 0,
"verbose_level" : 4,
"h_print_time" : 60,
"httpd_port" : 0,
"prefer_ipv4" : true,

When I run xmr-stak-amd, the program starts normally:


[2017-06-08 12:29:38] : Starting GPU thread, no affinity.
[2017-06-08 12:29:38] : Connecting to pool xmr.crypto-pool.fr:8888 ...
[2017-06-08 12:29:38] : Connected. Logging in...
[2017-06-08 12:29:38] : Difficulty changed. Now: 30000.
[2017-06-08 12:29:38] : New block detected.

But after a few minutes I get this error:

./start.sh : ligne 7 : 3890 Instruction non permise (core dumped) ./xmr-stak-amd

Does anyone have an idea?

psychocrypt commented 7 years ago

Please run clinfo and post the output.

QuentinHK0 commented 7 years ago

Thank you for your prompt response. clinfo was not installed; I just installed it and ran it:


Number of platforms                               1
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.0 AMD-APP (2348.3)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 
  Platform Extensions function suffix             AMD

  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 2
  Device Name                                     Tonga
  Device Vendor                                   Advanced Micro Devices, Inc.
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.2 AMD-APP (2348.3)
  Driver Version                                  2348.3
  Device OpenCL C Version                         OpenCL C 1.2 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Board Name (AMD)                         AMD Radeon (TM) R9 380 Series
  Device Topology (AMD)                           PCI-E, 04:00.0
  Max compute units                               28
  SIMD per compute unit (AMD)                     4
  SIMD width (AMD)                                16
  SIMD instruction width (AMD)                    1
  Max clock frequency                             1010MHz
  Graphics IP (AMD)                               8.0
  Device Partition                                (core)
    Max number of sub-devices                     28
    Supported partition types                     none specified
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple              64
  Wavefront width (AMD)                           64
  Preferred / native vector sizes                 
    char                                                 4 / 4       
    short                                                2 / 2       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 1 / 1        (cl_khr_fp16)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             No
    Round to nearest                              No
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              4062666752 (3.784GiB)
  Global free memory (AMD)                        3946200 (3.763GiB)
  Global memory channels (AMD)                    8
  Global memory banks per channel (AMD)           16
  Global memory bank width (AMD)                  256 bytes
  Error Correction support                        No
  Max memory allocation                           2887360512 (2.689GiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       2048 bits (256 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        16384
  Global Memory cache line                        64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            134217728 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   256 bytes
    Pitch alignment for 2D image buffers          256 bytes
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Local memory syze per CU (AMD)                  65536 (64KiB)
  Local memory banks (AMD)                        32
  Max constant buffer size                        2887360512 (2.689GiB)
  Max number of constant args                     8
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Profiling timer offset since Epoch (AMD)        1496906205099678405ns (Thu Jun  8 09:16:45 2017)
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Thread trace supported (AMD)                  Yes
    SPIR versions                                 1.2
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Device Extensions                               cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event 

  Device Name                                     Pentium(R) Dual-Core  CPU      E5500  @ 2.80GHz
  Device Vendor                                   GenuineIntel
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.2 AMD-APP (2348.3)
  Driver Version                                  2348.3 (sse2)
  Device OpenCL C Version                         OpenCL C 1.2 
  Device Type                                     CPU
  Device Profile                                  FULL_PROFILE
  Device Board Name (AMD)                         
  Device Topology (AMD)                           (n/a)
  Max compute units                               2
  Max clock frequency                             1203MHz
  Device Partition                                (core, cl_ext_device_fission)
    Max number of sub-devices                     2
    Supported partition types                     equally, by counts, by affinity domain
    Supported affinity domains                    L2 cache, L1 cache, next partitionable
    Supported partition types (ext)               equally, by counts, by affinity domain
    Supported affinity domains (ext)              L2 cache, L1 cache, next fissionable
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             1024
  Preferred work group size multiple              1
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 2 / 2        (n/a)
    float                                                4 / 4       
    double                                               2 / 2        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              4142968832 (3.858GiB)
  Error Correction support                        No
  Max memory allocation                           2147483648 (2GiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        32768
  Global Memory cache line                        64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            65536 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             8192x8192 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                64
  Local memory type                               Global
  Local memory size                               32768 (32KiB)
  Max constant buffer size                        65536 (64KiB)
  Max number of constant args                     8
  Max size of kernel argument                     4096 (4KiB)
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Profiling timer offset since Epoch (AMD)        1496906205099678405ns (Thu Jun  8 09:16:45 2017)
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
    SPIR versions                                 1.2
  printf() buffer size                            65536 (64KiB)
  Built-in kernels                                
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Device Extensions                               cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_khr_gl_event 

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [AMD]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform

QuentinHK0 commented 7 years ago

Are these values normal?


NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [AMD]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform

psychocrypt commented 7 years ago

Could you please start the miner by hand from a terminal, without start.sh? I am not sure whether this is an error message from the miner or from your script.

QuentinHK0 commented 7 years ago

I forgot to mention it, but I tried launching it directly and the problem is the same :(

QuentinHK0 commented 7 years ago

Here it is:

user@MC002:~/Bureau/xmr-stak-amd-master/bin$ ./
opencl/       start.sh      xmr-stak-amd  
user@MC002:~/Bureau/xmr-stak-amd-master/bin$ ./xmr-stak-amd 
[2017-06-08 16:46:40] : Compiling code and initializing GPUs. This will take a while...
[2017-06-08 16:46:40] : Device 0 work size 8 / 256.
-------------------------------------------------------------------
XMR-Stak-AMD mining software, AMD Version.
AMD mining code was written by wolf9466.
Brought to you by fireice_uk under GPLv3.

Configurable dev donation level is set to 1.0 %

You can use following keys to display reports:
'h' - hashrate
'r' - results
'c' - connection
-------------------------------------------------------------------
[2017-06-08 16:46:47] : Starting GPU thread, no affinity.
[2017-06-08 16:46:47] : Connecting to pool xmr.crypto-pool.fr:8888 ...
[2017-06-08 16:46:47] : Connected. Logging in...
[2017-06-08 16:46:47] : Difficulty changed. Now: 30000.
[2017-06-08 16:46:47] : New block detected.
HASHRATE REPORT
| ID |   10s |   60s |   15m |
|  0 | 530.2 |  (na) |  (na) |
---------------------------
Totals:   530.2  (na)  (na) H/s
Highest:  530.8 H/s
[2017-06-08 16:47:51] : New block detected.
Instruction non permise (core dumped)
user@MC002:~/Bureau/xmr-stak-amd-master/bin$

I get an "illegal instruction (core dumped)" error message.

fireice-uk commented 7 years ago

@QuentinHK0 Are you familiar with Linux core dumps? If so, can you please open the generated file in gdb like so:

gdb xmr-stak-amd <core-file>

When it loads, issue the following commands: bt, list, and info locals, then copy and paste the results.

Also, please copy and paste the output of the command cat /proc/cpuinfo.

psychocrypt commented 7 years ago

@QuentinHK0 What type of GPU do you have? Please change the language of your system to English, otherwise the error messages are hard to read.

@fireice-uk Could it be that there is an issue with the AES-NI detection and the miner crashes after the first share is found?

QuentinHK0 commented 7 years ago

@fireice-uk I cannot locate the core file. I read that it should be generated in the current directory, but there is nothing there. I will do some research.

@psychocrypt I tried to change the language, but the error message is still in French. For information, the equivalent of "Instruction non permise" is "illegal instruction".

QuentinHK0 commented 7 years ago

@fireice-uk I just read that the Ubuntu kernel is configured to use apport to log core dumps, and that it writes them to /var/crash/_path_to_the_program_userid.crash, but only for applications installed from the main Ubuntu apt repo. I think there is a way to get one from xmr-stak-amd; I'll look. Do you know anything about it?

fireice-uk commented 7 years ago

@psychocrypt That's my thinking exactly. However, another option is gcc building in AVX instructions (it will do that in C++ code if -march allows). I need a core dump to confirm which one it is.

@QuentinHK0 The following commands will configure the kernel to dump into the current directory:

ulimit -c unlimited
echo './core_%e.%p' | sudo tee /proc/sys/kernel/core_pattern
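
The core dump setup discussed here can be wrapped in a small shell sketch. The function names core_pattern_is_piped and enable_local_core_dumps are made up for illustration. The sketch relies on two things: a core_pattern that begins with | hands the dump to a helper program (which is how apport intercepts it on Ubuntu), and the ulimit and core_pattern commands quoted in this comment.

```shell
# Succeeds if the given core_pattern pipes dumps to a helper program.
# The kernel treats a pattern whose first character is '|' as a pipe target.
core_pattern_is_piped() {
    case "$1" in
        '|'*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Write core files into the current directory, named core_<executable>.<pid>.
# ulimit only affects the current shell; the core_pattern change needs sudo
# and lasts until reboot or until something overwrites it.
enable_local_core_dumps() {
    ulimit -c unlimited
    echo './core_%e.%p' | sudo tee /proc/sys/kernel/core_pattern
}

# Usage (not run automatically):
#   if core_pattern_is_piped "$(cat /proc/sys/kernel/core_pattern)"; then
#       enable_local_core_dumps
#   fi
```

Run the miner from the same shell afterwards, since the ulimit setting is not inherited by terminals that are already open.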

QuentinHK0 commented 7 years ago

@fireice-uk Thank you, I missed echo './core_%e.%p' | sudo tee /proc/sys/kernel/core_pattern!

Here is the output of gdb bt:

(gdb) bt
#0  0x0000000000415ad1 in soft_aeskeygenassist ()
#1  0x000000000041cf58 in void cn_explode_scratchpad<2097152ul, true>(long long __vector const*, long long __vector*) ()
#2  0x000000000041e84b in void cryptonight_hash<524288ul, 2097152ul, true, true>(void const*, unsigned long, void*, cryptonight_ctx*) ()
#3  0x0000000000437646 in executor::on_miner_result(unsigned long, job_result&)
    ()
#4  0x000000000043812e in executor::ex_main() ()
#5  0x00007fcee45cfc80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007fcee536c6ba in start_thread (arg=0x7fceda240700)
    at pthread_create.c:333
#7  0x00007fcee403e82d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

gdb list :

(gdb) list
1   in ../sysdeps/unix/sysv/linux/x86_64/clone.S
(gdb)

and gdb info locals:

(gdb) info locals
No symbol table info available.
(gdb)

psychocrypt commented 7 years ago

@fireice-uk From clinfo we can see that it is an E5500 without AES-NI, and from gdb we see that the soft AES path is used. Everything in the crashed function soft_aeskeygenassist() looks fine to me. Do you have any ideas?

psychocrypt commented 7 years ago

@QuentinHK0 Please post the output from the cmake command. I need some information about your compiler.

QuentinHK0 commented 7 years ago

@psychocrypt

user@MC002:~/Téléchargements/xmr-stak-amd-master$ cmake .
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found OpenSSL: /usr/lib/x86_64-linux-gnu/libssl.so;/usr/lib/x86_64-linux-gnu/libcrypto.so (found version "1.0.2g") 
-- Configuring done
-- Generating done
-- Build files have been written to: /home/user/Téléchargements/xmr-stak-amd-master
user@MC002:~/Téléchargements/xmr-stak-amd-master$

fireice-uk commented 7 years ago

@QuentinHK0 Can you post the output of cat /proc/cpuinfo (so that I know the exact CPU feature set) and, inside gdb with the core dump loaded, the output of layout asm?

@psychocrypt This is turning into an interesting detective story: unless GCC up-optimised my code to SSE4, there are only plain old SSE2 instructions there.

In particular, I'm thinking of these lines:

    uint32_t X1 = _mm_cvtsi128_si32(_mm_shuffle_epi32(key, 0x55));
    uint32_t X3 = _mm_cvtsi128_si32(_mm_shuffle_epi32(key, 0xFF));

In SSE4.1 they can be replaced by two PEXTRD instructions, which is what might be tripping up the code here.

QuentinHK0 commented 7 years ago

@fireice-uk Hello, sorry, I did not have time to reply during the weekend. For information, I use exactly the same hardware (motherboard, graphics card, CPU) with Windows 10 and wolf miner, and everything works fine, so I do not think it comes from the CPU features.

user@MC002:~$ cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 23
model name  : Pentium(R) Dual-Core  CPU      E5500  @ 2.80GHz
stepping    : 10
microcode   : 0xa07
cpu MHz     : 1203.000
cache size  : 2048 KB
physical id : 0
siblings    : 2
core id     : 0
cpu cores   : 2
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm xsave lahf_lm tpr_shadow vnmi flexpriority dtherm
bugs        :
bogomips    : 5585.86
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model       : 23
model name  : Pentium(R) Dual-Core  CPU      E5500  @ 2.80GHz
stepping    : 10
microcode   : 0xa07
cpu MHz     : 1203.000
cache size  : 2048 KB
physical id : 0
siblings    : 2
core id     : 1
cpu cores   : 2
apicid      : 1
initial apicid  : 1
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm xsave lahf_lm tpr_shadow vnmi flexpriority dtherm
bugs        :
bogomips    : 5585.86
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

How can I send you the output of the layout asm command? When I run it, a second window opens in the terminal.

fireice-uk commented 7 years ago

@QuentinHK0 Can you take a screenshot? That would be most useful, since context is sometimes relevant and this way you will get the nearby instructions too. The problem here (if my hunch is correct) is that gcc is over-optimising the code, translating two SSE2 instructions into one SSE4.1 instruction that your CPU can't handle.
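
A plain-text alternative to a screenshot is to let gdb print the instructions at the crash site in batch mode, or to search a disassembly of the binary for the suspected mnemonic. The following is only a sketch: crash_disassembly and count_mnemonic are hypothetical helper names, and the core file name in the usage comment is an example of what the core_%e.%p pattern would produce.

```shell
# Print 12 instructions starting at the faulting program counter.
# $1 = binary, $2 = core file
crash_disassembly() {
    gdb -batch -ex 'x/12i $pc' "$1" "$2"
}

# Count the lines of a disassembly listing (read from stdin) that contain
# the mnemonic $1 as a whole word. grep exits non-zero when the count is 0,
# so "|| true" keeps the function usable under "set -e".
count_mnemonic() {
    grep -c -w -- "$1" || true
}

# Usage (not run automatically):
#   crash_disassembly ./xmr-stak-amd ./core_xmr-stak-amd.3890
#   objdump -d ./xmr-stak-amd | count_mnemonic pextrd
```

A non-zero count for pextrd only shows that the compiler emitted the instruction somewhere in the binary; the gdb output is what ties it to the crash.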

psychocrypt commented 7 years ago

@fireice-uk We are compiling with -O3, which enables auto-vectorization via -ftree-vectorize; we can disable it for gcc with -fno-tree-vectorize. I will provide a pull request to test it in one to two hours.

psychocrypt commented 7 years ago

@QuentinHK0 Could you please try the fix in #47?

mkdir xmr-stak-test
cd xmr-stak-test
git clone https://github.com/psychocrypt/xmr-stak-amd.git . 
git checkout topic-noVectorization
cmake .
make -j install
# add your pool to the file config.txt
cd bin
./xmr-stak-amd

QuentinHK0 commented 7 years ago

@fireice-uk Here is the screenshot of the layout asm command:

(https://img4.hostingpics.net/pics/283288sreenshotxmrstakamdgdblayoutasm.png)

QuentinHK0 commented 7 years ago

@psychocrypt I followed your instructions:

user@MC002:~/xmr-stak-test$ cmake -fno--tree-vectorize .
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found OpenSSL: /usr/lib/x86_64-linux-gnu/libssl.so;/usr/lib/x86_64-linux-gnu/libcrypto.so (found version "1.0.2g") 
-- Configuring done
-- Generating done
-- Build files have been written to: /home/user/xmr-stak-test

But make -j install returned an error, so I had to use make:


user@MC002:~/xmr-stak-test$ make
Scanning dependencies of target xmr-stak-amd
[  5%] Building C object CMakeFiles/xmr-stak-amd.dir/crypto/c_skein.c.o
[ 11%] Building C object CMakeFiles/xmr-stak-amd.dir/crypto/c_keccak.c.o
[ 16%] Building C object CMakeFiles/xmr-stak-amd.dir/crypto/c_jh.c.o
[ 22%] Building C object CMakeFiles/xmr-stak-amd.dir/crypto/soft_aes.c.o
[ 27%] Building C object CMakeFiles/xmr-stak-amd.dir/crypto/c_blake256.c.o
[ 33%] Building C object CMakeFiles/xmr-stak-amd.dir/crypto/c_groestl.c.o
[ 38%] Building CXX object CMakeFiles/xmr-stak-amd.dir/crypto/cryptonight_common.cpp.o
[ 44%] Building C object CMakeFiles/xmr-stak-amd.dir/amd_gpu/gpu.c.o
[ 50%] Building CXX object CMakeFiles/xmr-stak-amd.dir/minethd.cpp.o
[ 55%] Building CXX object CMakeFiles/xmr-stak-amd.dir/jpsock.cpp.o
[ 61%] Building CXX object CMakeFiles/xmr-stak-amd.dir/console.cpp.o
[ 66%] Building CXX object CMakeFiles/xmr-stak-amd.dir/socket.cpp.o
[ 72%] Building CXX object CMakeFiles/xmr-stak-amd.dir/webdesign.cpp.o
[ 77%] Building CXX object CMakeFiles/xmr-stak-amd.dir/cli-miner.cpp.o
[ 83%] Building CXX object CMakeFiles/xmr-stak-amd.dir/jconf.cpp.o
[ 88%] Building CXX object CMakeFiles/xmr-stak-amd.dir/executor.cpp.o
[ 94%] Building CXX object CMakeFiles/xmr-stak-amd.dir/httpd.cpp.o
[100%] Linking CXX executable bin/xmr-stak-amd
[100%] Built target xmr-stak-amd

I put my config.txt and the opencl folder in xmr-stak-test and tried to launch ./xmr-stak-amd:

user@MC002:~/xmr-stak-test/bin$ ./xmr-stak-amd 
[2017-06-13 11:12:34] : Compiling code and initializing GPUs. This will take a while...
[2017-06-13 11:12:34] : Device 0 work size 8 / 256.
[2017-06-13 11:12:49] : Error CL_INVALID_BUFFER_SIZE when calling clCreateBuffer to create hash scratchpads buffer.
user@MC002:~/xmr-stak-test/bin$

psychocrypt commented 7 years ago

Please run make install or copy the opencl folder from the repository to xmr-stak-test/bin. The old opencl folder is not compatible with the dev version. It would be best to start again from cloning and to run make install instead of make -j install. If it breaks at any step, please post the error, to avoid mixing code versions.

QuentinHK0 commented 7 years ago

@psychocrypt I did not take the old opencl folder; I used the one from

git clone https://github.com/psychocrypt/xmr-stak-amd.git

psychocrypt commented 7 years ago

That's fine. Could you please decrease the intensity in the config to 512?

QuentinHK0 commented 7 years ago

@psychocrypt Sorry, I deleted the folder to start over with the make install command.

QuentinHK0 commented 7 years ago

@psychocrypt

user@MC002:~/xmr-stak-test/xmr-stak-amd$ make install
make: *** Aucune règle pour fabriquer la cible « install ». Arrêt.

It basically means: "No rule to make target 'install'. Stop."

psychocrypt commented 7 years ago

Thanks, I will check this error tonight on my PC. In the meantime, please copy the opencl folder to bin and try to start the miner with intensity 512.

QuentinHK0 commented 7 years ago

OK, so I use make instead of make install?

psychocrypt commented 7 years ago

yes

QuentinHK0 commented 7 years ago

@psychocrypt OK, I tested it and I get the same error with the compile option -fno--tree-vectorize:

[2017-06-13 13:48:59] : Device 0 work size 8 / 256.
-------------------------------------------------------------------
XMR-Stak-AMD mining software, AMD Version.
AMD mining code was written by wolf9466.
Brought to you by fireice_uk under GPLv3.

Configurable dev donation level is set to 1.0 %

You can use following keys to display reports:
'h' - hashrate
'r' - results
'c' - connection
-------------------------------------------------------------------
[2017-06-13 13:49:07] : Starting GPU thread, no affinity.
[2017-06-13 13:49:07] : Connecting to pool xmr.crypto-pool.fr:8888 ...
[2017-06-13 13:49:07] : Connected. Logging in...
[2017-06-13 13:49:10] : Difficulty changed. Now: 30000.
[2017-06-13 13:49:10] : New block detected.
[2017-06-13 13:49:14] : New block detected.
[2017-06-13 13:49:26] : New block detected.
HASHRATE REPORT
| ID |   10s |   60s |   15m |
|  0 | 386.0 |  (na) |  (na) |
---------------------------
Totals:   386.0  (na)  (na) H/s
Highest:  385.6 H/s
illegal instruction (core dumped)
user@MC002:~/xmr-stak-test/xmr-stak-amd/bin$ ^C

fireice-uk commented 7 years ago

@QuentinHK0 Can you post a screenshot of layout asm or at least your binary file?

QuentinHK0 commented 7 years ago

@fireice-uk I sent you the link in an earlier post. When I run layout asm, a second window pops up with a ton of instruction lines.

fireice-uk commented 7 years ago

@QuentinHK0 Sorry, I missed it. As you can see, my hunch was correct =) It is crashing on a PEXTRD instruction.

A quick fix for this problem is to pass -mno-sse4.1 on both the C and CXX lines here: https://github.com/fireice-uk/xmr-stak-amd/blob/master/CMakeLists.txt#L99

Interestingly, your CPU should support SSE4.1; you can try updating your BIOS too.

fireice-uk commented 7 years ago

@QuentinHK0 One final request: can you please post the output of gcc -### -E - -march=native 2>&1 | sed -r '/cc1/!d;s/(")|(^.* - )|( -mno-[^\ ]+)//g' ? This will help us see whether we can tweak gcc to handle odd cases like this one.

QuentinHK0 commented 7 years ago

@fireice-uk Hello, thank you for your answers. Here is the result:

-march=core2 -mmmx -msse -msse2 -msse3 -mssse3 -mcx16 -msahf -mfxsr --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=core2 -fstack-protector-strong -Wformat -Wformat-security

QuentinHK0 commented 7 years ago

@fireice-uk Should I add the -mno-sse4.1 option at lines 98 and 99, or replace something there with it?

QuentinHK0 commented 7 years ago

@fireice-uk So I recompiled with the -mno-sse4.1 argument (cmake . then make install), put my old config.txt in the bin folder, and now it's OK: the program has run for 10 minutes without error. I don't understand exactly where the problem came from, but thank you and GG :)

fireice-uk commented 7 years ago

@psychocrypt Do you think an option for passing -march=native might be advisable? His gcc is detecting the odd feature set correctly.

Panzerfather commented 7 years ago

@fireice-uk Sorry to correct you with my first comment in this project, but the feature set isn't correctly detected by gcc. On the Pentium E5500 the SSE4.1 instructions are disabled via the hardware/microcode, and as the cat /proc/cpuinfo output in https://github.com/fireice-uk/xmr-stak-amd/issues/44#issuecomment-307709046 correctly shows, there is no SSE4.1 flag enabled. The Intel data sheet for the Pentium E5500 doesn't list the SSE4.1 instruction set either. Maximum is SSE3.

So as this is a gcc bug, the question is rather whether this option would increase the number of issues posted if everyone could set it via cmake etc.

psychocrypt commented 7 years ago

@fireice-uk I think there is no need for a -march=native option; the user can put it in the environment before cmake is called.

# run this on clean xmr-stak code
export CXXFLAGS="-march=native -mtune=native"
export CFLAGS="-march=native -mtune=native"
cmake .
make install

fireice-uk commented 7 years ago

@Panzerfather No need to be sorry, please correct me if you see that I'm wrong. Please explain your reasoning in this case though. GCC can and does generate instructions that the CPU it runs on can't handle (it can even compile ARM code). What this code snippet, gcc -### -E - -march=native 2>&1 | sed -r '/cc1/!d;s/(")|(^.* - )|( -mno-[^\ ]+)//g', does, afaik, is display the flags that -march=native enables.

Compare @QuentinHK0's output with mine, generated on a Haswell core:

-march=core2 -mmmx -msse -msse2 -msse3 -mssse3 -mcx16 -msahf -mfxsr --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=core2 -fstack-protector-strong -Wformat -Wformat-security

mine:

-march=haswell -mmmx -msse -msse2 -msse3 -mssse3 -mcx16 -msahf -mmovbe -maes -mpclmul -mpopcnt -mabm -mfma -mbmi -mbmi2 -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mrdrnd -mf16c -mfsgsbase -mfxsr -mxsave -mxsaveopt --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=3072 -mtune=haswell -fstack-protector-strong -Wformat -Wformat-security

Unless my reasoning is wrong here, -march=native would have disabled the sse4.1 flag (which itself was enabled since we need to build with -maes to compile the code at all).
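
The comparison between the two flag lists can also be done mechanically. A rough sketch, using only the two outputs quoted above as input; flag_enabled and report are hypothetical helper names made up for this illustration.

```shell
#!/bin/sh
# flag_enabled FLAG
# Hypothetical helper: reads a line of gcc flags on stdin and succeeds
# if FLAG is present as a separate, exactly matching word.
flag_enabled() {
    # one flag per line, then fixed-string (-F) whole-line (-x) match
    tr -s ' ' '\n' | grep -qxF -e "$1"
}

# report LABEL FLAG_LINE FLAG: print whether FLAG occurs in FLAG_LINE
report() {
    if printf '%s\n' "$2" | flag_enabled "$3"; then
        echo "$1: $3 enabled"
    else
        echo "$1: $3 not enabled"
    fi
}

# The two outputs posted in this thread
core2_flags='-march=core2 -mmmx -msse -msse2 -msse3 -mssse3 -mcx16 -msahf -mfxsr --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=core2 -fstack-protector-strong -Wformat -Wformat-security'
haswell_flags='-march=haswell -mmmx -msse -msse2 -msse3 -mssse3 -mcx16 -msahf -mmovbe -maes -mpclmul -mpopcnt -mabm -mfma -mbmi -mbmi2 -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mrdrnd -mf16c -mfsgsbase -mfxsr -mxsave -mxsaveopt --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=3072 -mtune=haswell -fstack-protector-strong -Wformat -Wformat-security'

report core2 "$core2_flags" -msse4.1
report haswell "$haswell_flags" -msse4.1
```

On a live machine, the output of the gcc/sed pipeline above could be piped straight into flag_enabled -msse4.1 instead of using the pasted strings.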

Panzerfather commented 7 years ago

@fireice-uk You are absolutely right in this case: -march=native will disable the SSE4.1 flag, because it checks which flags the CPU can handle and enables them all, whether or not each flag is needed. There is a good explanation of what the -march=native flag does in gcc here or here. In most cases there shouldn't be anything wrong with this compiler option.

But as this project relies heavily on correct SSE instructions, I worry that this option could also hurt performance and/or produce unexpected crashes or results, as it will shorten the SSE registers when AVX/AVX2 features are also activated. This is documented in the Gentoo GCC Optimization Guide.

As @QuentinHK0's processor doesn't support AVX/AVX2, there is no problem at all for him in using this flag: no performance doubts and no untested results from SSE registers being shortened for AVX/AVX2 instructions. But this is unique to this kind of CPU, which is also "out of support service".

I think @psychocrypt has the right solution for users who a) have the kind of problem discussed in this issue, or b) want to try out other compiler flags to get the maximum performance out of this application.

fireice-uk / xmr-stak-amd

illegal instruction #44