fireice-uk / xmr-stak

Free Monero RandomX Miner and unified CryptoNight miner
GNU General Public License v3.0

test i7-4870HQ OpenCl #326

Open psychocrypt opened 6 years ago

psychocrypt commented 6 years ago

This issue can be used as a basis for discussing whether it is better to utilize the L4 cache of the i7-4870HQ with the CPU or the GPU. There is an open PR #168 in which 5 hashes per thread can increase performance.

I created a branch where the Intel GPU can be used with OpenCL:
Branch: https://github.com/psychocrypt/xmr-stak/tree/topic-intelOpenCL
Direct download of the source code: https://github.com/psychocrypt/xmr-stak/archive/topic-intelOpenCL.zip

Please compile the miner with OpenCL support and report your results.

CC-ing: @grzegorzszczecin @JoKeRz42o

JoKeRz42o commented 6 years ago

@psychocrypt

  1. I apologize about the mixup with the CPU model... The MacBook Pro I was referring to earlier is not the i7-4870HQ but in fact the i7-4980HQ with the discrete AMD R9 M370X GPU.

  2. One of the other MacBooks is a Mid-2015 (IG) 11,4 with an i7-4770HQ... will this work on that?

  3. I actually started using xmr-stak-amd initially... then added the xmr-stak-cpu.

When the AIO miner was released, I compiled with the AMD OpenCL backend included and, if memory serves me correctly, the hashrate was well below (350ish) what I'm getting now.

I'll be more than happy to give the Intel OpenCL another go with your recommended adjustments...

I'll be honest, I'm fairly new to compiling on linux/mac based OS... especially from git repos... Earlier I tried to compile a second/updated version on one of my macbooks and for some reason it screwed up the initial build which was in a completely different folder.

I've been reading up on the git commands including the fetch, clone and checkout option to switch between different branches (dev/master etc) but I must be missing something.

If you don't mind, could you briefly explain and/or give the commands which would allow for multiple builds so I can test/compare different iterations of the program?

I'll get back with the requested results asap. Thanks in advance!

grzegorzszczecin commented 6 years ago

@psychocrypt Sorry, I need some help. I was able to build xmr-stak but encountered this warning:

WARNING: AMD cannot load backend library: dlopen(libxmrstak_opencl_backend.dylib, 1): image not found
WARNING: AMD Backend disabled

psychocrypt commented 6 years ago

Could you please post the output of ls -la in the bin folder? It looks like the dynamic library for the AMD backend was not built. Please also post the commands used to compile the miner and the output from cmake.

grzegorzszczecin commented 6 years ago

Never mind I figured it out. Had some issue with my library paths.

For the record, my MBP under test is Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz + NVIDIA GeForce GT 750M 2 GB + Intel Iris Pro 1536 MB

(1) auto suggestion of amd.txt

"gpu_threads_conf" : [
  // gpu: Iris Pro memory:256
  // compute units: 40
  { "index" : 0,
    "intensity" : 0, "worksize" : 8,
    "affine_to_cpu" : false,
  },
  // gpu: GeForce GT 750M memory:384
  // compute units: 2
  { "index" : 1,
    "intensity" : 176, "worksize" : 8,
    "affine_to_cpu" : false,
  },
],

and auto-suggested cpu.txt

"cpu_threads_conf" :
[
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 0 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 2 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 4 },

],

(2) Hash rate with auto suggestion is 181.3 H/s, but the printout shows that only CPU hashrate is counted.

[2017-12-03 00:00:03] : Error CL_INVALID_BUFFER_SIZE when calling clCreateBuffer to create hash scratchpads buffer.
[2017-12-03 00:00:03] : WARNING: AMD device not found
[2017-12-03 00:00:03] : WARNING: backend AMD disabled.

and

HASHRATE REPORT - CPU
| ID |  10s |  60s |  15m | ID |  10s |  60s |  15m |
|  0 | 60.3 | (na) | (na) |  1 | 60.5 | (na) | (na) |
|  2 | 60.4 | (na) | (na) |
-----------------------------------------------------
Totals:   181.3 (na) (na) H/s

(3) Changed intensity to 48 for the Intel Iris GPU in amd.txt while keeping NVIDIA intensity at 176. This time xmr-stak was able to utilize the Iris GPU

[2017-12-03 00:02:05] : Compiling code and initializing GPUs. This will take a while...
[2017-12-03 00:02:05] : WARNING: using non AMD device: Apple
[2017-12-03 00:02:06] : Device 0 work size 8 / 64.
[2017-12-03 00:02:10] : Device 1 work size 8 / 128.

But the hash rate isn't too much better. It seems "CPU" and "AMD" (really Intel Iris) are competing against each other for cache bandwidth?

HASHRATE REPORT - CPU
| ID |  10s |  60s |  15m | ID |  10s |  60s |  15m |
|  0 | 29.1 | (na) | (na) |  1 | 29.2 | (na) | (na) |
|  2 | 29.2 | (na) | (na) |
-----------------------------------------------------
HASHRATE REPORT - AMD
| ID |  10s |  60s |  15m | ID |  10s |  60s |  15m |
|  0 | 45.4 | (na) | (na) |  1 | 78.6 | (na) | (na) |
-----------------------------------------------------
Totals:   211.4 (na) (na) H/s
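For anyone who wants to compare runs without adding up the columns by hand, here is a minimal sketch (not part of xmr-stak) that sums the 10s column of a report per backend. The function name `ten_second_rates` and the regular expression are my own assumptions about the report layout shown above; the numbers are copied from that report.

```python
import re

# Report text copied from the combined CPU + "AMD" run above.
REPORT = """\
HASHRATE REPORT - CPU
| ID |  10s |  60s |  15m | ID |  10s |  60s |  15m |
|  0 | 29.1 | (na) | (na) |  1 | 29.2 | (na) | (na) |
|  2 | 29.2 | (na) | (na) |
-----------------------------------------------------
HASHRATE REPORT - AMD
| ID |  10s |  60s |  15m | ID |  10s |  60s |  15m |
|  0 | 45.4 | (na) | (na) |  1 | 78.6 | (na) | (na) |
-----------------------------------------------------
Totals:   211.4 (na) (na) H/s
"""

# One thread entry: an integer ID cell followed by its 10s hashrate cell.
ENTRY = re.compile(r"\|\s*(\d+)\s*\|\s*([\d.]+)\s*\|")


def ten_second_rates(report: str) -> dict[str, list[float]]:
    """Group the 10s hashrate of every thread by backend name."""
    rates: dict[str, list[float]] = {}
    backend = None
    for line in report.splitlines():
        if line.startswith("HASHRATE REPORT - "):
            backend = line.rsplit(" - ", 1)[1]
            rates[backend] = []
        elif backend and line.startswith("|"):
            rates[backend] += [float(value) for _, value in ENTRY.findall(line)]
    return rates


for name, values in ten_second_rates(REPORT).items():
    print(f"{name}: {sum(values):.1f} H/s over {len(values)} threads")
```

On this report the CPU threads add up to 87.5 H/s and the "AMD" backend to 124.0 H/s, so compared with the CPU-only run (181.3 H/s) the CPU lost roughly half its rate while the two GPUs together added 124 H/s.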
psychocrypt commented 6 years ago

thx for the results.

Could you please remove the NVIDIA GPU from amd.txt, change the Intel GPU setting to intensity 40 and worksize 1, and start the miner with xmr-stak --noCPU?

grzegorzszczecin commented 6 years ago

After removing nvidia gpu, the new amd.txt:

"gpu_threads_conf" : [
  // gpu: Iris Pro memory:256
  // compute units: 40
  { "index" : 0,
    "intensity" : 40, "worksize" : 1,
    "affine_to_cpu" : false, 
  },
],

Result of xmr-stak --noCPU:

HASHRATE REPORT - AMD
| ID |  10s |  60s |  15m |
|  0 | 43.0 | 43.0 | (na) |
---------------------------
Totals:   43.0 43.0 (na) H/s
Highest:  43.3 H/s

psychocrypt commented 6 years ago

Could you please do one last test and set intensity to 24 and worksize to 8? Currently it looks like using the CPU is more effective.

grzegorzszczecin commented 6 years ago

For "intensity" : 24, "worksize" : 8, I get

HASHRATE REPORT - AMD
| ID |  10s |  60s |  15m |
|  0 | 26.7 | 26.6 | (na) |
---------------------------
Totals:   26.7 26.6 (na) H/s
Highest:  26.7 H/s

JoKeRz42o commented 6 years ago

@psychocrypt I was able to build with no errors, but once I ran ./xmr-stak I encountered the following issue...

Admins-MacBook-Pro:bin admin$ ./xmr-stak
Please enter:
- Currency: 'monero' or 'aeon'
monero
- Pool address: e.g. pool.usxmrpool.com:3333
us-east.cryptonight-hub.miningpoolhub.com:12024
- Username (wallet address or pool login):
**********
- Password (mostly empty or x):
**********
- Does this pool port support TLS/SSL? Use no if unknown. (y/N)
y
- Do you want to use nicehash on this pool? (y/n)
n
- Do you want to use multiple pools? (y/n)
n
Configuration stored in file 'config.txt'
-------------------------------------------------------------------
xmr-stak 2.0.0 84e4f7f

Brought to you by fireice_uk and psychocrypt under GPLv3.
Based on CPU mining code by wolf9466 (heavily optimized by fireice_uk).
Based on OpenCL mining code by wolf9466.

Configurable dev donation level is set to 2.0%

You can use following keys to display reports:
'h' - hashrate
'r' - results
'c' - connection
-------------------------------------------------------------------
[2017-12-05 05:14:22] : Start mining: MONERO
[2017-12-05 05:14:22] : Found AMD platform index id = 0, name = Apple
[2017-12-05 05:14:22] : Found OpenCL GPU Iris Pro.
[2017-12-05 05:14:22] : Found OpenCL GPU AMD Radeon R9 M370X Compute Engine.
[2017-12-05 05:14:22] : AMD: GPU configuration stored in file 'amd.txt'
[2017-12-05 05:14:22] : Compiling code and initializing GPUs. This will take a while...
[2017-12-05 05:14:22] : WARNING: using non AMD device: Apple
[2017-12-05 05:14:22] : Device 0 work size 8 / 64.
[2017-12-05 05:14:22] : Error CL_INVALID_BUFFER_SIZE when calling clCreateBuffer to create hash scratchpads buffer.
[2017-12-05 05:14:22] : WARNING: AMD device not found
[2017-12-05 05:14:22] : WARNING: backend AMD disabled.
[2017-12-05 05:14:22] : CPU configuration stored in file 'cpu.txt'
[2017-12-05 05:14:22] : WARNING on MacOS thread affinity is only advisory.
[2017-12-05 05:14:22] : Starting single thread, affinity: 0.
[2017-12-05 05:14:22] : hwloc: set_thisthread_membind not supported
[2017-12-05 05:14:22] : WARNING on MacOS thread affinity is only advisory.
[2017-12-05 05:14:22] : Starting single thread, affinity: 2.
[2017-12-05 05:14:22] : hwloc: set_thisthread_membind not supported
[2017-12-05 05:14:22] : WARNING on MacOS thread affinity is only advisory.
[2017-12-05 05:14:22] : Starting single thread, affinity: 4.
[2017-12-05 05:14:22] : hwloc: set_thisthread_membind not supported

It looks like it correctly located the OpenCL devices per

[2017-12-05 05:14:22] : Found AMD platform index id = 0, name = Apple
[2017-12-05 05:14:22] : Found OpenCL GPU Iris Pro.
[2017-12-05 05:14:22] : Found OpenCL GPU AMD Radeon R9 M370X Compute Engine.

...then it seems to encounter some error which disabled both OpenCL devices.

[2017-12-05 05:14:22] : Error CL_INVALID_BUFFER_SIZE when calling clCreateBuffer to create hash scratchpads buffer.
[2017-12-05 05:14:22] : WARNING: AMD device not found
[2017-12-05 05:14:22] : WARNING: backend AMD disabled.

  1. Auto suggestions

    A. Auto suggestion from amd.txt

    "gpu_threads_conf" : [
      // gpu: Iris Pro memory:256
      // compute units: 40
      { "index" : 0,
        "intensity" : 0, "worksize" : 8,
        "affine_to_cpu" : false,
      },
      // gpu: AMD Radeon R9 M370X Compute Engine memory:384
      // compute units: 10
      { "index" : 1,
        "intensity" : 160, "worksize" : 8,
        "affine_to_cpu" : false,
      },
    ],

    B. Auto suggestion from cpu.txt

    "cpu_threads_conf" :
    [
        { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 0 },
        { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 2 },
        { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 4 },

    ],

  2. Hash rate with auto suggestion

HASHRATE REPORT - CPU
| ID |  10s |  60s |  15m | ID |  10s |  60s |  15m |
|  0 | 71.9 | 69.6 | (na) |  1 | 71.4 | 69.5 | (na) |
|  2 | 71.6 | 69.5 | (na) |
-----------------------------------------------------
Totals:   214.9 208.6 (na) H/s
Highest:  215.2 H/s

  3. Iris Pro intensity set to 48 per your suggestion

    // gpu: Iris Pro memory:256
    // compute units: 40
    { "index" : 0,
      "intensity" : 48, "worksize" : 8,
      "affine_to_cpu" : false,
    },

Experienced CONSIDERABLE lag when switching between Terminal and browser, with hash rate as follows...

[2017-12-05 06:26:47] : New block detected.
HASHRATE REPORT - CPU
| ID |  10s |  60s |  15m | ID |  10s |  60s |  15m |
|  0 | 38.2 | 39.2 | (na) |  1 | 38.1 | 39.2 | (na) |
|  2 | 38.2 | 39.2 | (na) |
-----------------------------------------------------
HASHRATE REPORT - AMD
| ID |  10s |  60s |  15m | ID |  10s |  60s |  15m |
|  0 | 38.6 | 38.5 | (na) |  1 | 106.9 | 106.9 | (na) |
-----------------------------------------------------
Totals:   260.0 263.0 (na) H/s
Highest:  321.5 H/s

Tweaking Intensity

"gpu_threads_conf" : [
  // gpu: Iris Pro memory:256
  // compute units: 40
  { "index" : 0, "intensity" : 232, "worksize" : 8, "affine_to_cpu" : false, 
  },
  // gpu: AMD Radeon R9 M370X Compute Engine memory:384
  // compute units: 10
  { "index" : 1, "intensity" : 480, "worksize" : 8, "affine_to_cpu" : false, 
  },
],

...yielded net increase of approximately 20 h/s at the expense of a reduced CPU hash rate.

HASHRATE REPORT - CPU
| ID |  10s |  60s |  15m | ID |  10s |  60s |  15m |
|  0 | 24.6 | 23.6 | (na) |  1 | 24.3 | 23.6 | (na) |
|  2 | 24.3 | 23.5 | (na) |
-----------------------------------------------------
HASHRATE REPORT - AMD
| ID |  10s |  60s |  15m | ID |  10s |  60s |  15m |
|  0 | 71.8 | 71.2 | (na) |  1 | 143.2 | 143.2 | (na) |
-----------------------------------------------------
Totals:   288.2 285.2 (na) H/s
Highest:  302.3 H/s

nsummy commented 6 years ago

I'd like to test this out. I have a machine with an i7-5775R (no other video card) mining with xmr-stak at about 612 H/s. I've cloned this branch and cannot get it to work. I get an error stating that no OpenCL device was found. This is on a headless Ubuntu 17.10 server. I installed the AMD SDK and, after this error, tried installing the Intel OpenCL drivers and the Radeon drivers. Is there something special I need to do? Or does this require X Windows and a connected monitor?

totallyG82 commented 6 years ago

@psychocrypt The Intel GPUs have 256KB of cache per slice. Beyond that they may use the CPU's L3 or the eDRAM (Iris Pro), and after that they fall back to slow system RAM. Unless we see a CryptoNight "nano" with smaller 256KB blocks, most Intel GPUs will be of no use for mining CryptoNight.
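To put numbers on this, here is a rough sketch of how much scratchpad memory a GPU thread config asks for. It assumes the standard 2 MiB CryptoNight scratchpad per hash and that the miner allocates one scratchpad per unit of intensity; that second point is my reading of how the OpenCL backend behaves, not something confirmed in this thread. The helper name `scratchpad_buffer_mib` is made up for illustration.

```python
MIB = 1024 * 1024
SCRATCHPAD_BYTES = 2 * MIB      # CryptoNight scratchpad per hash
SLICE_CACHE_BYTES = 256 * 1024  # per-slice GPU cache quoted in the comment above


def scratchpad_buffer_mib(intensity: int) -> int:
    """Assumed model: one 2 MiB scratchpad per unit of intensity."""
    return intensity * SCRATCHPAD_BYTES // MIB


# A single scratchpad is already several times the per-slice cache.
print("scratchpad / slice cache =", SCRATCHPAD_BYTES // SLICE_CACHE_BYTES)

# Intensities that were tried for the Iris Pro in this thread.
for intensity in (0, 24, 40, 48, 232):
    print(f"intensity {intensity:3d} -> {scratchpad_buffer_mib(intensity)} MiB")
```

Under that model a single hash needs 8 times the per-slice cache, and the auto-suggested intensity of 0 asks for a zero-byte buffer. The OpenCL specification has clCreateBuffer return CL_INVALID_BUFFER_SIZE when the size is 0, which would be consistent with the error in the logs above.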

verych commented 6 years ago

@psychocrypt, maybe you can help with some advice. I cannot resolve an issue with the Intel Iris Pro. See this error message:

[2018-01-24 11:51:46] : Start mining: MONERO
[2018-01-24 11:51:46] : Compiling code and initializing GPUs. This will take a while...
[2018-01-24 11:51:46] : WARNING: using non AMD device: Intel(R) Corporation
[2018-01-24 11:51:46] : Device 0 work size 8 / 64.
[2018-01-24 11:51:46] : Error CL_INVALID_DEVICE when calling clCreateCommandQueueWithProperties.
[2018-01-24 11:51:46] : WARNING: AMD device not found
[2018-01-24 11:51:46] : WARNING: backend AMD disabled.
[2018-01-24 11:51:46] : ERROR: No miner backend enabled.

auto-config is:

"gpu_threads_conf" : [
  // gpu: Intel(R) Iris(TM) Pro Graphics 5200 memory:279
  // compute units: 40
  { "index" : 0,
    "intensity" : 0, "worksize" : 8,
    "affine_to_cpu" : false,
  },
],

Simaex commented 6 years ago

Try setting intensity to a non-zero value.

verych commented 6 years ago

@Simaex I tried different values (1, 24, 40, 48, 100) but it doesn't work.

verych commented 6 years ago

Anyway, I think my graphics card will die in a couple of weeks if I use it at 90-100%, but the CPU is OK.

JoKeRz42o commented 6 years ago

Just a followup to my previous message.... This is only based on my personal experience over the course of 3-4 months on 5 MacBook Pros with the Crystalwell (L4 128MB) memory...

You're better off disabling both the Integrated Graphics (IG) AND Discrete Graphics (DG), seeing as they have a negative impact on overall hash rate. With the GPU enabled, my total hash rate tops out at 300-320 H/s.

Whereas with the GPU disabled, it frees up that lightning-fast L4 RAM to boost the CPU hash rate. Settings are the same on all 5 MacBook Pros. Results vary slightly based on processor speed and workload. Below are results from 2 different units.

Current cpu.txt Settings

"cpu_threads_conf" :
[
    { "low_power_mode" : 5, "no_prefetch" : true, "affine_to_cpu" : 0 },
    { "low_power_mode" : 5, "no_prefetch" : true, "affine_to_cpu" : 1 },
    { "low_power_mode" : 5, "no_prefetch" : true, "affine_to_cpu" : 2 },
    { "low_power_mode" : 5, "no_prefetch" : true, "affine_to_cpu" : 3 },
    { "low_power_mode" : 5, "no_prefetch" : true, "affine_to_cpu" : 4 },
    { "low_power_mode" : 5, "no_prefetch" : true, "affine_to_cpu" : 5 },
    { "low_power_mode" : 5, "no_prefetch" : true, "affine_to_cpu" : 6 },
    { "low_power_mode" : 5, "no_prefetch" : true, "affine_to_cpu" : 7 },
],
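As a back-of-the-envelope check on why this config suits the L4, here is a small sketch of the scratchpad working set for a cpu.txt. It assumes 2 MiB per CryptoNight hash and, as I understand the cpu.txt option, that "low_power_mode" false means 1 hash per thread, true means 2, and a number n means n hashes per thread. The helper names are hypothetical.

```python
SCRATCHPAD_MIB = 2  # CryptoNight scratchpad per hash
L4_MIB = 128        # Crystalwell L4 size mentioned above


def hashes_per_thread(low_power_mode) -> int:
    """Assumed mapping: false -> 1 hash, true -> 2 hashes, number n -> n hashes."""
    if isinstance(low_power_mode, bool):
        return 2 if low_power_mode else 1
    return int(low_power_mode)


def working_set_mib(threads: int, low_power_mode) -> int:
    """Total scratchpad memory touched by all CPU threads."""
    return threads * hashes_per_thread(low_power_mode) * SCRATCHPAD_MIB


auto_suggested = working_set_mib(3, False)  # the auto-suggested cpu.txt
tuned = working_set_mib(8, 5)               # the cpu.txt shown above

print(f"auto-suggested: {auto_suggested} MiB")
print(f"tuned: {tuned} MiB, fits in L4: {tuned <= L4_MIB}")
```

Under those assumptions the auto-suggested config touches only 6 MiB, while 8 threads at 5 hashes each touch 80 MiB, which still fits in the 128MB L4 as long as the GPUs are not competing for it.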

Resulting Performance

Mid-2015 (IG) 11,4 with an I7-4770HQ (2.2GHz)

HASHRATE REPORT - CPU
| ID |  10s |  60s |  15m | ID |  10s |  60s |  15m |
|  0 | 65.9 | 69.6 | 70.9 |  1 | 65.3 | 69.6 | 70.4 |
|  2 | 64.3 | 69.2 | 70.3 |  3 | 65.2 | 69.4 | 70.6 |
|  4 | 66.1 | 69.8 | 70.7 |  5 | 65.7 | 69.6 | 70.6 |
|  6 | 64.5 | 69.5 | 70.4 |  7 | 64.5 | 69.3 | 69.9 |
-----------------------------------------------------
Totals:   521.6 556.1 563.8 H/s
Highest:  595.7 H/s

Mid-2015 (DG) with i7-4980HQ (2.8GHz) and the discrete AMD R9 M370X GPU, which does approx 150 H/s but reduces the CPU to about 300 H/s.

HASHRATE REPORT - CPU
| ID |  10s |  60s |  15m | ID |  10s |  60s |  15m |
|  0 | 72.6 | 74.1 | 74.2 |  1 | 72.4 | 73.9 | 73.3 |
|  2 | 72.2 | 73.8 | 73.5 |  3 | 72.4 | 73.9 | 73.9 |
|  4 | 72.3 | 73.9 | 73.7 |  5 | 72.6 | 73.0 | 73.2 |
|  6 | 72.8 | 74.2 | 73.5 |  7 | 72.0 | 74.4 | 74.5 |
-----------------------------------------------------
Totals:   579.3 591.2 587.8 H/s
Highest:  651.4 H/s

JoKeRz42o commented 6 years ago

@verych Good point about the GPU... It generates MASSIVE amounts of heat at high workloads and these laptops are not designed to provide enough cooling to maintain high work loads for extended periods of time.

Just to point out, it's not the 90-100% workload that would kill it... it's the 90℃-100℃ temps that would do it in. If there was a way to cool it down below 80℃, that would be a different story.

nsummy commented 6 years ago

@JoKeRz42o Have you gotten better performance by setting the no_prefetch to true? Mine is currently set to false for all threads.

jmichaelbarker commented 6 years ago

@JoKeRz42o Out of curiosity have you done anything else to your configs to get those speeds? I have an identical cpu_threads_conf section and I can't seem to push more than ~370 h/s on the exact same hardware (MacBookPro11,4 - i7-4770hq)

NicksonYap commented 6 years ago

Not sure if these findings are of any use: I'm using an i7-7500U with HD Graphics 650 on a laptop (HP Spectre x360), and I have a fan running right beside the laptop.

It turns out it's better not to turn on the GPU at all, because "Power Limit Throttling" turns on (in XTU). The CPU and GPU are fighting for resources. It's not worth it, even with intensity set to 1 and worksize set to 1. However, it was fun trying it out :)

underworlddemon commented 6 years ago

Can someone provide an exe for Windows? I want to test my Iris Pro 6200, but I can't build it myself :-\

Simaex commented 6 years ago

@underworlddemon just visit https://github.com/fireice-uk/xmr-stak/releases

underworlddemon commented 6 years ago

@Simaex The builds at your link can't use Intel OpenCL.

underworlddemon commented 5 years ago

Now I have built the exe myself (ver 2.10.4) for Windows and tested my Iris Pro 6200.

For the build, I forced these values in gpu.cpp:

bool isAMDDevice = false;
bool isNVIDIADevice = true;

and

bool isAMDOpenCL = false;
bool isNVIDIADevice = true;

(or the reverse)

Everything was built with Visual Studio Community 2019 using the 14.16 toolset. To build the OpenCL binary I needed lightOCLSDK.zip extracted to C:\xmr-stak-dep (this is not in the instructions).

With amd=true, the generated amd.txt is:

"gpu_threads_conf" : [
  // gpu: Intel(R) Iris(TM) Pro Graphics 6200 compute units: 48
  // memory:145|145|452 MiB (used per thread|max per alloc|total free)
  { "index" : 0,
    "intensity" : 0, "worksize" : 8,
    "affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 2,
    "unroll" : 8, "comp_mode" : true, "interleave" : 40
  },

It then crashes, so I changed "intensity" to 32..48.

With nvidia=true, the generated amd.txt is:

"gpu_threads_conf" : [
  // gpu: Intel(R) Iris(TM) Pro Graphics 6200 compute units: 48
  // memory:452|580|452 MiB (used per thread|max per alloc|total free)
  { "index" : 0,
    "intensity" : 226, "worksize" : 8,
    "affine_to_cpu" : false, "strided_index" : 3, "mem_chunk" : 4,
    "unroll" : 8, "comp_mode" : true, "interleave" : 0
  },

It then crashes, so I changed "intensity" to 32..48.

The hashrate is about 39..45 H/s for intensity=48 and 25..31 H/s for intensity=32,

and there is BIG lag in the GUI if the Intel GPU renders the desktop.

For those with real skills, all of this was abundantly clear a year ago when the topic started, but n00bs like me needed to test for ourselves, and so I did :-)

P.S. Everything works at normal intensity, but with a low hashrate and lag while the Intel GPU renders the desktop.