fireice-uk / xmr-stak

Free Monero RandomX Miner and unified CryptoNight miner
GNU General Public License v3.0

8gb RX580 Gigabyte card detected as 4GB #1764

Open Angel996 opened 6 years ago

Angel996 commented 6 years ago

Hello.

I have just built a new rig with 4x Gigabyte RX580 8GB GPUs. On the first run of xmr-stak, the generated amd.txt contains:

// gpu: Ellesmere memory:3920
// compute units: 36
{ "index" : 0, "intensity" : 864, "worksize" : 8, "affine_to_cpu" : false, "strided_index" : 1, "mem_chunk" : 2, "comp_mode" : true

Ubuntu 16.04 LTS 4.4.0-112, Amdgpu 17.40, xmr-stak 2.4.5 b3f79de, AMD 970 motherboard, Athlon X2 245 CPU. I compiled xmr-stak myself, but on another rig.

If I manually raise the intensity to a higher value (e.g. 1152), I get this error message:

Error CL_INVALID_BUFFER_SIZE when calling clCreateBuffer to create hash scratchpads buffer.

clinfo reports 8GB for these GPUs:

Device Name                                     Ellesmere
Device Vendor                                   Advanced Micro Devices, Inc.
Device Vendor ID                                0x1002
Device Version                                  OpenCL 1.2 AMD-APP (2482.3)
Device OpenCL C Version                         OpenCL C 1.2
Device Type                                     GPU
Device Board Name (AMD)                         Radeon RX 580 Series
Global memory size                              8576299008 (7.987GiB)
Global free memory (AMD)                        8319868 (7.934GiB)
Global memory channels (AMD)                    8
Global memory banks per channel (AMD)           16
Global memory bank width (AMD)                  256 bytes

Why is 8GB of GPU memory not detected?

thanks

Spudz76 commented 6 years ago

Set environment vars prior to launch (most likely the first one, though):

GPU_FORCE_64BIT_PTR=1
GPU_MAX_ALLOC_PERCENT=100
GPU_MAX_HEAP_SIZE=100
GPU_SINGLE_ALLOC_PERCENT=100
GPU_USE_SYNC_OBJECTS=1

It's about the same as a 32-bit OS on a system with more than 4GB of RAM: it caps off because that's as high as 32 bits can count. The rest can't hurt; they are in just about every other AMD miner app's readme, and I run them myself.

You may also get better performance by listing each card twice with around 880 intensity (so 1760 effective per GPU, but with two job pipelines to each). That can also get around the max-single-alloc limit, since it's two sessions and two regions of memory rather than one huge one.
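A minimal sketch of making sure those variables are actually present in the process that starts the miner, since an `export` done in a different shell session has no effect. The variable names and values are the ones listed in this comment; the helper name `miner_environment` and the `./xmr-stak` path in the usage comment are hypothetical.

```python
import os

# Variable names and values as listed in this thread.
AMD_OPENCL_ENV = {
    "GPU_FORCE_64BIT_PTR": "1",
    "GPU_MAX_ALLOC_PERCENT": "100",
    "GPU_MAX_HEAP_SIZE": "100",
    "GPU_SINGLE_ALLOC_PERCENT": "100",
    "GPU_USE_SYNC_OBJECTS": "1",
}


def miner_environment(base_env=None):
    """Return a copy of the environment with the AMD OpenCL variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(AMD_OPENCL_ENV)
    return env


# Usage (hypothetical binary path, adjust to your build):
# import subprocess
# subprocess.run(["./xmr-stak"], env=miner_environment(), check=True)
```

Passing the environment explicitly to the child process removes any doubt about whether the miner saw the settings. Whether the driver honours them is a separate question.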

Angel996 commented 6 years ago

Well, I have that already...

export GPU_FORCE_64BIT_PTR=1
export GPU_MAX_HEAP_SIZE=100
export GPU_MAX_ALLOC_PERCENT=100
export GPU_SINGLE_ALLOC_PERCENT=99
export GPU_USE_SYNC_OBJECTS=1

Thanks, I'll try the double listing thing. Still curious why it's not working as it should.

Angel996 commented 6 years ago

I tried the double-listing idea, lolz, now I've got 8 GPUs with 50% hashrate each. )) So the total hashrate is the same.

I also compiled the latest version on this very rig. Same result.

psychocrypt commented 6 years ago

@Angel996 The miner only reports the information that comes from OpenCL. Somewhere in the OpenCL spec there is a definition that a device must expose at least 25% of its total memory to a single allocation. Most OpenCL implementations limit it to something like 25% or, as in your case, 50%. As @Spudz76 wrote, you can sometimes raise the limit with the environment variables. Nevertheless, running two threads per device will give you the best results in most cases.
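To put a number on "50% in your case", here is a small sketch using the two figures already posted in this issue. It assumes the `memory:3920` value in amd.txt is in MiB and is the single-allocation limit the miner sees; the function name is illustrative only.

```python
GLOBAL_MEM_BYTES = 8576299008  # "Global memory size" from the clinfo output above
MINER_REPORTED_MIB = 3920      # "memory:3920" from amd.txt (assumed to be MiB)


def single_alloc_fraction(alloc_mib, global_bytes):
    """Fraction of global memory that fits in one allocation."""
    return alloc_mib * 1024 * 1024 / global_bytes


fraction = single_alloc_fraction(MINER_REPORTED_MIB, GLOBAL_MEM_BYTES)
print(f"{fraction:.1%}")  # prints 47.9%
```

So under those assumptions the miner sees a little under half of the card's memory as usable by one buffer.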

Angel996 commented 6 years ago

How come clinfo reports ~8 gigs free? When the miner is running, clinfo actually reports ~4 gigs free. Doesn't clinfo obtain this information from OpenCL?

I am getting exactly the same total hashrate with the dual-thread config; each thread gets half of the card's hashrate. I get around 720 h/s per card from a Gigabyte RX580 8GB with Micron memory. I hear it's possible to achieve 1000 h/s with 8GB cards...

I forgot to mention, I'm doing cryptonight heavy.

minzak commented 6 years ago

@Angel996 It doesn't matter whether it's 4 or 8 GB. On 4 GB cards it is also possible to get almost 1000 H/s.

Angel996 commented 6 years ago

bizlevel, an 8GB card's memory can hold more threads, and that makes 8GB cards faster than 4GB cards with cryptonight heavy.

psychocrypt commented 6 years ago

OpenCL is reporting that 8GiB are free, but this does not mean you can allocate all of it with one allocation call. This is the reason why the miner shows 4GiB as free memory: we count only what we can allocate with one call. Nevertheless, use two threads and you can use the full 8GiB.
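A rough sketch of why intensity 864 works while 1152 fails with CL_INVALID_BUFFER_SIZE. It assumes a 4 MiB scratchpad per hash for cryptonight heavy, treats the `memory:3920` figure as a single-allocation limit in MiB, and ignores the miner's other, smaller buffers. The function names are illustrative and are not taken from the xmr-stak source.

```python
MIB = 1024 * 1024
SCRATCHPAD_BYTES = 4 * MIB  # assumption: cryptonight heavy scratchpad per hash


def scratchpad_buffer_mib(intensity, scratchpad_bytes=SCRATCHPAD_BYTES):
    """Size of the hash scratchpads buffer for one miner thread, in MiB."""
    return intensity * scratchpad_bytes // MIB


def fits_single_allocation(intensity, max_alloc_mib):
    """True if one thread's scratchpad buffer fits in a single allocation."""
    return scratchpad_buffer_mib(intensity) <= max_alloc_mib


def max_intensity(max_alloc_mib, scratchpad_bytes=SCRATCHPAD_BYTES):
    """Largest intensity whose scratchpad buffer fits in one allocation."""
    return max_alloc_mib * MIB // scratchpad_bytes


for intensity in (864, 1152):
    print(intensity, scratchpad_buffer_mib(intensity),
          fits_single_allocation(intensity, 3920))
```

Under these assumptions 864 needs 3456 MiB and fits, while 1152 needs 4608 MiB and does not. Two threads at 864 each would use two separate 3456 MiB buffers, which is how a second thread can reach the rest of the 8GiB.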

leve1ord commented 6 years ago

Two threads are a must on an RX 480/580, 4 or 8 GB, to get ~1000 h/s. Try it like this:

{ "index" : 0, "intensity" : 864, "worksize" : 8, "affine_to_cpu" : false, "strided_index" : 1, "mem_chunk" : 2, "comp_mode" : true },
{ "index" : 0, "intensity" : 864, "worksize" : 8, "affine_to_cpu" : false, "strided_index" : 1, "mem_chunk" : 2, "comp_mode" : true },

and play with the intensity; on my rx588, 1024 is absolutely enough.
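For a rig with several cards, writing the doubled entries by hand gets tedious. The sketch below generates them, reusing the field names and values from the entry above. The function name and the default of two threads per GPU are assumptions for illustration, and the output is meant to be pasted into the thread list in amd.txt rather than used as a complete file.

```python
import json


def gpu_thread_entries(gpu_count, threads_per_gpu=2, intensity=864):
    """Build one config entry per miner thread, several threads per GPU index."""
    entries = []
    for index in range(gpu_count):
        for _ in range(threads_per_gpu):
            entries.append({
                "index": index,
                "intensity": intensity,
                "worksize": 8,
                "affine_to_cpu": False,
                "strided_index": 1,
                "mem_chunk": 2,
                "comp_mode": True,
            })
    return entries


entries = gpu_thread_entries(gpu_count=5)
# One entry per line, each followed by a comma, as in the example above.
print("\n".join(json.dumps(entry) + "," for entry in entries))
```

Each GPU index appears twice, so a 5-GPU rig ends up with 10 miner threads.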

Angel996 commented 6 years ago

I get 350 h/s per virtual card in that config. That yields around 3500 h/s total, the same as with the single-thread config. You know what's weird, though? I checked clinfo, and it still reports 3.8GB of free RAM. That means it's still not using the memory and it created proportionally smaller threads, which is why they are not any faster.

clinfo | grep -i 'free memory'
  Global free memory (AMD)                        3938676 (3.756GiB)
  Global free memory (AMD)                        3938676 (3.756GiB)
  Global free memory (AMD)                        3938676 (3.756GiB)
  Global free memory (AMD)                        3938676 (3.756GiB)
  Global free memory (AMD)                        3938676 (3.756GiB)

(it's a 5 GPU rig now)
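A small sketch for turning those clinfo lines into numbers. It assumes the raw value is in KiB, which is consistent with the GiB figure clinfo prints next to it, and it works on pasted text rather than calling clinfo itself. The helper names are illustrative.

```python
import re

# Two lines copied from the output above; paste the full clinfo text here.
SAMPLE = """\
  Global free memory (AMD)                        3938676 (3.756GiB)
  Global free memory (AMD)                        3938676 (3.756GiB)
"""

FREE_MEM_RE = re.compile(r"Global free memory \(AMD\)\s+(\d+)")


def free_memory_kib(clinfo_text):
    """Return the free-memory value (assumed KiB) for each GPU listed."""
    return [int(match.group(1)) for match in FREE_MEM_RE.finditer(clinfo_text)]


def kib_to_gib(kib):
    """Convert KiB to GiB."""
    return kib / (1024 * 1024)


IDLE_FREE_KIB = 8319868  # free memory reported before the miner was started

for free_kib in free_memory_kib(SAMPLE):
    in_use = kib_to_gib(IDLE_FREE_KIB - free_kib)
    print(f"free {kib_to_gib(free_kib):.3f} GiB, in use about {in_use:.2f} GiB")
```

Compared with the idle figure from the first post, this puts roughly 4.18 GiB in use per card while the miner is running.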