Jimmy-Z / bfCL

OpenCL! fancy!

"out of resource" error on certain nvidia GPUs #3

Open Jimmy-Z opened 6 years ago

Jimmy-Z commented 6 years ago

Related:

https://github.com/zoogie/seedminer/issues/16
https://gbatemp.net/posts/7851408/
https://gbatemp.net/posts/7879961/
https://gbatemp.net/posts/7826386/
https://gbatemp.net/posts/7851868/
https://gbatemp.net/posts/7884725/
https://gbatemp.net/posts/7881547/

I'll need testers and reports, including this info:

your GPU model, GPU RAM size, OS version, driver version.
bfcl info output
does seedminer's GPU mode throw an "out of resources" error for you? (yes, I also need successful reports)
if it does, try the following build with the two test commands: does it also say "out of resources"?

Test build: bfCL-test-reduced-work-size-msky-lfcs-20.zip

Two test commands:

bfcl lfcs 00000007 0000 17f5c00d8b581e5e
bfcl msky c27164f2e0994db8000000007dd5c901 afcb0cc132bd2aeb8e0a6b6a841c51c0

Techy stuff

Despite what it looks like, this doesn't mean your GPU is not powerful/big enough: this program works on Intel IGPUs and uses only a few KB of GPU RAM. To me it looks more like an OpenCL runtime bug from nvidia.

A reduced work size (from a little above 100,000,000 down to 1,000,000) helped this guy with a GTX 980, so I guess this is the problem.
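To picture what a reduced work size means on the host side: instead of enqueueing one kernel over roughly 100,000,000 work-items, the same range is covered in batches of 1,000,000. The sketch below is plain C with the OpenCL enqueue replaced by a callback. All names in it (batch_fn, run_in_batches, count_items) are hypothetical and not taken from ocl_brute.c, so treat it as an illustration of the idea rather than what the test build actually does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one kernel enqueue plus result readback.
 * Returns nonzero when a hit is found in [offset, offset + count). */
typedef int (*batch_fn)(uint64_t offset, uint64_t count, void *ctx);

/* Cover `total` candidates in batches of at most `batch_size` work-items.
 * Returns the number of batches issued; stops early on a hit. */
static uint64_t run_in_batches(uint64_t total, uint64_t batch_size,
                               batch_fn run_batch, void *ctx)
{
    uint64_t batches = 0;
    uint64_t offset = 0;

    assert(batch_size > 0);
    while (offset < total) {
        uint64_t remaining = total - offset;
        uint64_t count = remaining < batch_size ? remaining : batch_size;

        batches++;
        if (run_batch(offset, count, ctx))
            break;
        offset += count;
    }
    return batches;
}

/* Example callback: adds up the work-items it was handed, never reports a hit. */
static int count_items(uint64_t offset, uint64_t count, void *ctx)
{
    (void)offset;
    *(uint64_t *)ctx += count;
    return 0;
}
```

Calling run_in_batches(100000000, 1000000, count_items, &seen) issues 100 batches. Each batch costs an extra enqueue and readback, so a smaller batch size trades some throughput for smaller individual kernel launches.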

From the OpenCL SDK documentation:

global_work_size: Points to an array of work_dim unsigned values that describe the number of global work-items in work_dim dimensions that will execute the kernel function. The total number of global work-items is computed as global_work_size[0] * ... * global_work_size[work_dim - 1].

The values specified in global_work_size cannot exceed the range given by the sizeof(size_t) for the device on which the kernel execution will be enqueued. The sizeof(size_t) for a device can be determined using CL_DEVICE_ADDRESS_BITS in the table of OpenCL Device Queries for clGetDeviceInfo. If, for example, CL_DEVICE_ADDRESS_BITS = 32, i.e. the device uses a 32-bit address space, size_t is a 32-bit unsigned integer and global_work_size values must be in the range 1 .. 2^32 - 1. Values outside this range return a CL_OUT_OF_RESOURCES error.

The nvidia runtime reports CL_DEVICE_ADDRESS_BITS = 64 for the GTX 980, and 100,000,000 is nowhere near that limit.
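The documented range rule can be written as a small check. This is only a sketch: max_global_work_size and fits_address_bits are hypothetical helper names, and address_bits stands for the value a real program would read via clGetDeviceInfo with CL_DEVICE_ADDRESS_BITS.

```c
#include <stdint.h>

/* Largest global_work_size value allowed for a device whose
 * CL_DEVICE_ADDRESS_BITS is `address_bits`: 2^address_bits - 1. */
static uint64_t max_global_work_size(unsigned address_bits)
{
    if (address_bits >= 64)
        return UINT64_MAX;              /* avoid shifting by the full width */
    return (UINT64_C(1) << address_bits) - 1;
}

/* Nonzero when `work_size` lies in the range 1 .. 2^address_bits - 1. */
static int fits_address_bits(uint64_t work_size, unsigned address_bits)
{
    return work_size >= 1 && work_size <= max_global_work_size(address_bits);
}
```

A work size of 100,000,000 passes this check even with address_bits = 32 (the limit there is 4,294,967,295), so the documented limit alone does not explain the error.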

dgc1980 commented 6 years ago

1 platform(s) found:
=== 0x0283b480 ===
name    : NVIDIA CUDA
vendor  : NVIDIA Corporation
profile : FULL_PROFILE
version : OpenCL 1.2 CUDA 9.1.75
        1 device(s) found:
        === 0x0283b890 ===
        name : GeForce GT 730
        vendor : NVIDIA Corporation
        version : OpenCL 1.1 CUDA
        C version : OpenCL C 1.1
        max compute units : 2
        max work group size : 1024
        type : GPU
        available : yes
        compiler available : yes
        endian : little
        frequency : 1400
        global memory : 2147483648
        local memory : 49152

Since you wanted it here: I was able to bruteforce the msky test with no problem, it's just slow as fuck. I cancelled the mii bruteforce after it reached offset 1, but the out of resources problem seems to be fixed for that at least :)

dgc1980 commented 6 years ago

I also tried this version on my 1060, and it lowered the speed by about 10%: overclocked, I used to get about 700 M/s, and now I get 630 M/s with this test build.

A7F commented 6 years ago

Hi! I got the "out of resources" error running seedminer gpu, but I'm quite out of the loop in the 3DS hacking scene and not really into these things. However, I'm glad to help by providing as much information as possible!

What I got running your test build exe:

selected device GeForce GT 545 on platform NVIDIA CUDA
mbed TLS 2.7.0, AES-NI supported
self-test/benchmark mode
AES Key: 0d0b8bd02564dd0351d7e415e6f23f36
randomize source buffer using AES OFB
0.119 seconds for preparing test data, 562.03 MB/s
0.006 seconds for OpenCL compiling
0.031 seconds for data upload, 2195.25 MB/s
# sha1_16_test on 64 MB
0.047 seconds for OpenCL, 1419.57 MB/s
0.033 seconds for data download, 2059.82 MB/s
0.630 seconds for reference C(single thread), 106.49 MB/s
sha1_16_test: succeed
# aes_enc_128_test on 64 MB
0.339 seconds for OpenCL, 198.01 MB/s
0.019 seconds for data download, 3495.44 MB/s
0.202 seconds for reference C(single thread), 332.98 MB/s
aes_enc_128_test: succeed
# aes_dec_128_test on 64 MB
0.385 seconds for OpenCL, 174.09 MB/s
0.018 seconds for data download, 3667.35 MB/s
aes_dec_128_test: succeed
Press any key to continue . . .

seedminer gpu command output:

GPU selected
New3DS msed
LFCS      : 0x3d835e8
msed3 est : 0x80c4e550
Error est : -3516
ID0 hash 0: 199aa39d36207269e63a7d4402b97d32
Hash total: 1
movable_part2.sed generation success
bfcl msky e835d803020000000000000050e5c480 199aa39d36207269e63a7d4402b97d32 00000000
selected device GeForce GT 545 on platform NVIDIA CUDA
0.011 seconds for OpenCL compiling
local work size: 1024
ocl_assert: ocl_brute.c, function ocl_brute_msky, line 383
        clEnqueueReadBuffer(command_queue, mem_out, CL_TRUE, 0, sizeof(cl_uint), &out, 0, NULL, NULL)
error: out of resources

My current setup:

Microsoft Windows 10 (10.0) Professional 64-bit   (Build 16299)
Intel i7 2600 @3.40GHz
14GB DDR3 RAM dual channel  
DirectX Version 12.0
NVIDIA GeForce GT 545, 3 GB DDR3
GPU Manufacturer: Micro-Star International Co., Ltd. (MSI)
Driver version 390.77
API Direct3D version 11.2
144 CUDA Cores
Win32_VideoController       DriverVersion = 23.21.13.9077
Win32_VideoController       DriverDate = 01/23/2018

If you want me to test something, or if you need further information, just let me know. :)

Jimmy-Z commented 6 years ago

@A7F thanks, but you should run that test build with the two test commands I gave in the OP.

R1884 commented 6 years ago

Operating System: Windows 10 Pro, 64-bit
GPU: GeForce GT 750M
GPU RAM: 2048 MB GDDR5
Driver version: 381.65

bfcl info:

name    : NVIDIA CUDA
vendor  : NVIDIA Corporation
profile : FULL_PROFILE
version : OpenCL 1.2 CUDA 8.0.0
        1 device(s) found:
        === 0x00141430 ===
        name : GeForce GT 750M
        vendor : NVIDIA Corporation
        version : OpenCL 1.2 CUDA
        C version : OpenCL C 1.2
        max compute units : 2
        max work group size : 1024
        type : GPU
        available : yes
        compiler available : yes
        endian : little
        frequency : 967
        global memory : 2147483648
        local memory : 49152

py -3 seedminer_launcher3.py gpu:

selected device GeForce GT 750M on platform NVIDIA CUDA
0.015 seconds for OpenCL compiling
local work size: 1024
ocl_assert: ocl_brute.c, function ocl_brute_msky, line 383
        clEnqueueReadBuffer(command_queue, mem_out, CL_TRUE, 0, sizeof(cl_uint), &out, 0, NULL, NULL)
error: out of resources

bfcl msky c27164f2e0994db8000000007dd5c901 afcb0cc132bd2aeb8e0a6b6a841c51c0:

selected device GeForce GT 750M on platform NVIDIA CUDA
0.290 seconds for OpenCL compiling
local work size: 1024
got a hit: c27164f2e0994db82e3d14737dd5c901
24.48 seconds, 78.88 M/s

bfcl lfcs 00000007 0000 17f5c00d8b581e5e: How long should I expect this one to take? It hasn't thrown the "out of resources" error but it's taking a while.

A7F commented 6 years ago

bfcl info

1 platform(s) found:
=== 0x0011f270 ===
name    : NVIDIA CUDA
vendor  : NVIDIA Corporation
profile : FULL_PROFILE
version : OpenCL 1.2 CUDA 9.1.84
        1 device(s) found:
        === 0x0011e8c0 ===
        name : GeForce GT 545
        vendor : NVIDIA Corporation
        version : OpenCL 1.1 CUDA
        C version : OpenCL C 1.1
        max compute units : 3
        max work group size : 1024
        type : GPU
        available : yes
        compiler available : yes
        endian : little
        frequency : 1440
        global memory : 3221225472
        local memory : 49152

bfcl msky c27164f2e0994db8000000007dd5c901 afcb0cc132bd2aeb8e0a6b6a841c51c0

selected device GeForce GT 545 on platform NVIDIA CUDA
0.230 seconds for OpenCL compiling
local work size: 1024
got a hit: c27164f2e0994db82e3d14737dd5c901
38.61 seconds, 50.00 M/s

The first command doesn't say "out of resources"; it only shows this:

selected device GeForce GT 545 on platform NVIDIA CUDA
0.003 seconds for OpenCL compiling
local work size: 1024
0

Am I supposed to wait? It sat at that output for something like 20 minutes.

Jimmy-Z commented 6 years ago

@A7F Sorry, I should have added that if the test command runs for a few seconds without an "out of resources" error, it's safe to cancel it with Ctrl-C.

knight-ryu12 commented 6 years ago

This happens with OC'd GPU cards. Mine is a GTX 960 with 4 GB of GDDR5, overclockable.

1 platform(s) found:
=== 0x007aa420 ===
name    : NVIDIA CUDA
vendor  : NVIDIA Corporation
profile : FULL_PROFILE
version : OpenCL 1.2 CUDA 9.1.84
        1 device(s) found:
        === 0x007a97a0 ===
        name : GeForce GTX 960
        vendor : NVIDIA Corporation
        version : OpenCL 1.2 CUDA
        C version : OpenCL C 1.2
        max compute units : 8
        max work group size : 1024
        type : GPU
        available : yes
        compiler available : yes
        endian : little
        frequency : 1253
        global memory : 0
        local memory : 49152

NeroReflex commented 6 years ago

I could NOT accomplish my task with an nVidia 920M (yes, this is a laptop GPU).

Windows 10 Home 64-bit
16 GB of DDR3 RAM
2048MB GDDR3
nVidia 384.94
384 CUDA Cores

The error is:

selected device GeForce 920M on platform NVIDIA CUDA
0.018 seconds for OpenCL compiling
local work size: 1024
ocl_assert: ocl_brute.c, function ocl_brute_msky, line 383
        clEnqueueReadBuffer(command_queue, mem_out, CL_TRUE, 0, sizeof(cl_uint), &out, 0, NULL, NULL)
error: out of resources

So, I have downloaded the test build and issued a few commands:

> bfcl

selected device GeForce 920M on platform NVIDIA CUDA
mbed TLS 2.7.0, AES-NI supported
self-test/benchmark mode
AES Key: 0d0b8bd02564dd0351d7e415e6f23f36
randomize source buffer using RDRAND
1.000 seconds for preparing test data, 67.09 MB/s
0.451 seconds for OpenCL compiling
0.061 seconds for data upload, 1104.31 MB/s
# sha1_16_test on 64 MB
0.031 seconds for OpenCL, 2161.66 MB/s
0.057 seconds for data download, 1180.31 MB/s
0.631 seconds for reference C(single thread), 106.35 MB/s
sha1_16_test: succeed
# aes_enc_128_test on 64 MB
0.532 seconds for OpenCL, 126.17 MB/s
0.048 seconds for data download, 1402.16 MB/s
0.251 seconds for reference C(single thread), 266.87 MB/s
aes_enc_128_test: succeed
# aes_dec_128_test on 64 MB
0.533 seconds for OpenCL, 125.80 MB/s
0.048 seconds for data download, 1400.84 MB/s
aes_dec_128_test: succeed

> bfcl info

2 platform(s) found:
=== 0x026d43c0 ===
name    : Intel(R) OpenCL
vendor  : Intel(R) Corporation
profile : FULL_PROFILE
version : OpenCL 1.2
        2 device(s) found:
        === 0x02701e00 ===
        name : Intel(R) HD Graphics 4400
        vendor : Intel(R) Corporation
        version : OpenCL 1.2
        C version : OpenCL C 1.2
        max compute units : 20
        max work group size : 512
        type : GPU
        available : yes
        compiler available : yes
        endian : little
        frequency : 1000
        global memory : 1708759450
        local memory : 65536
        === 0x026ed8c0 ===
        name : Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz
        vendor : Intel(R) Corporation
        version : OpenCL 1.2 (Build 10094)
        C version : OpenCL C 1.2
        max compute units : 4
        max work group size : 8192
        type : CPU
        available : yes
        compiler available : yes
        endian : little
        frequency : 1700
        global memory : 4211548160
        local memory : 32768
=== 0x0272f260 ===
name    : NVIDIA CUDA
vendor  : NVIDIA Corporation
profile : FULL_PROFILE
version : OpenCL 1.2 CUDA 9.0.125
        1 device(s) found:
        === 0x0272f300 ===
        name : GeForce 920M
        vendor : NVIDIA Corporation
        version : OpenCL 1.2 CUDA
        C version : OpenCL C 1.2
        max compute units : 2
        max work group size : 1024
        type : GPU
        available : yes
        compiler available : yes
        endian : little
        frequency : 954
        global memory : 2147483648
        local memory : 49152

Of course, this is a laptop, so the Intel integrated GPU is also available, but bfcl ignores it, as it should.

bfcl msky ...............................

selected device GeForce 920M on platform NVIDIA CUDA
0.289 seconds for OpenCL compiling
local work size: 1024
got a hit: c27164f2e0994db82e3d14737dd5c901
36.93 seconds, 52.27 M/s

zoogie commented 6 years ago

@Jimmy-Z - Could you push the commits for the test build? They are greatly needed! Thanks!

Jimmy-Z commented 6 years ago

Sorry for the delay, just committed the changes, @zoogie