ethereum / aleth

Aleth – Ethereum C++ client, tools and libraries
GNU General Public License v3.0
ubuntu 'gpu memory fragmentation' and core dump #1901

Closed stephantual closed 9 years ago

stephantual commented 9 years ago

on commit b821f0e7dc66bcc1ed35ecbc46c84fe83562b192 via PPA

eth -M --opencl

Benchmarking on platform: { "platform": "AMD Accelerated Parallel Processing", "device": "Hawaii", "version": "OpenCL 1.2 AMD-APP (1411.4)" }
Preparing DAG...
  ℹ  13:45:40|eth  Loading from libethash...
  ℹ  13:45:40|eth  Done loading.
  ℹ  13:45:40|gpuminer0  workLoop 0 #00000000… #00000000…
Using platform: AMD Accelerated Parallel Processing
Using device: Hawaii(OpenCL 1.2 AMD-APP (1411.4))
  ℹ  13:45:40|gpuminer1  workLoop 0 #00000000… #00000000…
Using platform: AMD Accelerated Parallel Processing
Using device: Intel(R) Core(TM) i3-4330 CPU @ 3.50GHz(OpenCL 1.2 AMD-APP (1411.4))
Warming up...
Segmentation fault (core dumped)

When I try eth -m on -f, I get 'GPU memory fragmentation' errors in the log.

CJentzsch commented 9 years ago

I had the same issue. It means your GPU memory is not freed after use. A reboot (or a restart of the driver) frees the memory, and it should work afterwards. I am not sure, but this might help: https://github.com/ethereum/cpp-ethereum/pull/1905. Also see: https://github.com/ethereum/cpp-ethereum/commit/83650be2139734148ab1fc9882ff9e5074ac17ba#commitcomment-11177212

stephantual commented 9 years ago

Thanks @CJentzsch - reboot unfortunately didn't help. I'll wait for the merge.

CJentzsch commented 9 years ago

If a reboot didn't help, then the merge won't help either. I am a bit confused by your logs:

Using platform: AMD Accelerated Parallel Processing
Using device: Intel(R) Core(TM) i3-4330 CPU @ 3.50GHz(OpenCL 1.2 AMD-APP (1411.4))

How can your Intel processor be on your AMD platform? Since you say "gpu memory fragmentation" and you use --opencl, it should look for your GPU. What GPU do you have (and especially, how much memory does it have)?
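One way to spot this kind of mismatch in a pasted log is to pull out every "Using device:" line and flag names that look like a CPU. The sketch below is only an illustration: the helper names (devices_in_log, looks_like_cpu) are hypothetical and not part of eth or ethminer, and the "CPU" substring check is a rough heuristic, not a real OpenCL device-type query. A possible explanation, stated as an assumption, is that the AMD APP runtime can expose x86 CPUs as OpenCL devices alongside the GPU.

```python
import re

# Matches lines such as:
#   Using device: Hawaii(OpenCL 1.2 AMD-APP (1411.4))
DEVICE_LINE = re.compile(r"^\s*Using device:\s*(?P<name>.+?)\(OpenCL")


def devices_in_log(log_text):
    """Return the device names from 'Using device:' lines, in log order."""
    names = []
    for line in log_text.splitlines():
        match = DEVICE_LINE.match(line)
        if match:
            names.append(match.group("name").strip())
    return names


def looks_like_cpu(device_name):
    """Heuristic (an assumption): CPU devices usually carry 'CPU' in the name."""
    return "CPU" in device_name.upper()


if __name__ == "__main__":
    sample = (
        "Using platform: AMD Accelerated Parallel Processing\n"
        "Using device: Hawaii(OpenCL 1.2 AMD-APP (1411.4))\n"
        "Using platform: AMD Accelerated Parallel Processing\n"
        "Using device: Intel(R) Core(TM) i3-4330 CPU @ 3.50GHz"
        "(OpenCL 1.2 AMD-APP (1411.4))\n"
    )
    for index, name in enumerate(devices_in_log(sample)):
        kind = "looks like a CPU" if looks_like_cpu(name) else "probably a GPU"
        print(index, name, "->", kind)
```

If a CPU shows up in that list, pinning the GPU explicitly with --opencl-device is worth trying before anything else.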

stephantual commented 9 years ago

Thanks @CJentzsch

Right, so eth -M --opencl returns the output I posted above, which, as you spotted, tries to run the algo on my CPU for some reason and fails.

I have to do --opencl-device 0 to get somewhere:

gozer@gozer1:~$ eth -M --opencl --opencl-device 0
Benchmarking on platform: { "platform": "AMD Accelerated Parallel Processing", "device": "Hawaii", "version": "OpenCL 1.2 AMD-APP (1411.4)" }
Preparing DAG...
  ℹ  15:57:44|eth  Loading from libethash...
  ℹ  15:57:44|eth  Done loading.
  ℹ  15:57:44|gpuminer0  workLoop 0 #00000000… #00000000…Warming up...

Using platform: AMD Accelerated Parallel Processing
Using device: Hawaii(OpenCL 1.2 AMD-APP (1411.4))
Trial 1... 25864874
stephantual commented 9 years ago

As for specs, it's an R9 290X 8GB.

tcoulter commented 9 years ago

I've received similar errors on a Windows build (develop branch, at the time of this writing), with a similar architecture (2x AMD R9 280X 4GB). Rebooting doesn't seem to help, nor does changing the values of --opencl-device or --opencl-platform.

anthony-cros commented 9 years ago

Same issue here with AMD on Ubuntu:

$ ./ethminer -G --opencl-platform 0 --opencl-device 0 -F http://127.0.0.1:8545
No protocol specified
Error: No root privilege. Please check with the system-admin.
No protocol specified
  ℹ  16:36:54|ethminer  Getting work package...
  ℹ  16:36:54|ethminer  Got work package:
  ℹ  16:36:54|ethminer    Header-hash: 1ff6d4483c55a514b1a1ed07696969cc03bce6ded73228c5c47e830727a6fe9c
  ℹ  16:36:54|ethminer    Seedhash: d705bfceb18862841d146b65702167152de74c08a4c1821a1698fcc414d8978e
  ℹ  16:36:54|ethminer    Target: 00000002b864893291f8cf70845a0ab3e3323dd527a48bb8e121070c2f275a04
  ℹ  16:36:54|gpuminer0  workLoop 0 #00000000… #d705bfce…
  ℹ  16:36:54|gpuminer0  Awaiting DAG 0
  ℹ  16:36:54|gpuminer0  Loading full DAG of 330000
  ℹ  16:36:54|ethminer  Mining on PoWhash #1ff6d448… : 0 H/s = 0 hashes / 0.5 s
  ℹ  16:36:54|gpuminer0  Awaiting DAG 0
  ℹ  16:36:55|gpuminer0  Awaiting DAG 0
  ℹ  16:36:55|ethminer  Mining on PoWhash #1ff6d448… : 0 H/s = 0 hashes / 1.001 s
  ℹ  16:36:55|gpuminer0  Awaiting DAG 0
  ℹ  16:36:55|ethminer  Mining on PoWhash #1ff6d448… : 0 H/s = 0 hashes / 1.501 s
  ℹ  16:36:56|gpuminer0  Awaiting DAG 0
  ℹ  16:36:56|ethminer  Mining on PoWhash #1ff6d448… : 0 H/s = 0 hashes / 2.002 s
  ℹ  16:36:56|gpuminer0  Awaiting DAG 0
  ℹ  16:36:56|ethminer  Mining on PoWhash #1ff6d448… : 0 H/s = 0 hashes / 2.503 s
  ℹ  16:36:57|gpuminer0  Awaiting DAG 0
  ℹ  16:36:57|ethminer  Mining on PoWhash #1ff6d448… : 0 H/s = 0 hashes / 3.004 s
  ℹ  16:36:57|gpuminer0  Awaiting DAG 0
  ℹ  16:36:57|ethminer  Mining on PoWhash #1ff6d448… : 0 H/s = 0 hashes / 3.505 s
  ℹ  16:36:58|gpuminer0  Awaiting DAG 0
  ℹ  16:36:58|ethminer  Mining on PoWhash #1ff6d448… : 0 H/s = 0 hashes / 4.005 s
  ℹ  16:36:58|gpuminer0  Awaiting DAG 0
  ℹ  16:36:58|ethminer  Mining on PoWhash #1ff6d448… : 0 H/s = 0 hashes / 4.506 s
  ℹ  16:36:58|ethminer  Got work package:
  ℹ  16:36:58|ethminer    Header-hash: 3077b93bceb47da8c3aedec6ca086be90c21b68c9c31a471d3847c123fa210c1
  ℹ  16:36:58|ethminer    Seedhash: d705bfceb18862841d146b65702167152de74c08a4c1821a1698fcc414d8978e
  ℹ  16:36:58|ethminer    Target: 00000002b8bba0a4c4693407f62f79e48d5515a84a063f0a1b5f58cf20054b0e
  ℹ  16:36:59|gpuminer0  Awaiting DAG 0
  ℹ  16:36:59|gpuminer0  Awaiting DAG 0
  ℹ  16:36:59|gpuminer0  Loading from libethash...
  ℹ  16:36:59|gpuminer0  Done loading.
  ℹ  16:36:59|gpuminer0  Full DAG loaded
Using platform: AMD Accelerated Parallel Processing
Using device: AMD A6-7400K Radeon R5, 6 Compute Cores 2C+4G(OpenCL 1.2 AMD-APP (1642.5))
Segmentation fault (core dumped)
stephantual commented 9 years ago

I might have found a solution - will post back in the morning as it's 3am and I've been at this for 6 hours hahah :)

stephantual commented 9 years ago

TL;DR: On Ubuntu 14.04.02 with ATI chips, a bug forces users to take non-standard steps to enable the graphics driver, which leads to GPU RAM not being recognized properly. The solution is to enable proposed-trusty, then enable the ATI graphics driver in the control panel (or apt-get install fglrx for those CLI-inclined).

Long version:

On Ubuntu 14.04.02, apt-get fails to install fglrx or fglrx-updates. This is a known bug and is detailed on https://bugs.launchpad.net/ubuntu/+source/fglrx-installer/+bug/1424491.

This bug is NOT present on 14.04.01, which is why not everyone is seeing it.

The reason it affects some people here is that Ubuntu uses the open-source drivers by default, and these don't recognize certain screens. That makes the resolution rubbish, so before they install Ethereum, people try to fix it.

The first step they take is to go to 'proprietary drivers' in the control panel for software updates and try to select the fglrx or fglrx-updates entry. This is broken: the GUI updates for a brief instant, then reverts to the default open-source drivers, which don't support OpenCL well.

There are many flawed 'workarounds', but in my case, I did:

sudo apt-get install xorg-video-abi-15
sudo apt-get install fglrx-updates

This gives the illusion that fglrx is running well: the panel updates, and you can run --initial and all that good stuff. But the problem with the above is that it somehow messes up the way GPU RAM is recognized; in my case it would only recognize 300 MB out of my 8 GB.

The ONLY fix is to enable proposed-trusty, then flip the driver in the control panel after a reboot, and reboot again. At that point I stopped getting the memory fragmentation error in Ubuntu.

tcoulter commented 9 years ago

Interesting. I'm getting the same errors on a Windows build, and wondering how this might apply. If you have any thoughts, I'm all ears.

julian1 commented 9 years ago

@tcoulter Possibly related to https://github.com/ethereum/cpp-ethereum/issues/1914

tcoulter commented 9 years ago

Thanks @julian1, that really helped.

I've been able to get GPU mining on Windows to work, though at reduced performance (still better than CPU mining).

First, see my comment here: https://github.com/ethereum/cpp-ethereum/issues/1943

You need to change the call to SHGetFolderPathW in io_win32.c to call SHGetFolderPath instead, then rebuild ethminer. After that, the following command worked for me:

./ethminer.exe -G -t 2 -F "http://192.168.1.4:8545"

Note that my hardware configuration seems to play a big part in whether that command works. Specifically, I have three GPUs in my system: one onboard Intel GPU and two AMD R9 280Xs. One of the 280Xs is my default display device. As far as I can tell, the -t 2 parameter prevents mining on the third GPU (the onboard Intel GPU), which would otherwise cause a crash (it seems that ethminer wants to use AMD's OpenCL code when mining with the Intel GPU). The first AMD GPU also fails with the "GPU memory fragmentation" error, likely because it's the default display device and is powering the monitor. Thus only one GPU works (the second 280X), and because only one thread ends up running, it runs at about half power. Crappy, but right now I'll take it.

Side note: whenever I used the --opencl-device parameter, I'd get the "GPU memory fragmentation" error no matter the device number entered. I wouldn't be surprised if the code is ignoring the device number I enter there and is always using the first GPU (which is the first 280x -- the display device, as I mentioned above).

awrelll commented 9 years ago

I've got the same issue too. I tried all sorts of driver versions, proprietary and open-source, but no luck so far.

I have an R9 280X with 3GB of video RAM and 8GB of system RAM, running on Ubuntu 14.04.

Using device: Tahiti(OpenCL 1.2 AMD-APP (1445.5))
  ℹ  13:34:26|ethminer  Mining on PoWhash #b6e6dcad… : 0 H/s = 0 hashes / 0.502 s
  ℹ  13:34:26|ethminer  Got work package:
  ℹ  13:34:26|ethminer    Header-hash: 81136819d6cc78d3b07426ef712045a7f63fab0d0458020023e0bf27dfadca20
  ℹ  13:34:26|ethminer    Seedhash: 4db8c3d89b7de6ddb733a736d664ebda3b6c5c5131f406df463e8f83d7805283
  ℹ  13:34:26|ethminer    Target: 0000000096c61f7efacb6f094cada51629c4e811c021d24cb8335a58eec594a4
  ✘  13:34:27|gpuminer0  Error GPU mining. GPU memory fragmentation?
  ℹ  13:34:29|gpuminer0  workLoop 1 #4db8c3d8… #4db8c3d8…
  ✘  13:34:29|gpuminer0  Error GPU mining. GPU memory fragmentation?
  ℹ  13:34:29|ethminer  Mining on PoWhash #81136819… : 0 H/s = 0 hashes / 0.5 s
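To see how often the error fires and on which miner, the log can be tallied per channel. This is a minimal sketch that assumes the "marker time|channel message" layout seen in the excerpts in this thread. The function name fragmentation_errors is hypothetical and is not part of ethminer.

```python
import re
from collections import Counter

# Assumed line layout, taken from the excerpts in this thread:
#   <marker>  HH:MM:SS|<channel>  <message>
LOG_LINE = re.compile(
    r"^\s*\S+\s+(?P<time>\d{2}:\d{2}:\d{2})\|(?P<channel>\S+)\s+(?P<message>.*)$"
)


def fragmentation_errors(log_text):
    """Count 'GPU memory fragmentation' messages per log channel."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match and "GPU memory fragmentation" in match.group("message"):
            counts[match.group("channel")] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "  ✘  13:34:27|gpuminer0  Error GPU mining. GPU memory fragmentation?\n"
        "  ℹ  13:34:29|gpuminer0  workLoop 1 #4db8c3d8… #4db8c3d8…\n"
        "  ✘  13:34:29|gpuminer0  Error GPU mining. GPU memory fragmentation?\n"
    )
    for channel, count in fragmentation_errors(sample).items():
        print(channel, count)
```

A count that keeps growing on a single channel (here gpuminer0) suggests the failure is tied to that one device rather than to the work packages.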

@stephantual I even checked whether my GPU's RAM is sufficient and recognized; it seems OK.

[    27.322] (II) fglrx(0): VESA VBE OEM Software Rev: 15.41
[    27.322] (II) fglrx(0): VESA VBE OEM Vendor: (C) 1988-2010, Advanced Micro Devices, Inc.
[    27.322] (II) fglrx(0): VESA VBE OEM Product: TAHITI
[    27.322] (II) fglrx(0): VESA VBE OEM Product Rev: 01.00
[    27.322] (--) fglrx(0): Video RAM: 3145728 kByte, Type: GDDR5
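For anyone who wants to check the same thing, the "Video RAM" line can be converted into a more readable figure. The sketch below assumes that "kByte" in the fglrx log means 1024 bytes. The function name video_ram_gib is hypothetical.

```python
import re

# Matches the size in lines such as:
#   (--) fglrx(0): Video RAM: 3145728 kByte, Type: GDDR5
VIDEO_RAM_LINE = re.compile(r"Video RAM:\s*(?P<kbytes>\d+)\s*kByte")


def video_ram_gib(xorg_log_text):
    """Return the first reported Video RAM size in GiB, or None if absent.

    Assumes 1 kByte = 1024 bytes, so GiB = kByte / (1024 * 1024).
    """
    match = VIDEO_RAM_LINE.search(xorg_log_text)
    if match is None:
        return None
    return int(match.group("kbytes")) / (1024 * 1024)


if __name__ == "__main__":
    line = "[    27.322] (--) fglrx(0): Video RAM: 3145728 kByte, Type: GDDR5"
    print(video_ram_gib(line))  # 3145728 / 1048576 = 3.0
```

A value far below the card's advertised memory, like the 300 MB out of 8 GB described earlier in this thread, would point to the driver problem rather than to the miner.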
awrelll commented 9 years ago

I think I kinda solved it. I deleted the blockchain folder inside .ethereum and got geth to sync again from scratch. All cool now :)

LefterisJP commented 9 years ago

Should not occur anymore. Reopen with a comment if this happens again.