ethereum / aleth

Aleth – Ethereum C++ client, tools and libraries
GNU General Public License v3.0

Recent successful builds don't start (Ubuntu) #1935

Closed stephantual closed 9 years ago

stephantual commented 9 years ago

So cpp-ethereum-b821f0e7dc66bcc1ed35ecbc46c84fe83562b192 works well.

Since then I have tried to manually compile and run:

cpp-ethereum-c1d40c281413d27c45568d962aa15cc8965913af
cpp-ethereum-ff3360079b39be68495d87b7fb6554c7be692a2d
... amongst others.

I have also tried various PPA builds over the last 2 days, and nothing starts: invoking eth, solc or ethminer from the command line gives nothing at all besides a static cursor.

levijiles commented 9 years ago

Same issue on a fresh Ubuntu installation. 0.9.20 worked WELL, but latest 0.9.21 (eth, ethminer) will not start. If you change the "additional driver" back to x.org (xserver-xorg-video-ati), builds will start again, but GPU will fault out. This is reproducible without a reboot. Only happened after upgrade.

wil611 commented 9 years ago

Same issue on Ubuntu 14.04. Rebuilt the OS and recompiled cpp-ethereum with -DGUI=0, still no luck.

larspensjo commented 9 years ago

It seems to get stuck waiting for a mutex. But I don't know why.

larspensjo commented 9 years ago

I did some testing, and found the following commit that breaks the build:

commit 5557122627b89da49d92deb79b7aa13f6d1fb06b
Merge: db7b548 dfdc1a4
Author: chriseth <c@ethdev.com>
Date:   Fri May 15 11:12:13 2015 +0200

    Merge pull request #1889 from chriseth/sol_multipleTagsOnStack

    Known state: store tags on stack as unions.

So the last known working code is dfdc1a44e9dc820f9931e9d17c0070386cfee0d7. The 0.9.21 release was 802024ba0e5ad2693c901301f3db2cf9a708addc (15 commits earlier).

levijiles commented 9 years ago

edit: I just tried to rebuild b821f0e7dc66bcc1ed35ecbc46c84fe83562b192. That version now gives me Memory Frag when GPU mining every single time.

edit edit: Fell back to dfdc1a44e9dc820f9931e9d17c0070386cfee0d7 - GPU mining works on private chain. Rebuilding from source destroyed my blockchain which I get to spend 3 hours reimporting. Total time spent on issue about 20 hours. Maybe more. Hopeful the block chain finishes importing before Olympic mining reward is over...

edit edit edit: OF COURSE after the block chain is downloaded and a new DAG is generated for current epoch... I am back to "GPU memory fragmentation?" I can run benchmark fine.

GGWP - I quit until APT has a working version again.

larspensjo commented 9 years ago

Looking a little more into the source code, the problem is the following line in libethash-cl/ethash_cl_miner.cpp:

    // create buffer for dag
    m_dag = cl::Buffer(m_context, CL_MEM_READ_ONLY, _dagSize);

When running the benchmark test, _dagSize is 1073739904 (0x3FFFF880). When running real mining, size is 1174404736 (0x45FFFE80). The smaller size will succeed, but the bigger size will raise exception clCreateBuffer. This exception will trigger the error message "GPU memory fragmentation".

I don't know what this means. Maybe it is just what it says, there isn't enough contiguous memory on the graphics card? My graphics card is supposed to have 2GB.

subtly commented 9 years ago

The benchmark can run on a GPU that has 1GB of memory; however, actual mining requires more than 1GB of memory on the GPU. We're still looking at this, but some GPUs aren't able to create a 1GB buffer and/or they require that the host have this memory allocated in a contiguous region. What kind of GPU do you have?

larspensjo commented 9 years ago

I have AMD 7870 2048 MB. Running some OpenGL debug shows:

Vendor: ATI Technologies Inc.
Renderer: AMD Radeon HD 7800 Series 
Version: 4.3.12798 Compatibility Profile/Debug Context 13.35.1005
GLSL: 4.30
OpenGL context version parsed by GLFW: 4.3.12798
OpenGL context flags: none
OpenGL profile mask: 0x00000002 (compatibility)
VBO_FREE_MEMORY_ATI total 1420378, largest block 1375168, total aux 1571111, largest aux block 24576

OpenGL isn't OpenCL, but it seems the numbers are close.

Update: Restarting the Mint Desktop using "software render" instead, gives better numbers:

VBO_FREE_MEMORY_ATI total 1833091, largest block 1678595, total aux 2014145, largest aux block 768

But buffer creation still fails for mining. Largest block would be 1678595*1024=1,718,881,280 bytes. If I try to run mining from console only (no desktop), the program will crash. Using gdb didn't show any usable stack trace.
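The unit conversion above can be checked with a short C++ sketch. It assumes, as the comment does, that VBO_FREE_MEMORY_ATI reports sizes in KiB. The function and constant names are illustrative only and do not come from cpp-ethereum.

```cpp
#include <cstdint>

// Assumption: VBO_FREE_MEMORY_ATI values are in KiB, so multiply by 1024.
constexpr std::uint64_t kib_to_bytes(std::uint64_t kib) { return kib * 1024ULL; }

// Numbers quoted in this thread.
constexpr std::uint64_t kLargestFreeBlockKib = 1678595ULL;     // "software render" desktop
constexpr std::uint64_t kMiningDagBytes      = 1174404736ULL;  // _dagSize when really mining

// Compile-time checks of the arithmetic in the comment above.
static_assert(kib_to_bytes(kLargestFreeBlockKib) == 1718881280ULL,
              "largest free block expressed in bytes");
static_assert(kib_to_bytes(kLargestFreeBlockKib) > kMiningDagBytes,
              "largest free block is bigger than the mining DAG");
```

If both assertions compile, the largest reported free block is larger than the mining DAG, so the amount of free memory alone does not explain the failed allocation.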

larspensjo commented 9 years ago

I found that allocating a buffer of size 0x40000000 succeeds, but size 0x40000001 fails!

It seems there is a driver limit, independent of the memory actually available?
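A minimal C++ sketch of that observation, using only the sizes quoted in this thread. It treats 0x40000000 bytes (1 GiB) as the largest single allocation the driver accepted. That cap is an empirical value from the test above, not a documented constant, and the names below are hypothetical. OpenCL does report a per-allocation limit through CL_DEVICE_MAX_MEM_ALLOC_SIZE, but the thread does not confirm that this is the limit being hit.

```cpp
#include <cstdint>

// Largest single allocation observed to succeed (assumption: taken from
// the experiment above, not from any driver documentation).
constexpr std::uint64_t kObservedAllocCap = 0x40000000ULL;  // 1 GiB

// True if a buffer of `bytes` would fit under the observed cap.
constexpr bool fits_under_observed_cap(std::uint64_t bytes) {
    return bytes <= kObservedAllocCap;
}

// _dagSize values reported earlier in the thread.
constexpr std::uint64_t kBenchmarkDagBytes = 1073739904ULL;  // 0x3FFFF880
constexpr std::uint64_t kMiningDagBytes    = 1174404736ULL;  // 0x45FFFE80

static_assert(fits_under_observed_cap(kBenchmarkDagBytes), "benchmark DAG fits");
static_assert(!fits_under_observed_cap(kMiningDagBytes), "mining DAG does not fit");
static_assert(!fits_under_observed_cap(0x40000001ULL), "one byte over the cap fails");
```

This matches the behaviour seen so far: the benchmark DAG sits just below 1 GiB and allocates, while the mining DAG is above it and fails.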

larspensjo commented 9 years ago

Finally made some progress! It seems to be a feature (read: bug) of AMD. If you do the following:

export GPU_MAX_HEAP_SIZE=95
export GPU_MAX_ALLOC_PERCENT=100

I will get access to more memory. With this, I can now start GPU mining. See https://community.amd.com/message/1288143#1288143
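For anyone who prefers a wrapper to exporting the variables in the shell, here is a rough C++ sketch (POSIX only) that sets the same two variables and then replaces itself with the miner command. The function names are hypothetical and not part of cpp-ethereum. The values are the ones quoted above, and whether they help on a given driver version is not something this sketch can verify.

```cpp
#include <cassert>
#include <stdlib.h>  // setenv (POSIX)
#include <unistd.h>  // execvp (POSIX)

// Set the two AMD driver variables quoted in this thread for the current
// process and for anything it later exec()s. Returns true if both calls succeed.
inline bool apply_amd_memory_overrides() {
    return setenv("GPU_MAX_HEAP_SIZE", "95", 1) == 0 &&
           setenv("GPU_MAX_ALLOC_PERCENT", "100", 1) == 0;
}

// Hypothetical launcher: replace this process with the command in argv
// (for example {"ethminer", "-G", nullptr}). It only returns, with -1,
// if the environment could not be set or the exec itself failed.
inline int exec_with_amd_overrides(char* const argv[]) {
    assert(argv != nullptr && argv[0] != nullptr);
    if (!apply_amd_memory_overrides()) return -1;
    return execvp(argv[0], argv);
}
```

The effect is the same as the two export lines followed by running the miner from the same shell.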

levijiles commented 9 years ago

larspensjo, are you saying this made the new builds run, or that mining in the old versions now works? I am still unable to run the new builds after apt-get install cpp-ethereum.

I got it working with old builds from source...

larspensjo commented 9 years ago

Right, I can't use the new builds either. So as you say, I can only get it working with old builds.

There have been a couple of new apt-get updates since, but they all continue to fail in the same way.

levijiles commented 9 years ago

Have you noticed when you rebuild the old builds that it resets your block chain? I couldn't just extract the .ethereum contents either; I had to run ./eth -I blockchainimport, and that only gives me about 100 blocks/sec = 60 minutes for reimport!

larspensjo commented 9 years ago

I haven't seen that. But then, it may depend on what old build you use...

cjphilpot commented 9 years ago

I have the same issue. Older builds work great but now eth and ethminer don't even start.

wil611 commented 9 years ago

The addition in libethash-cl/ethash_cl_miner.cpp of

ethash_cl_miner::~ethash_cl_miner() { finish(); }

introduced 6 days ago in commit 19f3a5802194f8a891ef1aa44f03a7b1573881ef is what is causing this issue on my system. With it and the corresponding declaration in libethash-cl/ethash_cl_miner.h removed, the code compiles and both eth and ethminer work on commit 4268174af782c03ea828b76e00644ca9ba3e8ed3. I haven't mined with it yet, but the benchmark test eth -G -M --opencl-device 1 worked without issue. Since I'm currently mining Olympic and am not sure what the added destructor ~ethash_cl_miner is actually meant to do, I'm going to wait until after block 400000 to try mining with the latest commit.

levijiles commented 9 years ago

sudo apt-get install cpp-ethereum
sudo eth -j

...nothing happens, no feedback at all, not even a prompt for the sudo password.

Considering that there is a fork and we need to update to the latest version, please advise.

larspensjo commented 9 years ago

Funny, eth now starts for me. I don't know what changed. I was doing some debugging with the source code, to find out exactly what made the application hang, when I could suddenly no longer make it happen.

The only explanation I have is that it would be some external dependency that was updated.

jeffmakes commented 9 years ago

Same problem here. Just built cpp-ethereum:01a2289c857ca7beb1576e5fe9c8638d655b43fd on ubuntu following this guide https://github.com/ethereum/cpp-ethereum/wiki/Building-on-Ubuntu

$ ./eth --help or $ ./eth -j anything, just hangs eating 100% CPU.

ethminer -M -G gets further (but has other problems!)

$ ./ethminer -M -G
Benchmarking on platform: { "platform": "AMD Accelerated Parallel Processing", "device": "Tahiti", "version": "OpenCL 1.2 AMD-APP (1729.3)" }
Preparing DAG...
Warming up...
ℹ 19:48:44|gpuminer0 workLoop 0 #00000000… #00000000…
ℹ 19:48:44|gpuminer0 Initialising miner...
Using platform: AMD Accelerated Parallel Processing
Using device: Tahiti(OpenCL 1.2 AMD-APP (1729.3))
Trial 1... 0
Trial 2... 0
Trial 3... 0
Trial 4... 0
Trial 5... 0

then hangs.

chriseth commented 9 years ago

Does that still happen?

larspensjo commented 9 years ago

I don't have the problem any longer.

jeffmakes commented 9 years ago

I ended up upgrading some hardware in the machine (just because it was painfully slow), reinstalled ubuntu and recompiled everything, and the problem went away.

So I can't put my finger on one thing that fixed it, but I did these things:

Cheers Jeff

On 2 July 2015 at 12:30, chriseth notifications@github.com wrote:

Does that still happen?

wil611 commented 9 years ago

It still happens on my system with the latest develop build. When I start eth it just hangs with no output until I press Ctrl-C. If I remove

ethash_cl_miner::~ethash_cl_miner() { finish(); } from libethash-cl/ethash_cl_miner.cpp

and

~ethash_cl_miner()

from libethash-cl/ethash_cl_miner.h

and recompile it runs without issue.

wil611 commented 9 years ago

I'm running Ubuntu 14.04 with 2x AMD R9-290, 1 R9-280x and 1 R9-270x. I was able to compile and run eth and ethminer up until commit https://github.com/ethereum/cpp-ethereum/commit/19f3a5802194f8a891ef1aa44f03a7b1573881ef after which eth would just hang. I looked through the changes that were made in that commit and was able to successfully run eth after reverting the changes to libethash-cl/ethash_cl_miner.cpp and libethash-cl/ethash_cl_miner.h. Since that time every new commit fails in the same way until I take out the changes to those two files. My system is updated/upgraded using sudo apt-get update and upgrade before each new cpp-ethereum commit that I've tried. I tried again today with the latest commit and still have the same results. I've been mining using the geth/ethminer combination successfully. What additional information can I supply?

wil611 commented 9 years ago

Not sure what changed but this is no longer an issue for me with the latest build.

LefterisJP commented 9 years ago

There has been a refactoring of the mining code including the parts that were giving you trouble. It's good to see that you no longer experience this issue.