genesismining / sgminer-gm

A multi-algo GPU miner
GNU General Public License v3.0

Intermittent hang during sgminer startup #27

Open keithdunnett opened 7 years ago

keithdunnett commented 7 years ago

This is likely to be an awkward one to track down: an intermittent bug (perhaps a race condition) that occurs roughly one time in three or four when sgminer is launched at system startup. sgminer hangs before bringing up the curses interface: it has initialised the GPUs and connected to the pools, but has not yet generated work or started mining.

When this occurs, sgminer stops writing to both the debug log and the curses interface and sits in limbo until I send Ctrl-C to kill it. At that point it briefly draws the curses interface, showing "Waiting for work to become available from pools" as the last message, before exiting with the usual summary. There is a twist, though: at the same moment it flushes to the debug log a pile of messages from stratum_rthread indicating that work was indeed being received from the pools.

I've attached a debug log (1.7MB) showing how far sgminer got when it started up at 22:37, and what it subsequently logged, and in which order, when I killed it at 01:37, in the hope that this helps identify where the issue lies. The same configuration works more often than not, hence the suspicion of a race condition of some sort.

sgminer-hang.log
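
For readers unfamiliar with this class of bug, here is a minimal Python sketch of a "lost wakeup", one kind of race that would be consistent with the symptoms described above (work arriving from the pools while another thread still waits for work). It is not sgminer's code, which is written in C, and whether this is what actually happens here is unconfirmed. `WorkQueue`, `pop_unguarded`, `pop_guarded` and `demo` are hypothetical names invented for illustration.

```python
import threading
from collections import deque


class WorkQueue:
    """Hypothetical stand-in for a miner's work queue (not sgminer's code)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._items = deque()

    def push(self, item):
        # Producer side, e.g. a pool-receive thread handing over new work.
        with self._cond:
            self._items.append(item)
            self._cond.notify_all()

    def pop_unguarded(self, timeout):
        # Buggy pattern: waits for a notification without first checking
        # whether work is already queued. A notify that fired before this
        # call is lost; without the timeout the caller would block forever.
        with self._cond:
            notified = self._cond.wait(timeout)
            if not notified:
                return None
            return self._items.popleft()

    def pop_guarded(self, timeout):
        # Safe pattern: wait_for checks the predicate under the lock, so
        # work queued before the wait began is still seen.
        with self._cond:
            if not self._cond.wait_for(lambda: len(self._items) > 0, timeout):
                return None
            return self._items.popleft()


def demo():
    q = WorkQueue()
    producer = threading.Thread(target=q.push, args=("job-1",))
    producer.start()
    producer.join()  # force the ordering in which the producer wins the race

    lost = q.pop_unguarded(timeout=0.2)  # the notification already happened
    found = q.pop_guarded(timeout=0.2)   # the predicate sees the queued item
    return lost, found
```

The `join()` in `demo` forces the unlucky ordering every time; in a real program that ordering would depend on thread scheduling at startup, which is why such a bug would only show up on some runs.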

int03h commented 7 years ago

From the look of it, you are using 2 threads per GPU? I've had issues when I do that, specifically on ETH.

keithdunnett commented 7 years ago

Thanks for the reply. I am, yes, though I was able to reproduce the problem with --gpu-threads 1 on at least one occasion. I'll try dropping back to a single thread, since the problem does happen at the point of launching the threads, but I don't think this is the full picture.

Since I wrote the above, which was initially based on a rig I'm still putting together, the same issue has surfaced on an older, stable rig that's been running a similar config for a couple of months. I suspect that the problem may have been introduced with 5.5.4. I'll try both: drop one rig to a single thread per GPU, and run the other with 5.4.0, and see where that gets me.