Kalroth / cgminer-3.7.2-kalroth

Kalroth's personal cgminer 3.7.2 branch

Secondary failover pool switching causes client freeze #22

Open burnbrigther opened 10 years ago

burnbrigther commented 10 years ago

I can manually change pools without a problem, but the miner freezes after a failover to the secondary pool. Changing pools manually confirms that the username and password for my secondary pool are correct.

Not that it matters, but hardware configuration is: 290X 290 280X

Here is my configuration:

color 02
del *.bin
setx GPU_MAX_ALLOC_PERCENT 100
setx GPU_USE_SYNC_OBJECTS 1
C:\Users\kurt\Documents\CLIENTS\LITECOIN\KALROTH\cgminer.exe --scrypt -u XXXXXXXXXX -p x -o stratum+tcp://middlecoin.com:3333 -o stratum+tcp://pool1.us.multipool.us:7777 -u YYYYYYYYYY.1 -p x -I 20,19,13 -g 1,1,2 -w 256 --thread-concurrency 24550,32765,14366 --gpu-powertune 20 --auto-fan --gpu-engine 947 --gpu-memclock 1500 --temp-cutoff 95 --temp-overheat 85 --temp-target 75 --api-listen --api-allow W:192.168.1/24 --api-port 4028

Kalroth commented 10 years ago

Can you try writing the config file from cgminer: press S -> W -> "test.conf". Does the pool information look correct if you open the config file in Notepad?

burnbrigther commented 10 years ago

Here is how it appears, with pool info edited, but it is correct:

{
    "pools" : [
        {
            "url" : "stratum+tcp://middlecoin.com:3333",
            "user" : "XXXXX",
            "pass" : "x"
        },
        {
            "url" : "stratum+tcp://pool1.us.multipool.us:7777",
            "user" : "YYYYY.1",
            "pass" : "x"
        }
    ],
    "intensity" : "20,19,13",
    "vectors" : "1,1,1",
    "worksize" : "256,256,256",
    "lookup-gap" : "0,0,0",
    "gpu-threads" : "1,1,2",
    "thread-concurrency" : "24550,32765,14366",
    "gpu-engine" : "947-947,947-947,947-947",
    "gpu-fan" : "0-85,0-85,0-85",
    "gpu-memclock" : "1500,1500,1500",
    "gpu-memdiff" : "0,0,0",
    "gpu-powertune" : "20,20,20",
    "gpu-vddc" : "0.000,0.000,0.000",
    "temp-cutoff" : "95,95,95",
    "temp-overheat" : "85,85,85",
    "temp-target" : "75,75,75",
    "api-listen" : true,
    "api-mcast-port" : "4028",
    "api-port" : "4028",
    "auto-fan" : true,
    "expiry" : "120",

Kalroth commented 10 years ago

There doesn't seem to be anything suspicious in the configuration.

But let me make sure I understand it correctly: when both pools fail, cgminer freezes on you? It doesn't exit or crash? And there's no error message of any kind?

burnbrigther commented 10 years ago

Correct. When the failover happened, the pool appeared to switch, and then there was no more mining activity. It did not try to switch back. I will try to capture a screenshot of the problem.

burnbrigther commented 10 years ago

[Screenshot: 2014-03-02 at 9:54:11 pm]

Ok, it happened again. The error is "Stratum connection to Pool 0 Interrupted". Screenshot attached.

varnav commented 10 years ago

I don't have a secondary pool, only one pool. If the connection to the pool is lost, the program never reconnects. It just says "connection interrupted", as in the screenshot, and stays that way.

burnbrigther commented 10 years ago

Happened again last night. This time the failover did in fact happen and the API reports activity to Anubis, but there is no real hashing going on. From the picture you can see that cgminer shows the rates (not updating, just frozen) and the API reports the same rates, but there is no real activity: no work is being accepted.

[Screenshot: 2014-03-03 at 8:20:42 am]
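The symptom described here (the API still reports a hash rate while no work is accepted) can be checked from a script by reading the accepted-share counter instead of the hash rate. Below is a minimal Python sketch. It assumes the cgminer API is listening on port 4028 as in the configuration above, that the querying host is permitted by `--api-allow`, and that the JSON `summary` reply carries an `Accepted` field. The function names are illustrative and not part of cgminer.

```python
import json
import socket


def parse_api_reply(raw: bytes) -> dict:
    """Decode one JSON reply, dropping the trailing NUL byte if present."""
    return json.loads(raw.rstrip(b"\x00").decode("utf-8", errors="replace"))


def accepted_shares(reply: dict) -> int:
    """Return the 'Accepted' counter from a 'summary' reply (assumed field name)."""
    return int(reply["SUMMARY"][0]["Accepted"])


def query_summary(host: str = "127.0.0.1", port: int = 4028,
                  timeout: float = 10.0) -> dict:
    """Send the JSON 'summary' command and read until the miner closes the socket.

    The default host is an assumption; it must be allowed by --api-allow.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(json.dumps({"command": "summary"}).encode("ascii"))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return parse_api_reply(b"".join(chunks))
```

Calling `accepted_shares(query_summary())` twice, a few minutes apart, and comparing the two values would show whether shares are still being accepted even when the reported rate looks healthy.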

Kalroth commented 10 years ago

I have located the code that is causing this, but that code works well for me, as well as for a few others I asked about this issue.

It is basically stuck in an endless loop trying to restore the server connection, and nothing else is going to update before that happens. I suspect the fault lies somewhere else, since cgminer tries to re-create the connection every 30 seconds, as the code below shows:

while (!restart_stratum(pool)) {
    if (pool->removed)
        goto out;
    cgsleep_ms(30000);
}

You aren't experiencing connection issues with any other software on your computer?
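To make the control flow of the quoted loop easier to reason about, here is a minimal Python sketch of the same pattern. It is not cgminer code. `reconnect_until`, `restart`, `is_removed`, and the optional `max_attempts` cap are hypothetical names introduced for illustration, and the cap is an addition that the quoted loop does not have.

```python
import time
from typing import Callable, Optional


def reconnect_until(
    restart: Callable[[], bool],
    is_removed: Callable[[], bool],
    delay_s: float = 30.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry `restart` every `delay_s` seconds, like the quoted loop.

    With max_attempts=None the function never returns while `restart`
    keeps failing and the pool is not removed, so the caller is blocked.
    The cap is a hypothetical addition, not something cgminer implements.
    """
    attempts = 0
    while not restart():
        if is_removed():
            return "removed"
        attempts += 1
        if max_attempts is not None and attempts >= max_attempts:
            return "gave_up"
        sleep(delay_s)
    return "reconnected"
```

With the default `max_attempts=None`, a `restart` callable that always fails keeps the caller inside the loop indefinitely, which matches the behaviour Kalroth describes: nothing else updates until the connection is restored.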

varnav commented 10 years ago

For me it looks like this:

[Screenshot: miner]

It will stay like this forever, until restarted. The displayed hashrates and temperatures are wrong, but the program is responsive.

All other software works fine.

I'm using Windows 7 x64

burnbrigther commented 10 years ago

I'm really not experiencing any other issues. Would it be possible to add internal health-check and restart?

varnav commented 10 years ago

sgminer (https://github.com/veox/sgminer) has the same issue for me.

burnbrigther commented 10 years ago

Happened just now again. And again after the pool switch is done.

[Screenshot: 2014-03-04 at 12:00:43 am]

Kalroth commented 10 years ago

The "interesting" thing is that your second pool isn't working either. The odds of both pools being down at the same time are very low, so it is definitely something on the client side.

Are you running any kind of anti-virus that might interfere? And have you always had this issue or did it start recently?

varnav commented 10 years ago

I have only one pool. I have antivirus (MSE), but it has no network features. This issue was always there; cgminer 3.7.2 has it too.

Kalroth commented 10 years ago

Okay, if the problem is that old, then it's not likely that I'll find a fix anytime soon. Can you try a load balanced setup to see if it fixes anything for you?

Set up something like this, where a quota of 1 marks the pool you want to mine on and 0 marks your failover servers. If you only have one server, then just use that one, although it's always a good idea to have at least one backup server.

"pools" : [
    {
        "name" : "Middlecoin [Amsterdam]",
        "quota" : "1;stratum+tcp://amsterdam.middlecoin.com:3333",
        "user" : "1DNBcSEENBwDKrcTyTW61ezWhzsPy5imkn",
        "pass" : "x"
    },
    {
        "name" : "Clevermining [Amsterdam]",
        "quota" : "0;stratum+tcp://eu.clevermining.com:3333",
        "user" : "1DNBcSEENBwDKrcTyTW61ezWhzsPy5imkn",
        "pass" : "x"
    },
    {
        "name" : "Wafflepool [Amsterdam]",
        "quota" : "0;stratum+tcp://eu.wafflepool.com:3333",
        "user" : "1DNBcSEENBwDKrcTyTW61ezWhzsPy5imkn",
        "pass" : "d=256"
    }
],
"load-balance" : true,

burnbrigther commented 10 years ago

Martin - do you think you will be able to address this freezing soon? As it stands now, I can't leave the client for more than 5-6 hours before this happens. Each night I'm losing a few hours of mining until I wake up and restart it in the morning.

varnav commented 10 years ago

Same thing for me. Runs for several hours and dies.

Kalroth commented 10 years ago

I'm not sure I'm able to address this at all, much less soon. I cannot reproduce the error on my side, so I'd be changing things blind.

Did you try changing your configuration like I posted above?

varnav commented 10 years ago

If cgminer is running and you unplug internet for a minute and plug it back in - does it reconnect for you?

burnbrigther commented 10 years ago

Actually, yeah, that's a great test, Martin. Maybe you can replicate it that way? Also - I have another client running on my host, rapidprime (which incidentally always manages to reconnect without problems). It consumes a lot of CPU, but that should not matter.

Kalroth commented 10 years ago

Yes, I don't have issues with reconnecting. It just waits until there's a connection again and continues - I have been offline a few times in the last couple of months, both by accident and on purpose.

I'm thinking it's something in your setup that's doing it: a router, operating system, drivers, software, etc. There are enough cgminer users that it should be easy to find more complaints if this was a common issue.
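One way to narrow down whether something in the local setup is blocking new connections is to try opening a plain TCP connection to the pool from the same machine while the miner is stuck. The Python sketch below assumes pool URLs of the form `stratum+tcp://host:port` as used in this thread. The helper names are illustrative, and a successful connect only shows that the port is reachable, not that the stratum protocol is working.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def stratum_endpoint(url: str) -> Tuple[str, int]:
    """Extract (host, port) from a 'stratum+tcp://host:port' URL."""
    parts = urlsplit(url)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"no host:port in {url!r}")
    return parts.hostname, parts.port


def can_connect(url: str, timeout: float = 5.0) -> bool:
    """Return True if a plain TCP connection to the pool can be opened."""
    host, port = stratum_endpoint(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

If `can_connect("stratum+tcp://middlecoin.com:3333")` returns `True` while cgminer still reports the connection as interrupted, that would suggest the network path itself is fine and the problem lies closer to the miner process.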

burnbrigther commented 10 years ago

Oh well. Stuck with this problem then. I'm not seeing the problems on my host.

Kalroth commented 10 years ago

Maybe you can use an external application to monitor the process, cgwatcher or something similar. If nothing else, then automate a program restart every 1-2 hours - it's not perfect, but it's better than losing a night's worth of mining.
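An external monitor of the kind suggested here could be sketched in Python as follows. This is an illustration, not how cgwatcher works. `fetch_accepted` is a hypothetical caller-supplied function that returns the miner's current accepted-share count (for example, read from the cgminer API), and the 15-minute idle limit is an assumed value that would need tuning for a given pool.

```python
import subprocess
import time
from typing import Callable, List, Optional


class StallDetector:
    """Flags a stall when the accepted-share counter stops changing."""

    def __init__(self, max_idle_s: float) -> None:
        self.max_idle_s = max_idle_s
        self.last_count: Optional[int] = None
        self.last_change: Optional[float] = None

    def update(self, accepted: Optional[int], now: float) -> bool:
        """Record a sample; return True after max_idle_s without progress.

        `accepted` is None when the miner could not be queried.
        """
        if self.last_change is None:
            self.last_change = now
        if accepted is not None and accepted != self.last_count:
            self.last_count = accepted
            self.last_change = now
            return False
        return now - self.last_change >= self.max_idle_s


def watch(fetch_accepted: Callable[[], int], command: List[str],
          poll_s: float = 60.0, max_idle_s: float = 900.0) -> None:
    """Start the miner and restart it when it exits or stops making progress."""
    proc = subprocess.Popen(command)
    detector = StallDetector(max_idle_s)
    while True:
        time.sleep(poll_s)
        try:
            accepted: Optional[int] = fetch_accepted()
        except (OSError, ValueError, KeyError):
            accepted = None  # query failed; count it as no progress
        if proc.poll() is not None or detector.update(accepted, time.monotonic()):
            if proc.poll() is None:
                proc.kill()
            proc.wait()
            proc = subprocess.Popen(command)
            detector = StallDetector(max_idle_s)  # fresh state after a restart
```

Compared with a fixed restart every 1-2 hours, this only restarts the miner when no new shares have been accepted for `max_idle_s` seconds or when the process has exited.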

varnav commented 10 years ago

Yes, for me losing the network connection and then restoring it works fine. cgminer reconnects successfully and does not hang.

burnbrigther commented 10 years ago

Ok, this happened again last night, but this time also on the rig that had never shown the symptoms before. I need to figure something out. Losing a whole 5 MH/s overnight is no fun. Is there an intermediary proxy of some kind I could use? I still think there is something going on that maybe not enough people are reporting. I still believe some sort of built-in health check in the application is worth looking into.

Kalroth commented 10 years ago

There is a health check built in, you're triggering it and it isn't recovering. Like I already said above - I have no way to reproduce the issue, so I do not know what code to fix.

If there were more people with this issue, then maybe I'd get more error reports and possibly narrow the issue down. As it is right now, I still believe it's something external that's stopping the cgminer process from creating new connections. But it's just a guess and I'll need more/better information on the error before I can do anything.