Open psychocrypt opened 6 years ago
Running this patch now on most of mine, although I had no issues before (nanopool). It should show whether it causes any regressions. Mix of CPUs, AMDs, and NVIDIAs, and a mix of compilers/distros/drivers, too.
Compiled and running as well. No issues so far. But the error only happened once before this fix.
Edit 5/2/18 - 12:21UTC I should note I'm mining directly to a pool and not using nicehash. No issues for 12 hours.
I'm running b24cb3a24a8a5d72719f061bef2b3316fa4ebe15 on 12 machines now. I've been seeing around 1 to 5 instances of #1505 daily.
This patch works identically to the previous one for me (meaning it caused no issues). Installed on every rig now.
I get bad results occasionally (100/day) but believe it's from overclock or just old cards being old. And quite a few expired/late results but that's a pool fixed-diff too high thing. Still well under 1% and the exact same as earlier versions.
Just got back home to find 3 machines doing useless work again. Looks the same as before, but let me know if there's more I should be looking at.
I think that looks more like your Internet choked on itself and then the pool got permanently confused about your re-login for ~11 hours following.
@Spudz76 The tcpdump I previously uploaded made it pretty clear it's not the pool that got confused. How does this look different to you?
@aij could you please download the branch https://github.com/psychocrypt/xmr-stak/archive/topic-consumeJobWithOutput.zip, compile it, and run it? This PR will create a lot of debug output. It would be nice if you could set a log file in config.txt and provide us with the log as soon as you reproduce the issue with rejected shares. Hopefully this will give us enough information to detect the source of the issue.
Hi! I tested this release because of issue #1519. It helps cut down the invalid shares from about 4-5% to 2% (but let me test this longer to be sure).
No issues so far (intensity and affine_to_cpu edited). Good results: 2740 / 2740 (100.0 %) (after ~21 hours straight; I often have to restart due to Blockchain driver issues...). But I haven't had any issues with the previous releases either. There were no invalid/rejected shares, maybe some network errors, but that's it.
Edits: updated Good results and interval.
@psychocrypt The rwlock changes seem to have solved the original problem I was seeing, but now, after starting off well, I'm getting a lot of low-difficulty or duplicate shares. Xmrig-proxy seems to be submitting them upstream, where they get rejected by dwarfpool.
Actually, that seems to be persisting even after restarting xmr-stak (and xmrig-proxy), so it may be a problem on dwarfpool.
Just in case, here are logs and configs from that machine (c2n1). xmr-stak-invalid.tar.gz
@aij Thanks for the log, it is very useful. Nevertheless, could you try mining against another pool to see whether you end up with the same issues?
@psychocrypt It looks like #1505 is still happening, though perhaps less quickly?
xmr-stak-invalid-job-id.txt.gz
Apparently I missed this happening yesterday:
May 06 04:21:20 c1n3 xmr-stak-start[6916]: [2018-05-06 04:21:20] : RECV: {"id":1,"jsonrpc":"2.0","result":{"id":"7971d32c-bdd0-4d7b-90b0-178a2c0db392","job":{"blob":"0707bf91bbd705b26d29e1d07c5cf101144163f3e4a4546b08927bc47b6f82668a7c6cfeca492000000000ad0f8e024e3c5eb0cf9ef198518458aa6720eb729822d25fa4c7397778b7a84006","job_id":"587399639771319000","target":"dc460300","coin":"XMR","variant":1},"status":"OK"}}
May 06 04:22:36 c1n3 xmr-stak-start[6916]: [2018-05-06 04:22:36] : SEND: {"method":"submit","params":{"id":"7971d32c-bdd0-4d7b-90b0-178a2c0db392","job_id":"414059556624852000","nonce":"6bcd0900","result":"78b3af0dad73f049af2c92fcf35f52ae1ac22bb9f092a05c4786f50fe52b0000"},"id":1}
May 06 04:22:36 c1n3 xmr-stak-start[6916]: [2018-05-06 04:22:36] : RECV: {"id":1,"jsonrpc":"2.0","error":{"code":-1,"message":"Invalid job id"}}
I restarted everything later in the day, and then the invalid job IDs started again around
May 07 14:36:22 c1n3 xmr-stak-start[12103]: [2018-05-07 14:36:22] : RECV: {"jsonrpc":"2.0","method":"job","params":{"blob":"0909b6d5c2d705304483a3f34a04bb1484ac2e813da9bf2b55727612301fcb8e5f4ca783573999000000fa44ecff6791de6d69b4121317efacc99acd4a087d63b6a7a7971b8854d38629f301","job_id":"0000004cbf7bf575000","target":"c5a70000","coin":"XMR","variant":1}}
May 07 14:37:07 c1n3 xmr-stak-start[12103]: [2018-05-07 14:37:07] : SEND: {"method":"submit","params":{"id":"e69f07aa-637a-48e6-a37e-959db0f51801","job_id":"0000004cbf788f40000","nonce":"7b5f03f4","result":"00c75eaa1aa8a6ba7f70ea515e978674a36fc442a5883fe5fdee28264b9a0000"},"id":1}
May 07 14:37:07 c1n3 xmr-stak-start[12103]: [2018-05-07 14:37:07] : RECV: {"id":1,"jsonrpc":"2.0","error":{"code":-1,"message":"Invalid job id"}}
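For anyone who wants to check their own logs for the same pattern, here is a minimal Python sketch. It assumes the log format shown in the excerpts above (a `RECV: ` or `SEND: ` tag followed by one JSON object per line); the function name `find_stale_submits` is made up for this illustration and is not part of xmr-stak. It flags every `submit` whose `job_id` differs from the most recent job received before it. A mismatch is only a hint, not proof of a bug: pools may still accept shares for slightly older jobs, and a trimmed log can hide intermediate jobs.

```python
import json


def find_stale_submits(lines):
    """Return (submitted_job_id, last_received_job_id) for every submit
    whose job_id differs from the most recent job seen in a RECV line."""
    current_job = None
    stale = []
    for line in lines:
        for tag in ("RECV: ", "SEND: "):
            pos = line.find(tag)
            if pos == -1:
                continue
            try:
                msg = json.loads(line[pos + len(tag):])
            except json.JSONDecodeError:
                break  # not a JSON payload, skip this line
            if not isinstance(msg, dict):
                break
            if tag == "RECV: ":
                job = None
                if msg.get("method") == "job":
                    # job pushed by the pool
                    job = msg.get("params")
                elif isinstance(msg.get("result"), dict):
                    # job delivered inside the login response
                    job = msg["result"].get("job")
                if isinstance(job, dict) and "job_id" in job:
                    current_job = job["job_id"]
            elif msg.get("method") == "submit":
                params = msg.get("params")
                submitted = params.get("job_id") if isinstance(params, dict) else None
                if submitted != current_job:
                    stale.append((submitted, current_job))
            break
    return stale
```

Usage: read the log file (or `journalctl` output) into a list of lines and pass it in, for example `find_stale_submits(open("xmr-stak.log"))`, where the file name is a placeholder.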
Rebased onto current dev branch and recompiled, running on all rigs now, no problems. Dwarfpool (which I have never had problems with, other than fixed high diff kind of sucks)
I should have mentioned: the above log included switching xmrig-proxy to use nicehash upstream instead of dwarfpool, around May 6 20:22.
Yesterday, I switched to have xmr-stak connect directly to nicehash, and have not had any invalid job IDs since. Just a few rejected shares for other reasons:
$ nixops ssh-for-each "journalctl -u xmr-stak.service --since='2018-05-07 22:18:22' | grep 'message'"
c1n1.> May 08 14:02:22 c1n1 xmr-stak-start[16233]: [2018-05-08 14:02:22] : RECV: {"id":1,"jsonrpc":"2.0","error":{"code":5,"message":"Invalid nonce; is miner not compatible with NiceHash?"}}
c2n0.> May 08 06:22:51 c2n0 xmr-stak-start[16189]: [2018-05-08 06:22:51] : RECV: {"id":1,"jsonrpc":"2.0","error":{"code":5,"message":"Invalid nonce; is miner not compatible with NiceHash?"}}
c2n3.> May 08 00:08:05 c2n3 xmr-stak-start[16375]: [2018-05-08 00:08:05] : RECV: {"id":1,"jsonrpc":"2.0","error":{"code":5,"message":"Invalid nonce; is miner not compatible with NiceHash?"}}
c2n3.> May 08 05:13:22 c2n3 xmr-stak-start[16375]: [2018-05-08 05:13:22] : RECV: {"id":1,"jsonrpc":"2.0","error":{"code":5,"message":"Invalid nonce; is miner not compatible with NiceHash?"}}
c2n3.> May 08 15:49:00 c2n3 xmr-stak-start[16375]: [2018-05-08 15:49:00] : RECV: {"id":1,"jsonrpc":"2.0","error":{"code":5,"message":"Invalid nonce; is miner not compatible with NiceHash?"}}
m1...> May 08 17:21:36 m1 xmr-stak-start[36271]: [2018-05-08 17:21:36] : RECV: {"id":1,"jsonrpc":"2.0","error":{"code":3,"message":"Duplicate share."}}
Something must be different when using xmrig-proxy. I thought perhaps it was NH's propensity to drop the connection, but I would have expected at least some invalid job IDs if that was all.
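To compare rejection reasons across runs, the `grep 'message'` output above can be tallied per message. This is a small Python sketch under the same assumption about the log format (`RECV: ` followed by a JSON object); `count_pool_errors` is a hypothetical helper name, not an xmr-stak tool.

```python
import json
from collections import Counter


def count_pool_errors(lines):
    """Count pool error messages found in RECV lines."""
    counts = Counter()
    tag = "RECV: "
    for line in lines:
        pos = line.find(tag)
        if pos == -1:
            continue
        try:
            msg = json.loads(line[pos + len(tag):])
        except json.JSONDecodeError:
            continue  # not a JSON payload
        err = msg.get("error") if isinstance(msg, dict) else None
        if isinstance(err, dict):
            counts[err.get("message", "<no message>")] += 1
    return counts
```

Applied to the six lines quoted above, this would group them into five "Invalid nonce" rejections and one "Duplicate share." rejection, with no "Invalid job id" entries.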
Could you please send the full log so that I can check the received jobs before(
Late update:
I did get the invalid job id based on the first build in this thread. It took several days (6-7 days?) before the problem occurred. I compiled the latest dev 3 days ago and haven't seen it yet, but might be too early to tell. Not using a proxy, mining directly to supportxmr.
Stopped running this as it seems it has been merged into dev, now running that.
Seems to have trouble on initial connect sometimes but I'm not sure why. Restarting when it didn't connect makes it connect fine.
I believe I am still seeing this, but it only happens after running > 2 weeks.
If you* know how xmr-stak can be compiled, it would be nice if you could test #1526 and report whether you have any issues with this PR. Please do not ask how to compile xmr-stak; this issue is only for advanced users.
If you see anything that is changed by this PR (fewer/more rejected shares, ...), please report it within this issue. Please add your system specs to the issue (how many CPU threads, which kind of GPUs, OS).
direct download: https://github.com/psychocrypt/xmr-stak/archive/fix-jobConsume.zip
The changes in #1526 should solve some possible race conditions while the job is switched, as well as possible deadlocks.
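As a generic illustration of the kind of race meant here (not the actual xmr-stak code, which is C++), consider a job made of two fields that are updated one after the other. A worker that reads between the two writes could pair a new job_id with an old blob. The Python sketch below uses hypothetical names (`JobSlot`, `publish`, `snapshot`) and a plain mutex in place of a read-write lock; it only shows the idea of switching and reading the job as one unit.

```python
import threading


class JobSlot:
    """Holds the current job; both fields change together under one lock."""

    def __init__(self, job_id, blob):
        self._lock = threading.Lock()
        self._job_id = job_id
        self._blob = blob

    def publish(self, job_id, blob):
        # Writer side: switch to the new job atomically, so no reader
        # can observe the new job_id together with the old blob.
        with self._lock:
            self._job_id = job_id
            self._blob = blob

    def snapshot(self):
        # Reader side: copy both fields under the same lock and work
        # on the copy, so a submit carries the job_id it was hashed for.
        with self._lock:
            return self._job_id, self._blob
```

A worker would call `snapshot()` once per work unit and tag its result with the returned job_id instead of re-reading the shared state at submit time.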
THX psychocrypt