fireice-uk / xmr-stak

Free Monero RandomX Miner and unified CryptoNight miner
GNU General Public License v3.0

Help needed for testing [only advanced users please] #1527

Open psychocrypt opened 6 years ago

psychocrypt commented 6 years ago

If you know how to compile xmr-stak, it would be nice if you could test #1526 and report any issues you have with this PR. Please do not ask how to compile xmr-stak; this issue is only for advanced users.

If you notice anything that changes with this PR (fewer/more rejected shares, ...), please report it in this issue. Please add your system specs to your report (number of CPU threads, GPUs and which kind, OS).

direct download: https://github.com/psychocrypt/xmr-stak/archive/fix-jobConsume.zip

The changes in #1526 should fix some possible race conditions while the job is being switched, as well as some possible deadlocks.
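
The kind of job-switch race described above is often handled with a reader-writer lock (aij later mentions "rwlock changes"). Below is a minimal C++17 sketch, assuming one network thread that publishes jobs and several miner threads that read them. Job, JobSlot, publish and snapshot are hypothetical names; this is not the code in #1526.

```cpp
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>

// Hypothetical job record; field names are illustrative, not xmr-stak's.
struct Job {
    std::string job_id;
    std::string blob;              // hex-encoded hashing blob from the pool
    std::uint32_t target = 0;      // compact 32-bit stratum target
    std::uint64_t generation = 0;  // incremented on every job switch
};

// Hypothetical shared slot between the network thread (writer)
// and the miner threads (readers).
class JobSlot {
public:
    // Network thread: replace the current job. The exclusive lock means no
    // miner thread can observe a half-written Job (e.g. a new blob paired
    // with the old job_id).
    void publish(Job next) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        next.generation = current_.generation + 1;
        current_ = std::move(next);
    }

    // Miner threads: copy the job under a shared lock. The lock is released
    // when this returns, so hashing runs without holding it and publish()
    // never waits behind a long-running hash loop.
    Job snapshot() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return current_;
    }

private:
    mutable std::shared_mutex mutex_;
    Job current_;
};
```

Copying the job out, rather than handing miner threads a reference, keeps the critical section short at the cost of one string copy per job switch.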

THX psychocrypt

Spudz76 commented 6 years ago

Running this patch now on most of my rigs, although I had no issues before (nanopool). Should prove it doesn't cause any regressions. Mix of CPUs, AMDs, and NVIDIAs, and a mix of compilers/distros/drivers, too.

Amdpowered commented 6 years ago

Compiled and running as well. No issues so far. But the error only happened once before this fix.

Edit 5/2/18, 12:21 UTC: I should note I'm mining directly to a pool and not using NiceHash. No issues for 12 hours.

aij commented 6 years ago

I'm running b24cb3a24a8a5d72719f061bef2b3316fa4ebe15 on 12 machines now. I've been seeing around 1 to 5 instances of #1505 daily.

Spudz76 commented 6 years ago

This patch works identically to the previous version for me (meaning, no issues caused). Installed on every rig now.

I occasionally get bad results (about 100/day), but I believe that's from overclocking or just old cards being old. There are also quite a few expired/late results, but that's because the pool's fixed difficulty is too high. Still well under 1%, and exactly the same as earlier versions.

aij commented 6 years ago

Just got back home to find 3 machines doing useless work again. Looks the same as before, but let me know if there's more I should be looking at.

[Screenshot: xmr-stak-b24cb3a2]

Spudz76 commented 6 years ago

I think that looks more like your Internet connection choked on itself, and then the pool stayed confused about your re-login for the ~11 hours that followed.

aij commented 6 years ago

@Spudz76 The tcpdump I previously uploaded made it pretty clear it's not the pool that got confused. How does this look different to you?

psychocrypt commented 6 years ago

@aij could you please download the branch https://github.com/psychocrypt/xmr-stak/archive/topic-consumeJobWithOutput.zip, compile it, and run it? This PR creates a lot of debug output. It would be nice if you could set a log file in config.txt and provide us with the log as soon as you reproduce the rejected-share issue. Hopefully this will give us enough information to find the source of the issue.

kaeptnb commented 6 years ago

Hi! I tested this release because of issue #1519. It cuts the invalid shares from about 4-5% down to 2% (but let me test this longer to be sure).

borzaka commented 6 years ago

No issues so far.

But I haven't had any issues with the previous releases either. There were no invalid/rejected shares, maybe some network errors, but that's it.

Edits: updated Good results and interval.

aij commented 6 years ago

@psychocrypt The rwlock changes seem to have solved the original problem I was seeing, but now, after starting off well, I'm getting a lot of low-difficulty or duplicate shares. xmrig-proxy seems to be submitting them upstream, where they get rejected by dwarfpool.
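
For context, a "duplicate share" generally means the pool has already seen that nonce for that job. A miner or proxy can avoid forwarding such shares by remembering what it has submitted. This is a hypothetical C++ helper (SubmittedShares and try_record are illustrative names, not part of xmr-stak or xmrig-proxy):

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>

// Hypothetical helper: remembers which (job_id, nonce) pairs were already
// submitted so the same share is not sent upstream twice.
class SubmittedShares {
public:
    // Returns true if the share is new (and records it),
    // false if this (job_id, nonce) pair was already submitted.
    bool try_record(const std::string& job_id, const std::string& nonce) {
        return seen_.insert({job_id, nonce}).second;
    }

    std::size_t size() const { return seen_.size(); }

private:
    std::set<std::pair<std::string, std::string>> seen_;
};
```

A real implementation would also need to bound the set, for example by discarding entries for jobs the pool no longer accepts.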

[Screenshot: xmr-stak-invalid]

Actually, that seems to be persisting even after restarting xmr-stak (and xmrig-proxy), so it may be a problem on dwarfpool.

Just in case, here are logs and configs from that machine (c2n1). xmr-stak-invalid.tar.gz

psychocrypt commented 6 years ago

@aij Thanks for the log, it is very useful. Nevertheless, could you try mining against another pool to see whether you end up with the same issues?

aij commented 6 years ago

@psychocrypt It looks like #1505 is still happening, though perhaps less quickly?

xmr-stak-invalid-job-id.txt.gz

Apparently I missed this happening yesterday:

May 06 04:21:20 c1n3 xmr-stak-start[6916]: [2018-05-06 04:21:20] : RECV: {"id":1,"jsonrpc":"2.0","result":{"id":"7971d32c-bdd0-4d7b-90b0-178a2c0db392","job":{"blob":"0707bf91bbd705b26d29e1d07c5cf101144163f3e4a4546b08927bc47b6f82668a7c6cfeca492000000000ad0f8e024e3c5eb0cf9ef198518458aa6720eb729822d25fa4c7397778b7a84006","job_id":"587399639771319000","target":"dc460300","coin":"XMR","variant":1},"status":"OK"}}
May 06 04:22:36 c1n3 xmr-stak-start[6916]: [2018-05-06 04:22:36] : SEND: {"method":"submit","params":{"id":"7971d32c-bdd0-4d7b-90b0-178a2c0db392","job_id":"414059556624852000","nonce":"6bcd0900","result":"78b3af0dad73f049af2c92fcf35f52ae1ac22bb9f092a05c4786f50fe52b0000"},"id":1}
May 06 04:22:36 c1n3 xmr-stak-start[6916]: [2018-05-06 04:22:36] : RECV: {"id":1,"jsonrpc":"2.0","error":{"code":-1,"message":"Invalid job id"}}
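
The "target" field in these job messages is a compact 32-bit value. The sketch below assumes the common CryptoNight stratum convention, taken here to match how xmr-stak interprets it but not copied from its source: the 8 hex characters are the little-endian bytes of a uint32, the share difficulty is roughly 0xFFFFFFFF / target, and a result hash meets the target if its last 8 bytes, read as a little-endian uint64, are below 0xFFFFFFFFFFFFFFFF / difficulty. The function names are hypothetical.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Decode a 4-byte hex stratum target such as "dc460300".
// The hex string holds the little-endian bytes of a 32-bit integer:
// "dc460300" -> bytes dc 46 03 00 -> 0x000346dc.
std::uint32_t decode_target32(const std::string& hex) {
    if (hex.size() != 8) throw std::invalid_argument("expected 8 hex chars");
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i) {
        const auto byte = static_cast<std::uint32_t>(
            std::stoul(hex.substr(2 * i, 2), nullptr, 16));
        value |= byte << (8 * i);  // byte i is the i-th least significant
    }
    return value;
}

// Approximate share difficulty implied by a 32-bit target.
std::uint64_t difficulty_from_target32(std::uint32_t target) {
    if (target == 0) throw std::invalid_argument("target must be non-zero");
    return 0xFFFFFFFFull / target;
}

// Expand the target to 64 bits and compare it with the tail of a
// 32-byte result hash given as 64 hex characters.
bool meets_target(const std::string& result_hex, std::uint32_t target) {
    if (result_hex.size() != 64) throw std::invalid_argument("expected 64 hex chars");
    const std::uint64_t target64 =
        0xFFFFFFFFFFFFFFFFull / difficulty_from_target32(target);
    std::uint64_t tail = 0;
    for (int i = 0; i < 8; ++i) {
        // bytes 24..31 of the hash, read little-endian
        const std::uint64_t byte =
            std::stoull(result_hex.substr(48 + 2 * i, 2), nullptr, 16);
        tail |= byte << (8 * i);
    }
    return tail < target64;
}
```

Under these assumptions, "dc460300" corresponds to difficulty 20000, and the result submitted right after it in the log clears that target.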

I restarted everything later in the day, and the invalid job IDs then started again at about this point:

May 07 14:36:22 c1n3 xmr-stak-start[12103]: [2018-05-07 14:36:22] : RECV: {"jsonrpc":"2.0","method":"job","params":{"blob":"0909b6d5c2d705304483a3f34a04bb1484ac2e813da9bf2b55727612301fcb8e5f4ca783573999000000fa44ecff6791de6d69b4121317efacc99acd4a087d63b6a7a7971b8854d38629f301","job_id":"0000004cbf7bf575000","target":"c5a70000","coin":"XMR","variant":1}}
May 07 14:37:07 c1n3 xmr-stak-start[12103]: [2018-05-07 14:37:07] : SEND: {"method":"submit","params":{"id":"e69f07aa-637a-48e6-a37e-959db0f51801","job_id":"0000004cbf788f40000","nonce":"7b5f03f4","result":"00c75eaa1aa8a6ba7f70ea515e978674a36fc442a5883fe5fdee28264b9a0000"},"id":1}
May 07 14:37:07 c1n3 xmr-stak-start[12103]: [2018-05-07 14:37:07] : RECV: {"id":1,"jsonrpc":"2.0","error":{"code":-1,"message":"Invalid job id"}}
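
In both excerpts the submitted share carries a job_id different from the one in the job message logged just before it. One conservative client-side guard is to compare a result's job ID with the current job right before submitting and drop it locally if they differ. This is a hypothetical sketch (ShareResult and should_submit are illustrative names), not what #1526 does. Pools may still accept shares for a few recent jobs, so a strict equality check can drop shares that would have been accepted.

```cpp
#include <iostream>
#include <string>

// Hypothetical result produced by a miner thread.
struct ShareResult {
    std::string job_id;  // job the nonce was found for
    std::string nonce;
};

// Hypothetical guard: only submit a result computed for the job most
// recently received on this connection; otherwise log and drop it instead
// of sending it and getting "Invalid job id" back.
bool should_submit(const ShareResult& share, const std::string& current_job_id) {
    if (share.job_id != current_job_id) {
        std::cerr << "dropping stale share for job " << share.job_id
                  << " (current job " << current_job_id << ")\n";
        return false;
    }
    return true;
}
```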

Spudz76 commented 6 years ago

Rebased onto the current dev branch and recompiled; running on all rigs now with no problems. Mining on Dwarfpool (which I have never had problems with, other than its high fixed difficulty, which kind of sucks).

aij commented 6 years ago

I should have mentioned: the above log includes switching xmrig-proxy to use NiceHash upstream instead of dwarfpool, around May 6 20:22.

Yesterday, I switched xmr-stak to connect directly to NiceHash, and I have not had any invalid job IDs since, just a few rejected shares for other reasons:

$ nixops ssh-for-each "journalctl -u xmr-stak.service --since='2018-05-07 22:18:22' | grep 'message'"
c1n1.> May 08 14:02:22 c1n1 xmr-stak-start[16233]: [2018-05-08 14:02:22] : RECV: {"id":1,"jsonrpc":"2.0","error":{"code":5,"message":"Invalid nonce; is miner not compatible with NiceHash?"}}
c2n0.> May 08 06:22:51 c2n0 xmr-stak-start[16189]: [2018-05-08 06:22:51] : RECV: {"id":1,"jsonrpc":"2.0","error":{"code":5,"message":"Invalid nonce; is miner not compatible with NiceHash?"}}
c2n3.> May 08 00:08:05 c2n3 xmr-stak-start[16375]: [2018-05-08 00:08:05] : RECV: {"id":1,"jsonrpc":"2.0","error":{"code":5,"message":"Invalid nonce; is miner not compatible with NiceHash?"}}
c2n3.> May 08 05:13:22 c2n3 xmr-stak-start[16375]: [2018-05-08 05:13:22] : RECV: {"id":1,"jsonrpc":"2.0","error":{"code":5,"message":"Invalid nonce; is miner not compatible with NiceHash?"}}
c2n3.> May 08 15:49:00 c2n3 xmr-stak-start[16375]: [2018-05-08 15:49:00] : RECV: {"id":1,"jsonrpc":"2.0","error":{"code":5,"message":"Invalid nonce; is miner not compatible with NiceHash?"}}
m1...> May 08 17:21:36 m1 xmr-stak-start[36271]: [2018-05-08 17:21:36] : RECV: {"id":1,"jsonrpc":"2.0","error":{"code":3,"message":"Duplicate share."}}
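
The grep above lists the rejection messages one by one. To summarize them by reason across many machines, a small scanner like this hypothetical C++ helper can tally the "message" field of pool error responses. It is a regex scan that assumes the compact single-line JSON shown in these logs, not a real JSON parser.

```cpp
#include <map>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: count pool rejection reasons in log lines like
//   ... RECV: {"id":1,"jsonrpc":"2.0","error":{"code":5,"message":"..."}}
// Lines without an "error" object (e.g. SEND lines) are ignored.
std::map<std::string, int> count_rejections(const std::vector<std::string>& lines) {
    static const std::regex pattern(
        R"re("error":\{"code":(-?\d+),"message":"([^"]*)")re");
    std::map<std::string, int> counts;
    for (const auto& line : lines) {
        std::smatch m;
        if (std::regex_search(line, m, pattern)) {
            ++counts[m[2].str()];  // group 2 is the message text
        }
    }
    return counts;
}
```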

Something must be different when using xmrig-proxy. I thought perhaps it was NH's propensity to drop the connection, but I would have expected at least some invalid job IDs if that was all.

psychocrypt commented 6 years ago

Could you please send the full log so that I can check the jobs received before that?

Amdpowered commented 6 years ago

Late update:

I did get the invalid job ID with the first build in this thread. It took several days (6-7?) before the problem occurred. I compiled the latest dev 3 days ago and haven't seen it yet, but it might be too early to tell. I'm not using a proxy; I'm mining directly to supportxmr.

Spudz76 commented 6 years ago

Stopped running this, as it seems to have been merged into dev; running that now. It sometimes seems to have trouble on the initial connect, but I'm not sure why. Restarting after it fails to connect makes it connect fine.
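
The thread does not identify why the initial connect sometimes fails. If the workaround is simply to try again, that step can be automated with capped exponential backoff. In this hypothetical C++ sketch, try_connect stands in for whatever opens the pool socket and logs in; connect_with_retry is an illustrative name, not an xmr-stak function.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: call try_connect up to max_attempts times, doubling
// the wait between attempts (capped at max_delay). Returns true on success.
bool connect_with_retry(const std::function<bool()>& try_connect,
                        int max_attempts,
                        std::chrono::milliseconds initial_delay,
                        std::chrono::milliseconds max_delay) {
    auto delay = initial_delay;
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (try_connect()) return true;
        if (attempt == max_attempts) break;  // no sleep after the last failure
        std::this_thread::sleep_for(delay);
        delay = std::min<std::chrono::milliseconds>(delay * 2, max_delay);
    }
    return false;
}
```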

Amdpowered commented 6 years ago

I believe I am still seeing this, but it only happens after running for more than 2 weeks.