JayDDee / cpuminer-opt

Optimized multi algo CPU miner
Other
774 stars 543 forks source link

Not getting blocks solo mining #245

Closed JayDDee closed 4 years ago

JayDDee commented 4 years ago

This is a follow up to #244.

There are sugnificant gaps in the block heigh of new blocks solo mining anime suggesting the miner is not getting all blocks. The block emmission rate is quite high, around 6 per minute which coincides with the height gaps and a presumed 1 minute poll.

This issue is opened to investigate:

It was also noted that blocks appeared to be submitted but no reply was ever received. It may be related or a seperate issue.

YetAnotherRussian commented 4 years ago

Polling interval is a user-defined setting in other solo miners. Examples:

1) Forknote miner - https://github.com/forknote/forknote/blob/master/src/Miner/MiningConfig.h 2) XMRig - https://github.com/xmrig/xmrig/pull/1010 (scroll down to json config - ""daemon-poll-interval": 1000")

Or it just may be fast enough (see line 1182 in cpu-miner.c in that diff):

https://github.com/hyc/ccminer-cryptonight/commit/85c8ab4c25cd2d0fe9f92813563ede81e13a74b0

I think that polling every second may leed to hashrate degradation, especially on a very slow CPUs. So a good decision is somewhere near 2-3sec default with ability to set up from cli.

Block target times may be very small on "fast chains" (30sec or even 10sec - I've seen). If the coin difficulty algo is not smart enough, periodic instamines do occur (e.g. +50% of nethash just has joined, so the future blocks inside diff window would be found very fast), so polling once in a minute is veery slow.

YetAnotherRussian commented 4 years ago

It was also noted that blocks appeared to be submitted but no reply was ever received. It may be related or a seperate issue.

Will check some coins with a working solo mode. This may be a http server part thing (coin side).

JayDDee commented 4 years ago

Once I realized longpoll wasn't being used the issue became clearer.

There are two mechanisms to signal the miner threads of new work, a set time limit or the restart_threads flag.

The time limit is currently calculated to 1 minute based on the reference hash rate. Stratum and longpoll set the restart_threads flag, which the miner threads poll every hash iteration, to abort current work.

But neither stratum nor longpoll are running to set the flag. get_work is called inline by the miner threads, but that only ocurrs after the 1 minute timeout. This make the restart_threads mechanism totaly useless.

The defined scan time is determnined by the --scantime option with a default of 5 seconds. In effect the one minute hash timeout overrides the scantime so changing that option would have no effect.

A code change will be required to use the scantime option as the miner thread's hash timeout when restart_threads is ineffective.

JayDDee commented 4 years ago

I have a fix preparing for release.

When not using stratum or longpoll the miner thread scan time limit will be based on the scantime option instead of hard coded to 60 seconds.

The default for the scantime option is 5 seconds and may be changed by the user. it is currently ignored by stratum and longpoll and will continue to be ignored.

JayDDee commented 4 years ago

v3.12.4.1 is released.

In addition to the scantime fix I also added a couple of debug logs to figure out why longpoll isn't being used. Use -D.

You don't have to end your current tests prematurely unless you don't exepect any more useful info from them.

This should peel off another layer of the onion but I suspect the lack of reply to a submitted block will not be magically fixed. Just to make sure always check for unexpected blocks in your wallet.

JayDDee commented 4 years ago

I've been investigating the missing replies to submitted blocks. They aren't actually replies because the code is all inline with no messaging interface.

The RPC function call to submit the block actually returns the result which is immediately passed to share_result. The only escape would have produce an error log but no error was reported. share_result has no escape and will produce output.

But nothing. It's as if share_result was never called.

Maybe there's something weird happening lower down but it still doesn't make sense that there were no logs at all after the submit.

JayDDee commented 4 years ago

Another curiosity is the missing mining info log. Mining info can be requested from the wallet via RPC call and is enabled by default for solo mining.

It's only disabled in one place in the code and doing so would produce an error log, but once again no such log was seen.

It's important to note whether the working version displays mining info or not.

Correction: get_mininginfo only produces a log if the block height was not previously set. The absence of the log suggests that was the case. This may be reviewed as part of improving the share stats for solo mining.

YetAnotherRussian commented 4 years ago

I got new logs. There're 2 text files inside, named by the version of cpuminer-opt.

logs.zip

3.12.4 clearly indicates all blocks were stale blocks. 3.9.6.2 sadly seems to be broken, I'll have to use older version again :(

YetAnotherRussian commented 4 years ago

I will get some logs for 3.12.4.1, to make sure there is no crash and found blocks are not stale ones.

JayDDee commented 4 years ago

The logs from 3.12.4 are very encouraging. It's good to confirm the miner discarded the hash as stale without reporting it. It should have been reported without needing debug options.

My preference would be to submit it anyway and let the wallet reject it. After doing all that work might as well take the chance.

It's also encouraging that they were stale and directly related to this issue, ie not yet another issue.

Looking forward to 3.12.4.1 results.

In v3.12.4.1 you should notice immediately more frequent hash-meter logs, every 5 seconds instead of every minute, if enabled.

JayDDee commented 4 years ago

Regarding longpoll, it looks like the longpoll thread is being created for gbt, but then not used after switching to getwork.

If that works I will likely not try to "fix" it. It seems too risky to try to anticipate a switch to getwork before actually trying gbt.

YetAnotherRussian commented 4 years ago

Here's the log from 3.12.4.1

solo_log_3_12_4_1.zip

If you Ctrl+F for "submit", you'll see that some of the blocks were stale, and some of them not. But I see a strange response after these blocks got submitted. And they were not actually accepted (I got no reward).

Then I've started v3.9.4, here's the log:

solo_log_3_9_4.zip

When I checked it, I've seen this: [2020-02-23 18:06:24] DEBUG: hash <= target Hash: 0000000112663fe8a19a5f02728be0f7c124fbaf5e33a760998cf6467f68e494 Target: 00000001c0020000000000000000000000000000000000000000000000000000 [2020-02-23 18:06:24] Share 6 submitted by thread 6, lane 0.

Several these entries there, and these blocks got accepted. Proof (you may compare timings with the log, if you want to): k

So, with 3.12.4.1 we do not crash, ask for jobs very fast, but... Do mine stale blocks.

I think jobs are not actually refreshing somewhere. We ask, recieve new, but mine old data?

JayDDee commented 4 years ago

Sometimes a new block is received after the "submitted" log but before the "stale work detected" log, a simple race condition. The stale log is displayed after the new block.

The test itself is pretty straightforward testing work vs g_work. It shouldn't be too hard to figure out what's going on.

JayDDee commented 4 years ago

I think I found the bug where the miner thread fails to copy the global g_work to it's local work. It's still about job ids.

A very simple code change will confirm it.

cpu-miner.c:std_get_new_work

if ( have stratum ) ... else force_new_work = true; // changed from false

YetAnotherRussian commented 4 years ago

Sometimes a new block is received after the "submitted" log but before the "stale work detected" log, a simple race condition. The stale log is displayed after the new block.

Simple lock for console output may help, but ofc using this lock only here is a bad idea.

A very simple code change will confirm it.

That helped!

     **********  cpuminer-opt 3.12.4.1  *********** 
 A CPU miner with multi algo support and optimized for CPUs
 with AVX512, SHA and VAES extensions.
 BTC donation address: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT

CPU: AMD Ryzen 9 3900X 12-Core Processor . SW built on Feb 23 2020 with GCC 9.2.1. CPU features: AVX2 AES SHA SW features: AVX2 AES SHA Algo features: AVX512 VAES

Starting miner with AVX2 AES...

[2020-02-23 08:36:47] Extranonce subscribe: YES [2020-02-23 08:36:47] Creating long poll thread [2020-02-23 08:36:47] Creating stratum thread [2020-02-23 08:36:47] Binding thread 0 to cpu 0. [2020-02-23 08:36:47] Binding thread 1 to cpu 1. [2020-02-23 08:36:47] Binding thread 2 to cpu 2. [2020-02-23 08:36:47] Binding thread 3 to cpu 3. [2020-02-23 08:36:47] Binding thread 4 to cpu 4. [2020-02-23 08:36:47] Binding thread 5 to cpu 5. [2020-02-23 08:36:47] Binding thread 6 to cpu 6. [2020-02-23 08:36:47] Binding thread 7 to cpu 7. [2020-02-23 08:36:47] Binding thread 8 to cpu 8. [2020-02-23 08:36:47] Binding thread 9 to cpu 9. [2020-02-23 08:36:47] Binding thread 10 to cpu 10. [2020-02-23 08:36:47] 12 miner threads started, using 'anime' algorithm. [2020-02-23 08:36:47] Binding thread 11 to cpu 11. [2020-02-23 08:36:48] Current block is 6259469 [2020-02-23 08:36:48] Switching to getwork, gbt version 112 [2020-02-23 08:36:50] CPU temp: curr 0 C (max 0), Freq: -0.000/0.000 GHz [2020-02-23 08:37:19] anime block 6259470, diff 149.33, net 84.01 MH/s [2020-02-23 08:37:54] anime block 6259471, diff 149.33, net 83.68 MH/s [2020-02-23 08:38:01] anime block 6259472, diff 149.33, net 82.93 MH/s [2020-02-23 08:38:22] anime block 6259473, diff 149.33, net 83.56 MH/s [2020-02-23 08:38:50] anime block 6259474, diff 149.33, net 85.78 MH/s [2020-02-23 08:38:55] CPU temp: curr 0 C (max 0), Freq: -0.000/0.000 GHz [2020-02-23 08:39:00] 1 submitted by thread 6, lane 3 [2020-02-23 08:39:02] 1 Accepted 1 S0 R0 B0, 132.860 sec (2085ms) Diff 8.3295 (5.58%), Block 0 [2020-02-23 08:39:04] anime block 6259475, diff 149.33, net 85.96 MH/s [2020-02-23 08:40:16] anime block 6259476, diff 149.33, net 84.81 MH/s [2020-02-23 08:40:30] anime block 6259478, diff 149.33, net 86.50 MH/s [2020-02-23 08:41:00] CPU temp: curr 0 C (max 0), Freq: -0.000/0.000 GHz [2020-02-23 08:41:12] anime block 6259479, diff 149.33, net 86.44 MH/s [2020-02-23 08:41:40] anime block 6259480, diff 149.33, net 86.43 MH/s [2020-02-23 08:43:01] CPU temp: curr 0 C (max 0), Freq: -0.000/0.000 GHz [2020-02-23 08:43:11] anime block 6259481, diff 149.33, net 86.16 MH/s [2020-02-23 08:43:25] anime block 6259482, diff 149.33, net 86.17 MH/s [2020-02-23 08:43:53] anime block 6259483, diff 149.33, net 86.22 MH/s [2020-02-23 08:44:56] anime block 6259485, diff 149.33, net 85.42 MH/s [2020-02-23 08:45:08] CPU temp: curr 0 C (max 0), Freq: -0.000/0.000 GHz [2020-02-23 08:47:11] CPU temp: curr 0 C (max 0), Freq: -0.000/0.000 GHz [2020-02-23 08:47:16] anime block 6259486, diff 149.33, net 82.84 MH/s [2020-02-23 08:47:44] anime block 6259487, diff 149.33, net 83.18 MH/s [2020-02-23 08:47:58] anime block 6259488, diff 149.33, net 83.15 MH/s [2020-02-23 08:48:12] anime block 6259489, diff 149.33, net 83.16 MH/s [2020-02-23 08:48:19] anime block 6259490, diff 149.33, net 83.16 MH/s

Now we have an indication on every block received, and report that block was accepted (I've checked, it was).

JayDDee commented 4 years ago

Great news. The only thing left is to tweak the patch for a proper fix. Then I can deal with the stats issues in 246.

Thanks for all your hard work. It's the first thorough test of solo mining.

JayDDee commented 4 years ago

cpuminer-opt-3.12.4.2 released with a fix for stale shares. Please retest and report any problems. Any problems with stats or suggestions for improvement can be made in #246 .

YetAnotherRussian commented 4 years ago

Latest release has a stable crash (any build) in several seconds:

     **********  cpuminer-opt 3.12.4.2  ***********
 A CPU miner with multi algo support and optimized for CPUs
 with AVX512, SHA and VAES extensions.
 BTC donation address: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT

CPU: AMD Ryzen 9 3900X 12-Core Processor . SW built on Feb 23 2020 with GCC 7.3.0. CPU features: AVX2 AES SHA SW features: SSE2 Algo features: AVX512 VAES

Starting miner with SSE2...

[2020-02-24 09:46:09] 24 CPU cores available, 12 miner threads selected. [2020-02-24 09:46:09] Extranonce subscribe: YES [2020-02-24 09:46:09] Creating long poll thread [2020-02-24 09:46:09] Creating stratum thread [2020-02-24 09:46:09] Binding thread 0 to mask 0000000000555555 [2020-02-24 09:46:09] JSON protocol request: {"method": "getblocktemplate", "params": [{"capabilities": ["coinbasetxn", "coinbasevalue", "longpoll", "workid"]}], "id":0}

[2020-02-24 09:46:09] Binding thread 2 to mask 0000000000555555 [2020-02-24 09:46:09] Binding thread 3 to mask 0000000000555555 [2020-02-24 09:46:09] Binding thread 4 to mask 0000000000555555 [2020-02-24 09:46:09] Binding thread 5 to mask 0000000000555555 [2020-02-24 09:46:09] Binding thread 6 to mask 0000000000555555 [2020-02-24 09:46:09] Binding thread 7 to mask 0000000000555555 [2020-02-24 09:46:09] Binding thread 8 to mask 0000000000555555 [2020-02-24 09:46:09] Binding thread 9 to mask 0000000000555555 [2020-02-24 09:46:09] Binding thread 10 to mask 0000000000555555 [2020-02-24 09:46:09] 12 miner threads started, using 'anime' algorithm. [2020-02-24 09:46:09] Binding thread 11 to mask 0000000000555555

C:\cpuminer-opt-3.12.4.2-windows>pause

YetAnotherRussian commented 4 years ago

Debug info:

Thread 2 "cpuminer" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff658d700 (LWP 11310)]
__strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:65
65  ../sysdeps/x86_64/multiarch/strlen-avx2.S: No such file or directory.
(gdb) bt
#0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:65
#1  0x00007ffff76edde5 in __vfprintf_internal (
    s=0x7ffff785d6a0 <_IO_2_1_stdout_>, 
    format=0x7ffff000fcb0 "[2020-02-23 23:00:05]\033[36m %s %s block %d, diff %.5g\033[0m\n", ap=0x7ffff636d298, mode_flags=<optimized out>)
    at vfprintf-internal.c:1688
#2  0x0000555555565e47 in ?? ()
#3  0x00005555555603ca in ?? ()
#4  0x00007ffff7f01669 in start_thread (arg=<optimized out>)
    at pthread_create.c:479
#5  0x00007ffff7794323 in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
JayDDee commented 4 years ago

Oh crap, that's the format string for the new block log for getwork. I messed it up pretty good, forgot a couple of arguments.

cpu-miner.c:1491

applog( LOG_BLUE, "%s %s block %d, diff %.5g", work->height, net_diff );

should be

applog( LOG_BLUE, "%s %s block %d, diff %.5g", algo_names[opt_algo], short_url, work->height, net_diff );

JayDDee commented 4 years ago

cpuminer-opt-3.12.4.3 is released fixing the segfault.

YetAnotherRussian commented 4 years ago

[2020-02-25 11:12:34] 2 submitted by thread 3, lane 2 [2020-02-25 11:12:34] Hash[7:0]: 00000002 1f50715e ed03d2e0 4cd1e186 8a73febf 30b45c19 f3909e2a 35dd877f [2020-02-25 11:12:34] Targ[7:0]: 00000002 9b060000 00000000 00000000 00000000 00000000 00000000 00000000 [2020-02-25 11:12:35] 2 Accepted 1 S0 R1 B0, 470.748 sec (1081ms) Diff 0.47118 (0.48), Block 0

It seems to be OK for now, no crash, this one may be closed. I'll add a separate issue for idling.

JayDDee commented 4 years ago

I'm not ready to close it yet I want to verify the coreectness of the logs. I see some problems already, block number 0, the diff isn't right either. And that's only the share result log.

The new getwork block is a new log, I'd like to see it in action. The first block was rejected also, that deserves a closer look.

YetAnotherRussian commented 4 years ago

I've told about block number 0 here - https://github.com/JayDDee/cpuminer-opt/issues/246#issuecomment-590737477

The first block was rejected also, that deserves a closer look.

That seems to be normal. I see no response @ server side for this (reject reason, like pools do).

JayDDee commented 4 years ago

I see nothing about rejecting the first block in #246. You'll have to tell me again. I don't think that should be normal.

YetAnotherRussian commented 4 years ago

Stale block. Debug (--debug) was disabled.

JayDDee commented 4 years ago

You're right it is a normal stale but not reported as stale. It might be a getwork issue, the test for stale is a string search of the reject reason. I will try to find a better way. That is an issue for #246. This one can be closed.

JayDDee commented 4 years ago

Blocks are now being accepted using getwork so this isue can be closed.