Share stats for solo mining

JayDDee commented 4 years ago

This is yet another follow up to #244 & #245 to support share statistics for solo mining.

For the most part the share statistics feature works for solo mining with the limitation that "share" related data has no meaning when solo mining. A share essentially represents a solved block so all "share" related statistics will actually be block statistics.

Some of this info may be redundant with explicit block level stats or the share stats may be all zero.

Though acceptable, this is not ideal. An attempt will be made to suppress irrelevant info in the logs.

The only significant work item is to implement a new block log for solo mining. The existing block log is generated in stratum code that is not used when solo mining.

The new block log for solo mining will have a different format than its stratum cousin and will display different info. There is no stratum diff and no need for share estimates. This will result in a shorter log than the stratum version.

A proposed format:

[algo]: [URL] block [work->height], diff [net_diff] TTF @ [refhashrate] [time], net hashrate [net_hash]

All fields have the same meaning as the block related fields of the stratum new block log and will be displayed in the same format.

This log will be output for both getwork and GBT excluding longpoll. Longpoll will be implemented when it can be tested.

JayDDee commented 4 years ago

The summary report will need to be implemented for solo mining as it is currently generated in stratum code. It appears it will need to be done by one of the mining threads so care will be required to avoid race conditions.

No changes are anticipated to the report fields or format.

JayDDee commented 4 years ago

Another problem was discovered during testing of v3.12.4: the miner will silently discard what it believes is a stale share without reporting it. This is because the actual RPC submission is done by the workio thread instead of the miner thread. The miner thread reports the share as submitted because it sent the request to the workio thread. But the workio thread discarded the share and never actually submitted it, hence no result.

JayDDee commented 4 years ago

The next release will have the new block and summary logs implemented for getwork. Minor tweaking will follow to clean up any invalid share stats, address the silent stales, and any other improvements that become apparent.

Summary log was implemented in workio_get_work, which is called by the workio thread only for getwork. This avoids having the miner threads do it and the care needed to keep them from tripping over each other.

New block log is implemented in get_upstream_work for the same reasons.

The format of the summary log is unchanged for now. The new block log is slightly modified from the stratum version due to the lack of stratum diff and job ids. Unlike the stratum version, the network hash rate is obtained directly from the wallet instead of being calculated by counting new blocks over time.

[algo] [URL] block [height], diff [net_diff] Miner TTF @ [miner_ref_hashrate] [time], net TTF @ [net_hash_rate] [time]

JayDDee commented 4 years ago

cpuminer-opt-3.12.4.2 is released with initial implementation of block and summary logs for getwork. Refinements to stats will be made in subsequent releases to ensure correctness, suppress irrelevant info and avoid redundancy.

User feedback is welcome.

JayDDee commented 4 years ago

One of the follow up issues is the silent discard of stale blocks. The first problem is a log is only produced if debug is enabled. That will be changed.

Another problem is that there is no mechanism to communicate the error back from submit_upstream_work of the workio thread to the miner thread that found the block.

These 2 problems combined create the silent stale scenario. Always producing the discard log will close the loop.

Another, more philosophical question is whether the block should be discarded or submitted anyway. The penalty is trivial, just a likely futile test of the block by the wallet. However there is also the benefit of properly recording the stale block to be included in the stats.

There are actually 2 tests affected: the stale work test which was seen in recent testing, and a stale block test (block already solved). Each will be evaluated separately whether the error should be ignored and the block submitted anyway.

I'm leaning toward submitting just in case it might be accepted. After all the work to find a block it seems a waste to summarily discard it. It also makes the stats more complete.

That's 2 reasons to submit. Comments welcome.

Edit:

Correction: stratum performs the stale work test but not the stale block test because stratum doesn't support mininginfo.

There's a third reason, [stratum has no similar tests]. The tests by getwork are 1, a comparison of g_work (global) and work (local), and 2, an RPC query for mining info to check the solved block is still current. They are done by the workio thread immediately before submitting the block to the wallet.

On its surface this seems like the right thing to do, and in most cases it would be. Preventing errors is good defensive coding and given the existence of a race condition, the last minute test reduces the size of the window of exposure. It all sounds good.
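The two pre-submit tests described above can be sketched roughly as follows. This is only an illustration: `work_item`, `work_is_stale` and `block_is_stale` are hypothetical names, the struct is trimmed to two fields, and the sketch assumes the nonce is the last 32 bit word of the 80 byte header and that mining info reports the number of blocks already solved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down work item; the real struct has more fields. */
typedef struct
{
   uint32_t data[20];   /* 80 byte block header, word 19 assumed to be the nonce */
   uint32_t height;     /* height of the block being worked on */
} work_item;

/* Test 1, stale work: the local work no longer matches the global work.
   Only the first 19 words are compared so a different nonce is not a mismatch. */
static bool work_is_stale( const work_item *g_work, const work_item *work )
{
   assert( g_work != NULL && work != NULL );
   return memcmp( g_work->data, work->data, 19 * sizeof(uint32_t) ) != 0;
}

/* Test 2, stale block: the network has already solved this height.
   net_blocks is assumed to be the solved block count from mining info. */
static bool block_is_stale( uint32_t net_blocks, const work_item *work )
{
   assert( work != NULL );
   return net_blocks >= work->height;
}
```

Either test returning true is what triggered the discard. Submitting anyway simply means logging the result of these checks instead of acting on it.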

As previously stated allowing the likely stale block to be submitted makes the stats record the event and include it in performance reports.

I'm not quite ready to make the final decision but I haven't found a practical reason to perform the tests.

JayDDee commented 4 years ago

Just after stating stratum didn't perform the last minute stale test, I did some testing and found out otherwise.

Scryptn2 at zergpool has a chronic stale share problem. The logs show a share submitted for the old job seconds after the new job report.

Stratum receives the job, signals the miner threads to abort their current work and get new work, and logs the event. There is some latency before the miner threads react: they will finish their current hash, possibly a valid one, and submit it, then check the abort flag.

The worst case latency is the time it takes for one thread to calculate one hash. A slow algo like scryptn2 has an unusually long hash time but it is still under a second per thread. Submitting several seconds late can't be explained.

Another interesting observation was made. The silent discard explains why the share stats get out of sync when mining scryptn2. The submit count gets incremented but the reply count does not, implying an unreplied share.

Disabling the silent discard would help keep the stats in sync as well as provide a more accurate stale share count.

Unfortunately it doesn't help with the underlying chronic stale share problem, although the slow hash rate is highly suspected of being a factor.

JayDDee commented 4 years ago

The chronic stale share problem with scryptn2 is now understood.

The scrypt code does up to 24 way hashing therefore the time to calculate one hash is the same as for 24. This results in a hash time that could be several seconds.

This is the result of the extremely slow hashing of scryptn2 combined with the design of the scrypt hashing code.
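The arithmetic behind that conclusion is simple enough to write down. A minimal sketch, assuming a thread only checks the abort flag after finishing a full batch of parallel lanes; `abort_latency` is a hypothetical helper, not a function from the miner.

```c
#include <assert.h>

/* Worst case delay, in seconds, before a thread notices the abort flag.
   lanes           : hashes computed in parallel per call (up to 24 for scrypt)
   thread_hashrate : hashes per second produced by that one thread */
static double abort_latency( int lanes, double thread_hashrate )
{
   assert( lanes > 0 && thread_hashrate > 0.0 );
   return (double)lanes / thread_hashrate;
}
```

For example, a thread producing 12 h/s with 24 lanes finishes a batch every 2 seconds, so a share found in that batch can be submitted up to 2 seconds after the new job arrived.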

I intend no further follow up on the scryptn2 stale share problem.

JayDDee commented 4 years ago

cpuminer-opt-3.12.4.3 is released. The silent discarding of suspected stale shares has been disabled for both getwork and stratum.

It was found to cause share count mismatches in share stats that resulted in invalid data in the logs and inaccurate performance measurements.

Submitting the suspected stale shares and letting the server reject them ensures the counters remain synchronized and stale shares are properly accounted for in performance statistics.

One more release is possible before this issue is closed to allow for final tweaking after user feedback.

YetAnotherRussian commented 4 years ago

cpuminer-opt 3.12.4.3 ... Starting miner with AVX2 AES...
[2020-02-25 10:59:49] 24 CPU cores available, 12 miner threads selected.
[2020-02-25 10:59:49] Extranonce subscribe: YES
[2020-02-25 10:59:49] 12 miner threads started, using 'anime' algorithm.
[2020-02-25 10:59:50] Current block is 6264277
[2020-02-25 10:59:50] Switching to getwork, gbt version 112
[2020-02-25 10:59:52] anime 127.0.0.1:25555 block 6264277, diff 98.25
[2020-02-25 11:00:02] anime 127.0.0.1:25555 block 6264277, diff 98.25 Miner TTF @ 3778.68 kh/s 1d07h, net TTF @ 63.65 Mh/s 1h50m
[2020-02-25 11:00:09] anime block 6264278, diff 98.25, net 62.74 MH/s
[2020-02-25 11:00:09] anime 127.0.0.1:25555 block 6264278, diff 98.25 Miner TTF @ 3767.10 kh/s 1d07h, net TTF @ 62.74 Mh/s 1h52m
[2020-02-25 11:00:16] anime block 6264279, diff 98.25, net 62.81 MH/s
[2020-02-25 11:00:16] anime 127.0.0.1:25555 block 6264279, diff 98.25 Miner TTF @ 3770.18 kh/s 1d07h, net TTF @ 62.81 Mh/s 1h51m
[2020-02-25 11:00:23] anime block 6264280, diff 98.25, net 62.93 MH/s
[2020-02-25 11:00:23] anime 127.0.0.1:25555 block 6264280, diff 98.25 Miner TTF @ 3772.33 kh/s 1d07h, net TTF @ 62.93 Mh/s 1h51m
[2020-02-25 11:00:30] anime 127.0.0.1:25555 block 6264280, diff 98.25 Miner TTF @ 3772.61 kh/s 1d07h, net TTF @ 62.93 Mh/s 1h51m
[2020-02-25 11:00:37] anime 127.0.0.1:25555 block 6264280, diff 98.25 Miner TTF @ 3770.68 kh/s 1d07h, net TTF @ 62.93 Mh/s 1h51m
[2020-02-25 11:00:44] anime 127.0.0.1:25555 block 6264280, diff 98.25 Miner TTF @ 3772.97 kh/s 1d07h, net TTF @ 62.93 Mh/s 1h51m
[2020-02-25 11:00:51] anime 127.0.0.1:25555 block 6264280, diff 98.25

1) diff seems to be incorrect (and NET TTF of course, too): 1d solo/2h network - these are 100% incorrect with such a small block target time

2) do we really need hostname and IP over there?

3) As seen over here:

[2020-02-25 11:12:34] 2 submitted by thread 3, lane 2
[2020-02-25 11:12:34] Hash[7:0]: 00000002 1f50715e ed03d2e0 4cd1e186 8a73febf 30b45c19 f3909e2a 35dd877f
[2020-02-25 11:12:34] Targ[7:0]: 00000002 9b060000 00000000 00000000 00000000 00000000 00000000 00000000
[2020-02-25 11:12:35] 2 Accepted 1 S0 R1 B0, 470.748 sec (1081ms) Diff 0.47118 (0.48), Block 0

we try to show block height of a found block. I think that info should be removed, as there is no info on block height at that moment.

4) 5min hashrates are always nulls:

[2020-02-25 11:46:10] Periodic Report   5m05s      46m20s
Share rate       0.00/min   0.09/min
Hash rate        0.00h/s    1245.41kh/s (3759.11kh/s)
Lost hash rate   0.00h/s    0.00h/s
Submitted        0          4
Accepted         0          2
Rejected         0          2

5) "Lost hash rate" - this is not applicable to solo mining and should be removed, I think. Solo is a lottery (even TTF is a rough estimate). Better is to replace "Share rate" -> "Block rate" in case of !stratum

JayDDee commented 4 years ago

I was already thinking of moving that to the summary log.
Lost hash rate is the hashrate equivalent of rejected and stale shares, which isn't included in the effective hash rate (what is actually earned). When comparing performance the statistical mean is: ref HR == effective HR + lost HR

Of course if there are no rejects the lost hashrate should be zero and therefore not displayed.
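As a rough illustration of that relation, here is a sketch that derives both rates from share difficulty sums. It assumes the conventional 2^32 hashes per unit of difficulty, and `share_period` and the function names are hypothetical, not the miner's own types.

```c
#include <assert.h>

/* Conventional number of hashes per unit of difficulty (2^32), assumed here. */
static const double HASHES_PER_DIFF = 4294967296.0;

/* Hypothetical per-interval accumulator, not a cpuminer-opt type. */
typedef struct
{
   double accepted_diff_sum;  /* sum of target diff over accepted shares */
   double lost_diff_sum;      /* sum of target diff over rejected and stale shares */
   double seconds;            /* length of the reporting interval */
} share_period;

/* Hash rate equivalent of a difficulty sum earned over an interval. */
static double diff_sum_to_hashrate( double diff_sum, double seconds )
{
   assert( seconds > 0.0 );
   return diff_sum * HASHES_PER_DIFF / seconds;
}

static double effective_hr( const share_period *p )
{
   return diff_sum_to_hashrate( p->accepted_diff_sum, p->seconds );
}

static double lost_hr( const share_period *p )
{
   return diff_sum_to_hashrate( p->lost_diff_sum, p->seconds );
}

/* Statistical estimate of the reference hash rate: effective + lost. */
static double estimated_ref_hr( const share_period *p )
{
   return effective_hr( p ) + lost_hr( p );
}
```

With no rejected or stale shares `lost_diff_sum` stays at zero, so the lost rate is zero and the line can be left out of the report.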

I see a couple of other issues:

Duplicate new block logs. It looks like each thread reports it, easy fix.
It looks like the TTF is reversed between network block TTF and your miner's block TTF. Can you confirm the numbers are otherwise correct?

If the data isn't correct in the block log the error will likely propagate to the share and summary logs as well, so I'll focus on the block log first before worrying too much about the others.

YetAnotherRussian commented 4 years ago

It looks like the TTF is reversed between network block TTF and your miner's block TTF. Can you confirm the numbers are otherwise correct?

Nope. Real diff level and block times are shown in another miner and @ pool side. No way it's a day or an hour. There are some protocol dump logs in another issue, as you remember; maybe they could make the real diff level clearer.

Duplicate new block logs, it looks like each thread reports it, easy fix.

Yes, that is present in linux, too.

JayDDee commented 4 years ago

In the new block log the diff and height should be correct. If not we have a big problem. The miner hashrate used to calculate your TTF should also be correct after a few samples.

The net hashrate is probably messed up. It's a global but I treat it like a local and overwrite it when it's scaled for display. Corrupting the net hashrate will have a trickle effect on all info that depends on it.

That limits the value of future data collection until it's fixed. It's an easy fix if you're up to it. Let me know if not, I can push out a release with the fix.

Define local net_hr and use it instead of net_hashrate:

cpu-miner.c:1427

  if ( miner_hr )
  {
     double net_hr = net_hashrate;  // NEW
     char net_hr_units[4] = {0};
     char miner_hr_units[4] = {0};
     char net_ttf[32];
     char miner_ttf[32];

     sprintf_et( net_ttf, net_diff * diff_to_hash / net_hr );
     sprintf_et( miner_ttf, net_diff * diff_to_hash / miner_hr );
     scale_hash_for_display ( &miner_hr, miner_hr_units );
     scale_hash_for_display ( &net_hr, net_hr_units );
     applog2(LOG_INFO, "Miner TTF @ %.2f %sh/s %s, net TTF @ %.2f %sh/s %s",
                         miner_hr, miner_hr_units, miner_ttf,
                         net_hr, net_hr_units, net_ttf );
  }
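To make the failure mode concrete, here is a small standalone sketch of the same pattern. `scale_for_display` is a hypothetical stand-in for the miner's scaling helper and `g_net_hashrate` stands in for the global; the point is only that scaling in place through a pointer to the global destroys the raw value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical global, standing in for the miner's net_hashrate. */
static double g_net_hashrate = 62.93e6;

/* Hypothetical stand-in for the display scaling helper: divides the value
   in place until it is below 1000 and writes the SI prefix into units. */
static void scale_for_display( double *hr, char *units )
{
   static const char prefixes[] = { 0, 'k', 'M', 'G', 'T' };
   size_t i = 0;
   assert( hr != NULL && units != NULL );
   while ( *hr >= 1000.0 && i + 1 < sizeof prefixes )
   {
      *hr /= 1000.0;
      i++;
   }
   units[0] = prefixes[i];
   units[1] = '\0';
}

/* Buggy pattern: the global itself is scaled, so later users see 62.93. */
static double report_buggy( char *units )
{
   scale_for_display( &g_net_hashrate, units );
   return g_net_hashrate;
}

/* Fixed pattern: scale a local copy, the global keeps its raw value. */
static double report_fixed( char *units )
{
   double net_hr = g_net_hashrate;
   scale_for_display( &net_hr, units );
   return net_hr;
}
```

After one call to `report_buggy` every later TTF calculation that divides by the global would be off by a factor of a million, which is the kind of trickle effect described above.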

JayDDee commented 4 years ago

There is an interesting quirk in the stats for solo mining. The target diff and net diff should be the same but in the stats they are sourced differently.

The net_diff is provided by RPC mininginfo, but the target diff is calculated by the miner from the target hash. It makes for an interesting comparison.

Regarding point 4: 5 min hashrate always zero. This is the effective hash rate based on submitted shares. If no shares were submitted the hashrate is zero. It's correct.

I think I've gone as far as possible with the available data. I'll wait a while longer to see if any more is forthcoming, then I'll release a new version with a few fixes.

JayDDee commented 4 years ago

There seems to be an issue with reporting stale shares, they get reported as rejected instead of stale.

JayDDee commented 4 years ago

v3.12.4.4 has some fixes to getwork stats. Please test.

JayDDee commented 4 years ago

One other note is that every "Accepted" should be a "BLOCK SOLVED" when solo mining. The reason it isn't is that the share diff is so low; it should be >= net diff. The share ratio should also be >= 100%. It's weird that it's the same as the diff.

YetAnotherRussian commented 4 years ago

v3.12.4.4 has some fixes to getwork stats. Please test.

I'm on it.

JayDDee commented 4 years ago

I just noticed share diff is incorrect in the share result log for stratum. It should always be >= targetdiff. This will also affect the share ratio. Test results of these fields will be invalid until fixed. Share diff should not affect any other stats. This appears to be a day1 bug with share stats.

Edit: I have a fix for the incorrect share diff ready to go. I'll wait for your test report first in case you find something else that needs fixing.

I hope the next release is the last test release before declaring victory over this issue.

YetAnotherRussian commented 4 years ago

Seems that there are still some reports on getting job for same block height.

JayDDee commented 4 years ago

Just released v3.12.4.5 with some more fixes. I'll take a look at your logs from 3.12.4.4.

Edit: Repeated block logs for the same block. I think I know what that is, probably just new work, not a new block. Should be easy to fix.

What's your opinion of new work? Should it be logged like stratum logs new jobs? Or is it too much noise? It seems a little too frequent, I'll wait to see what the problem is.

How much are the TTF estimates off by? How long does it usually take for you to get a block? How often does the network issue a new block?

YetAnotherRussian commented 4 years ago

I'm able to solve pretty fast. I've just seen your new release, and already have new logs from it:

log_9.txt

There's something here:

[2020-02-28 11:05:45] 1 Submit diff 0.2612, block 6273249
[2020-02-28 11:05:46] Block 6273249 already solved, current block 124651581
[2020-02-28 11:05:47] 1 Accepted 1 S0 R0 B0, 717.115 sec (2043ms) Diff 0.2612 (0.22), Block 0

That was not expected :D

YetAnotherRussian commented 4 years ago

What's your opinion of new work? Should it be logged like stratum logs new jobs? Or is it too much noise? It seems a little too frequent, I'll wait to see what the problem is.

I think we should log new block number only once.

How much are the TTF estimates off by? How long does it usually take for you to get a block?

My real estimate is 4-5 blocks in 30min @ ~75Mh/s nethash

How often does the network issue a new block?

Explorer #1: https://animeco.in/explorer
Network: 67.61 MH/s
Difficulty: 118.48
Avg. Block Time: 30.68 seconds
Explorer #2: https://miningbase.tk/explorer/ANI

YetAnotherRussian commented 4 years ago

Sorry, it seems that cpuminer-opt displays correct diff, e.g. "New block 6273277, diff 118.48", only NET TTF and miner TTF values are not correct.

JayDDee commented 4 years ago

That's good data. It looks like a false positive for the stale block test, probably because of invalid current block height.

I'll double check the TTF calculations, the miner TTF should be the same as stratum, the net TTF is reversed.

For getwork the net hashrate is provided via RPC and the TTF is calculated.

For stratum the net TTF is calculated by counting the new blocks over time and converting to a hash rate. I have not been able to verify the net hashrate independently. I suspect it's not correct.

However, the share TTF and block TTF for stratum are correct so it's just a matter of doing the same thing for getwork.

Block is still reported as "Accepted" not "BLOCK SOLVED", need to look into that too.

There's a lot more to look at. Let me know of other interesting stuff you find.

YetAnotherRussian commented 4 years ago

And this is the real and correct TTF (NET TTF):

So, the miner TTF in case of 4Mh/s should be around 8min or something. This confirms my stats (4-5 blocks in 30min @ ~75Mh/s nethash - written above).

Let me know of other interesting stuff you find.

Yep, I will. Currently switched to v3.12.4.5.

JayDDee commented 4 years ago

About logging new block, part of the issue is distinguishing a new block from simply new work. It should just say "New work" like stratum's "New job".

Usually when solo mining the output is pretty sparse and the new work logs can provide a heartbeat to reassure the user that things are working. In this case the coin has a high block emission rate so I shouldn't make the decision based on 1 coin. A coin with a 5 minute block rate would be more appropriate as a guide.

YetAnotherRussian commented 4 years ago

the new work logs can provide a heartbeat to reassure the user that things are working

Previously, it was a per-thread hashrate. But yes, thread count is increasing every year, so it is totally unacceptable to use such an output as a heartbeat in case of Threadripper 3990X or similar systems... Too many new lines.

It should just say "New work" like stratum's "New job".

Seems to be a good solution.

In this case the coin has a high block emission rate so I shouldn't make the decision based on 1 coin. A coin with a 5 minute block rate would be more appropriate as a guide.

Definitely we should not base on one coin. But the current tendency is to reduce block times (1-3 minutes or less). The goal is to have fast transactions while maintaining reasonable count of block confirmations. So about 5 minutes... this is too much, and may become an edge case sometime soon. I recommend to take 2 minutes as a reference point.

JayDDee commented 4 years ago

I think the issue is the scan time. You're seeing the heartbeat every 5 seconds, the scantime for getwork. Stratum gets a new job about once every minute, the scantime for stratum.

There are currently no checks of the new work to see if it's actually different. I can add a check and that should reduce the heartbeat frequency. Some new block logs will become new work logs and some might disappear.
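A check along those lines could look something like the sketch below. It is an assumption about the approach, not the actual fix: `polled_work` and `classify_work` are hypothetical names, and the comparison skips the last header word on the assumption that it holds the nonce.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef enum { NOTHING_NEW, NEW_WORK, NEW_BLOCK } work_change;

/* Hypothetical, trimmed-down result of one getwork poll. */
typedef struct
{
   uint32_t data[20];   /* 80 byte block header, word 19 assumed to be the nonce */
   uint32_t height;
} polled_work;

/* Decide what, if anything, should be logged after a poll. */
static work_change classify_work( const polled_work *prev, const polled_work *next )
{
   assert( prev != NULL && next != NULL );
   if ( next->height != prev->height )
      return NEW_BLOCK;               /* log as a new block */
   if ( memcmp( prev->data, next->data, 19 * sizeof(uint32_t) ) != 0 )
      return NEW_WORK;                /* same block, different header */
   return NOTHING_NEW;                /* identical poll result, log nothing */
}
```

With this in place the log frequency follows the rate at which work actually changes rather than the polling interval.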

It's becoming clear another test release will be required to test the next batch of fixes.

I have a fix for invalid block height. It should fix the false positive stale block pre-test.

I'm stuck with the TTF problem, the code is identical to how it's done for stratum. This could be a tricky one.

Unstuck.

I found a difference with the stratum share TTF. It uses targetdiff, which is the stratum diff adjusted by a target factor hard coded for each algo. It's adjusted transparently for stratum share TTF but missed on net TTF or getwork block TTF. It's a bit speculative and will definitely need your testing to confirm.

Edit: Using the target factor didn't work for stratum net TTF, it was ridiculously low. So it wouldn't work for getwork either. I have another possible fix for getwork using the target in the getwork data (target and targetdiff are discussed in #251) instead of the net_diff. It shouldn't make a difference theoretically but it might be worth a try.

That's 3 new fixes so far and I haven't looked at your attachments yet.

JayDDee commented 4 years ago

I see the end for this issue. Its main focus was on share stats specific to getwork, which includes one new log and verifying data is correct in existing logs.

There are 3 remaining issues:

Repeated new block logs. The next release will have a fix so it works like stratum. Any remaining concerns with verbosity should be tracked by a separate issue because it also includes stratum, block emission rates, new work/job rates, and scan time. A throttling mechanism may be required for certain low diff coins with very short block times.
The stale block pre-submit test is using invalid block height data obtained from the mininginfo RPC call. A new debug log is introduced to display mininginfo when -D option is used. If the problem persists the test will be disabled as it is of little value.
TTF estimates are incorrect for getwork. For stratum the share TTF is correct but the net TTF has not been verified. TTF is still a work in progress and not specific to getwork. It probably deserves its own issue for tracking.

A new release will be available in a few hours. It will likely be the last test release to deal with getwork specific stats. A final tweak and version bump may come after that to officially close this chapter.

JayDDee commented 4 years ago

FYI. I think I just saw the idle problem with ccminer and powershell. I've only recently started using powershell and never saw the problem before with cmd.exe. I think it's a Windows/powershell issue. It isn't just the CPU that idles, the GPU does as well.

JayDDee commented 4 years ago

v3.12.4.6 is released with a couple of fixes and a couple of new debug logs. Please test with -D.

The 3 areas of focus:

1. New blocks. There should be no repeated new block logs. Some debug info will be displayed to help me understand what data changes when new work arrives for the same block. I'm thinking specifically about ntime, it could be useful to include that in the new block and new work report, maybe for stratum too. It may also lead to a more efficient test where only ntime is checked instead of the entire 80 byte block header.

1b. Also with the new block log is the TTF problem. This may be due to conflicting data between network difficulty and target, described in point 3 below. I've added both network diff and target diff to the new block log for comparison to see which is correct. That info will also be useful for point 3.

2. Share submission. Not really a fix but more debug info to help understand the stale block false positive. The false positive implies that mininginfo may not be working correctly. If a solution to the stale block test isn't apparent with the debug info the test will be disabled. If problems with mining info are confirmed it will be tracked by a separate issue.

Edit: Ignore the following, the results will be invalid due to a bug in calculating share difficulty.

3. Share result. There seems to be some confusion about the target to solve a block. This is usually determined by the network difficulty but blocks with lower difficulty are being submitted and accepted. The miner's hash test uses the 256 bit target to do a direct comparison with the 256 bit hash. The share diff will now be calculated using the target diff instead of network diff. A successful test is if the accepted blocks are reported as "BLOCK SOLVED" and the share ratio is > 1. There isn't much else to try if that doesn't work.

Looking forward to your test report, we're getting close to the end.

JayDDee commented 4 years ago

Here's another test suggestion for the new block, new work logs.

The default scan time is 5 seconds. In your last test new block messages were displayed every 7 seconds regularly. It is likely it was displayed every poll, whether or not there was anything new. v3.12.4.6 should address that part and should only log when there is new work. Any poll with nothing new will not display a log. You can confirm the logging is correct by reducing --scantime to guarantee polling faster than new work. The log frequency should not change to match the scan rate, it should match the actual new work rate.

JayDDee commented 4 years ago

I've made a lot of progress with sharediff & targetdiff and I think I have it all working for the next release.

I've also given up with the stale block test. It's redundant with the stale work test.

Everything should work now. Well almost everything. I'm still not 100% confident with the network difficulty and network hashrate.

With getwork the network hashrate and network difficulty are both provided via RPC mininginfo. The reliability of mininginfo has not been verified.

For Stratum only the network difficulty is provided. The network hash rate is calculated from the diff and the miner counting blocks over time. It should get more accurate as session time increases as long as there are no large changes in network diff. But this hasn't been verified.

I have been able to verify share diff is correct and share targetting is precise so all share related statistics should be accurate.

JayDDee commented 4 years ago

cpuminer-opt-3.12.5 is released. This one should do it, everything should work correctly.

Please report any problems.

In addition to testing for any previously identified problems I have one specific request: Run a test with -D to get mininginfo output and post the log. Hopefully it will include solved blocks. It would also be nice, in a strange way, to have a few stale blocks to confirm their legitimacy, although excessive stale blocks is a problem in itself.

You can even double check the math is correct for hashrate and TTF estimates if you want.

There may be a final tweak but if stats are working properly for getwork this issue can finally be closed.

YetAnotherRussian commented 4 years ago

I'm on it. Net TTF and miner TTF seem to be OK now.

Will use v3.12.5 with -D and output to a file.

UPD: here it is: 3_12_5_solo_log.txt

Well, I do not see any issues (except no "BLOCK SOLVED" or stale info, and a small race in affinity logging in the very beginning). The lack of proper nethash rounding (should be is not an issue, as it is used only with -D I guess.

This

[2020-03-02 10:14:27] 8 Submit diff 5.9678, block 6281908
[2020-03-02 10:14:28] 8 Accepted 6 S0 R2 B0, 447.622 sec (1081ms) Diff 5.9678 (0.0886), Block 0
[2020-03-02 10:14:33] Mining info: diff 67.353, net_hashrate 39096708.000000, height 6281908

block was not rewarded, I don't know why. Anyway, that should not be a cpuminer-opt issue.

JayDDee commented 4 years ago

The last summary log reported 10 accepted, how many were rewarded?

The -D did its job and confirmed mininginfo was correct.

The first thing I noticed is the net diff and targetdiff are different. That explains why "Accepted" is reported instead of "BLOCK SOLVED". Share diff must be >= net diff for BLOCK SOLVED. But I don't understand why they are different and why blocks were rewarded with diff < net diff.

JayDDee commented 4 years ago

I found the problem with zero block number in share_result log.

The bigger question, I'm not sure if it's a problem or a misunderstanding, is the different values of net diff and target diff. It is the cause of other symptoms like Accepted instead of BLOCK SOLVED.

Net_diff is provided via RPC mininginfo and is supposed to be the minimum diff to solve a block.

Target diff is calculated from the 256 bit hash target in the work struct and represents the minimum diff for a share.

When pool mining they are expected to be different, but when solo mining there is no share so the target diff should be what is needed to solve a block.
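For reference, the conventional way to turn a 256 bit target into a difficulty is to divide the difficulty 1 target by it. The sketch below follows that convention using only the top 64 bits; it is an assumption about the general method, `target_to_diff_sketch` is a hypothetical name, and any algorithm-specific target factor is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* target[7] is the most significant 32 bit word, matching the Targ[7:0]
   layout in the logs. The conventional difficulty 1 target is
   0x00000000ffff0000 followed by zeros. */
static double target_to_diff_sketch( const uint32_t target[8] )
{
   const double diff1_top64 = (double)0x00000000ffff0000ULL;
   uint64_t top64 = ( (uint64_t)target[7] << 32 ) | target[6];
   assert( top64 != 0 );   /* targets smaller than 2^192 are not handled */
   return diff1_top64 / (double)top64;
}
```

For the target shown in the earlier share log (00000002 9b060000 followed by zeros) this gives roughly 0.38. The miner's own figure may differ if it applies a factor for the algorithm.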

That's the theory. The data show otherwise.

"Shares" were submitted that passed the target test, were accepted and resulted in a block reward. That means the target diff was good enough for a block.

Your session was 136 minutes and you submitted 15 blocks, for a TTF of around 9 mins. The effective hash rate shows you significantly underperformed.

The miner TTF is based on the target diff and estimated around 4m45s. This is consistent with your actual TTF of 9 mins while underperforming.
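The TTF estimate itself is a one line calculation, the same shape as `net_diff * diff_to_hash / miner_hr` in the snippet posted earlier. A minimal sketch, assuming the conventional 2^32 hashes per unit of difficulty; `ttf_seconds` is a hypothetical name.

```c
#include <assert.h>

/* Conventional number of hashes per unit of difficulty (2^32), assumed here. */
static const double DIFF_TO_HASH = 4294967296.0;

/* Expected time, in seconds, to find a hash meeting the given difficulty. */
static double ttf_seconds( double diff, double hashrate )
{
   assert( hashrate > 0.0 );
   return diff * DIFF_TO_HASH / hashrate;
}
```

Plugging in net diff 98.25 and 3778.68 kh/s from the first log gives about 111,700 seconds, or 31 hours, which is the 1d07h the miner printed. Using a target diff of 0.28873 at roughly 4 Mh/s instead gives about 310 seconds, in the same range as the estimate above.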

I can find no inconsistencies in the data, everything indicates target diff is used by the server.

If target diff is what is required to solve a block it should be the same value as net diff, but it isn't.

Why are they different? If target diff is the diff to find a block what is net diff?

I can make it work without answers to those questions by just ignoring net diff and using target diff as I would use net diff. But I don't like doing that, I prefer to understand.

I can do some more math, verifying net hash rate and TTF looking for discrepancies.

I'm curious about the rejects, none were reported as stale. But the reject at 9:51:00 was clearly stale. I'll need to follow up on that too.

I don't see a performance issue that would suggest an invisible problem. The effective hash rate and the submit rate are consistent with each other and, compared with the reference hashrate, are consistent with bad luck. More testing would confirm that it is a luck issue and there was no bias. -D is no longer necessary.

If you have any insights or find anything else worth reporting, please do so.

That's 3 items so far:

fix share result log block height always zero
Investigate why net diff & target diff are different when solo mining
Investigate rejected/stale shares

JayDDee commented 4 years ago

Something is not right. The block explorer for Anime says it's at block 4,539,011 but your logs say 6281789. Are you really mining Anime? I need the right block explorer to verify the data.

https://ca.advfn.com/crypto/Animecoin-ANI

JayDDee commented 4 years ago

I tracked the block numbers reported in the logs and everything looks ok. New blocks are around 20 seconds with an occasional new work in between.

What is weird is the polling rate. Getwork requests occur every 7 seconds, same as previous versions, but the scan time is 5 seconds. I don't see that as a big problem but the 2 seconds is precise and unexplained.

Can you reduce the scantime to 4 seconds (--scantime 4) to see if that changes the timing of the logs? I want to confirm the poll time is an offset of the scantime so I know whether to look for an explanation for 2 seconds or the full 7 seconds.

JayDDee commented 4 years ago

A summary of the rejects.

1.

9:46:26 new block 845, targetdiff .2631
9:46:27 submit block 845, diff .80023
9:46:28 rejected block 845
9:46:33 new work (not a new block, still 845)
9:47:01 new block 846

Both the block and diff were good so it wasn't stale or low diff.

2.

9:50:38 new block 853
9:50:57 submit block 853
9:50:59 new block 854
9:51:00 rejected

This was clearly a stale share, a new block was received between submitting the nonce and getting the response. But it wasn't reported as stale.

3.

10:49:40 new block 961, target .28873
10:50:52 submit block 961 diff .93631
10:50:53 rejected 961
10:50:57 new block 962

Not stale, not low diff

4.

10:59:56 new block 971, diff .28873
10:59:57 submit block 971, diff .289
10:59:58 rejected 971
11:00:03 new work (not a new block)
11:00:24 new block 972

This was not stale but could be low diff. The share diff is very close to the target, so it could be a math error. The actual test is done on 256 bit integers, while the diff is a double precision floating point. int256 is around 77 decimal digits, double has a 53 bit significand, or about 15 to 17 decimal digits of precision, and the math to convert is truncated to 64 bits or 19 decimal digits. The precision should still be pretty good but I recall from school there are many sources of error and some can be magnified by orders of magnitude as they propagate. It's possible the share hash was higher than the target (lower is better) but when converted to diff (higher is better) the share looks valid.
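The precision concern can be illustrated with a small Python sketch. It assumes a Bitcoin-style difficulty-1 target of `0xFFFF << 208` and a difficulty computed as that constant divided by the 256 bit value; the miner's real conversion may differ, and the names `DIFF1_TARGET` and `to_diff` are hypothetical.

```python
DIFF1_TARGET = 0xFFFF << 208  # assumed Bitcoin-style difficulty-1 target

def to_diff(value_256):
    """Convert a 256-bit hash or target to a floating point difficulty."""
    return DIFF1_TARGET / value_256

target = DIFF1_TARGET * 1000 // 289   # a target of roughly diff 0.289
bad_hash = target + 1                 # one above the target, so it must fail

exact_ok = bad_hash <= target                    # exact 256-bit integer test
float_ok = to_diff(bad_hash) >= to_diff(target)  # test on converted diffs
print(exact_ok, float_ok)
```

The integer test rejects the hash while the two converted diffs are indistinguishable, so a share that displays a diff at or just above the target diff can still be too low.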

5.

11:2:06 new block 19
11:22:09 submit block 19
11:22:10 rejected block 19
11:22:13 new work
11:23:37 new block 20

Not stale, not low diff.

In general it's looking pretty good. The new blocks and new work are reported correctly and the scan time, other than the mysterious extra 2 seconds, is working well, with only 1 real stale block.

But there are some items to follow up:

Stale block reported rejected.

I may be able to find a workaround to determine if the share was stale if I compare the block submitted with the current block. That will shift the window of uncertainty. Instead of stales falsely reported as rejects, rejects might be reported as stale. But it's a much smaller window and lower probability.
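A minimal Python sketch of that workaround, assuming the miner keeps the height of the block it submitted and the height of the latest work it has received. The function name `classify_reject` is hypothetical and this is not the miner's actual code.

```python
def classify_reject(submitted_height, current_height):
    """Guess why a share was rejected when the server gives no reason."""
    if current_height > submitted_height:
        # A newer block was seen before the reject arrived.
        return "stale"
    return "rejected"

print(classify_reject(853, 854))  # reject 2: block 854 arrived before the reply
print(classify_reject(845, 845))  # reject 1: still mining block 845
```

The remaining window is a new block arriving between the server's decision and the miner's check, in which case a reject for another reason would be labelled stale.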

The unexplained rejects.

We can't explain them if we don't know the reason. Unfortunately getwork doesn't give a reject reason so it's up to the miner to try to figure it out. Once stale is excluded, low diff is the next suspect. If the miner knows it's low diff it will display the raw hash and target for a direct comparison. But there was no reason so no debug data. I can change that to display the hash for any reject. (coding done).

It would be a waste of time to retest the hash using the same test procedure so it has to be manually verified. The hash test may be inaccurate, which leads to the next item.

Accuracy of hash test.

I've expressed my concerns with the accuracy of the hash test. The test previously had a margin of error built in. The 256 bit target had the lower 192 bits zeroed to make the target lower (more difficult). This would cover up any systemic error in the actual hash test but could result in discarding good shares.
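The effect of that margin can be sketched in Python. The mask below keeps only the top 64 bits of a 256 bit target; the example target value is arbitrary and the names are hypothetical, so treat this as an illustration rather than the miner's code.

```python
MASK_UPPER_64 = ((1 << 64) - 1) << 192  # keeps bits 192..255 only

def conservative_target(target_256):
    """Zero the lower 192 bits, which can only lower (harden) the target."""
    return target_256 & MASK_UPPER_64

target = (0x0000_0003_75A1_C2D4 << 192) | (1 << 100)  # arbitrary example value
hardened = conservative_target(target)
marginal_hash = hardened + 1  # between the hardened and the real target

print(hardened <= target)
print(marginal_hash <= target, marginal_hash <= hardened)
```

The marginal hash is good enough for the real target but fails the hardened one, which is the case where a good share would be discarded.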

I use the same logic as with stale shares. Don't pre-judge your work, submit it and let the server deal with it. There's potential gain with nothing to lose.

Ideally the test should be precise and accurate, but that might not be possible. I can take another look.

I might be able to create a mechanism like I did for stales where I set a flag that the share might be stale. If it's ultimately rejected I assume it was in fact stale.

For share diff I could set an error tolerance. If the share is within the margin of error I set a flag. If the share is rejected and the marginal diff flag is set I assume it was in fact low diff.

This will require some tuning and a better understanding of the exact error as well as coordination with the stale flag. I will likely open a new issue for this.

The 2 other items are awaiting more info to explain the block height confusion and to modify the scantime to see if it affects the 7 second getwork poll.

JayDDee commented 4 years ago

Follow up to the accuracy of the hash test.

The test itself is 100% accurate with 100% precision because it does a 256 bit integer comparison between the hash and the target.
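For reference, an exact comparison of this kind can be sketched in a few lines of Python. The little-endian byte order and the function name `hash_meets_target` are assumptions made for the example, not a description of the miner's internals.

```python
def hash_meets_target(hash_bytes, target_bytes):
    """Exact test: both values are 32-byte little-endian 256-bit integers."""
    if len(hash_bytes) != 32 or len(target_bytes) != 32:
        raise ValueError("expected 32-byte values")
    hash_int = int.from_bytes(hash_bytes, "little")
    target_int = int.from_bytes(target_bytes, "little")
    return hash_int <= target_int  # lower is better

target = (0xFFFF << 208).to_bytes(32, "little")
good = ((0xFFFF << 208) - 1).to_bytes(32, "little")
bad = ((0xFFFF << 208) + 1).to_bytes(32, "little")
print(hash_meets_target(good, target), hash_meets_target(bad, target))
```

No conversion to floating point is involved, so a difference of a single unit in the lowest bit is still decided correctly.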

This makes it unlikely that reject 4 was low difficulty. A precise test with unconverted data should give a precise result. It may be a coincidence the share was so close to the target. But it's too soon to draw any conclusions yet.

More testing is required to see if there are any other rejects with share diff close to target diff.

I'm abandoning trying to detect marginal shares, it might be trying to be too clever.

YetAnotherRussian commented 4 years ago

@JayDDee There're 2 block explorers I've mentioned above (https://animeco.in/explorer and https://miningbase.tk/explorer/ANI). This is to check the block height to compare with my logs. I don't know where the one you shared was taken from (it seems to be outdated).

I'm going to provide a new (longer period) session with maxing-out the debug info. This will require to make a ZIP archive btw.

Can you reduce the scantime to 4 seconds (--scantime 4) to see if that changes the timing of the logs?

Will add to CLI.

The last summary log reported 10 accepted, how many were rewarded?

Got rewards for 9 out of 10. When I have the logging done, I'll share the reward log as well (so it may be compared to the log via timeline). Thanks.

JayDDee commented 4 years ago

Debug is only needed for a short test time to measure the poll time with different --scantime. Once you see if the poll time was affected the test is complete and -D isn't needed anymore. The diff and block numbers and ntime are all displayed in normal logs and tell the whole story.

I prefer you do the long test without debug, it just makes more logs to search through.

YetAnotherRussian commented 4 years ago

So, you have no interest in --protocol-dump to see the received & sent info itself? I'll skip this parameter then.

YetAnotherRussian commented 4 years ago

Got three blocks in a row (almost), all got rewarded. So, I'll post it anyway. 3_12_5_solo_log_new.txt

JayDDee commented 4 years ago

Correct, I didn't mention protocol dump earlier because I thought you found it useful. As far as debug is concerned it did its job in v3.12.5. I got the data I was looking for.

I looked at the explorer and the data looks good. The block height, net diff, net hash rate, block time are all in agreement assuming hash rate and diff rose a little since your test.

You can verify the TTF etc live.

Edit: The poll time in your new test is 6 seconds, reducing the scantime had a direct effect. Now I have to search the code for 2 seconds.

The block explorer confirmed the net diff but it isn't used to set the target. I still don't understand that.

YetAnotherRussian commented 4 years ago

Some info may be found here:

https://github.com/Animecointeam/Animecoin/blob/master/src/rpc/mining.cpp

Block creation method, consensus settings and chain params are split between several header files.

I can also find several reject reasons over there: return "duplicate"; return "duplicate-invalid"; return "duplicate-inconclusive"; return "inconclusive";

Checks and POW part itself: https://github.com/Animecointeam/Animecoin/blob/master/src/pow.cpp

YetAnotherRussian commented 4 years ago

Big log: logs.zip There is also a log with rewards & their timeline inside.

JayDDee commented 4 years ago

First impression of the logs:

Effective hash rate is too low. It quickly converges to around 1800 kh/s but the ref rate is 3780 kh/s. However the share rate is precisely correct, the miner TTF is 6 mins and the share rate is .1/min. The net TTF is also correct with a mean of 29.8 secs over the session.
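The relationship between target diff, hash rate and TTF can be checked with a short Python sketch. It assumes the common convention that one unit of difficulty takes 2^32 hashes on average, which may not be the scaling used for this coin; the function names are hypothetical.

```python
TWO_32 = 2 ** 32  # assumed hashes per unit of difficulty (Bitcoin-style)

def ttf_seconds(diff, hashrate):
    """Expected time to find one share at `diff` with `hashrate` in H/s."""
    return diff * TWO_32 / hashrate

def effective_hashrate(sum_of_target_diffs, elapsed_seconds):
    """Hash rate implied by the shares actually submitted."""
    return sum_of_target_diffs * TWO_32 / elapsed_seconds

# .31496 is one of the target diffs seen in this test, 3780 kh/s is the ref rate.
ttf = ttf_seconds(0.31496, 3_780_000)
print(round(ttf / 60, 1), "minutes")
```

Under that assumption the result is about 6 minutes, in line with the miner TTF quoted above.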

I've never noticed a hashrate problem with stratum mining so it may be a getwork issue. If you have an opportunity, could you try anime in a pool to see if it's a coin issue or a getwork issue? If you have other wallets lying around you could try a small solo test. There would be no need to wait for a block, just long enough for the effective hash rate to show convergence.

The reject rate isn't alarmingly high, once the stales are factored out it should be even lower. A quick scan of the rejects found none that were suspected of being low diff. Troubleshooting rejects is more difficult because the reason is not provided in the reject message. I don't think I'll be pursuing it any further.

I have fixes ready for block 0 in the share result log and better detection of stale shares but I'm going to pore over the logs looking for anything else that seems off and to follow up on the other identified issues before releasing.

JayDDee commented 4 years ago

I checked the effective hash rate calculation. It's the same code as stratum uses and it reports the correct rate.

Ignore the following, there is a real problem of low difficulty shares. The low effective hash rate must have a different cause.

start ignore

One possibility is the target is too low and shares that would be accepted may be discarded. This is the opposite of low difficulty shares and it is silent with no rejects or other error messages. You can test for this by forcing the targetdiff lower with a cli option.

If you add -m 0.9 it will reduce the target diff by 10%. If you mine with this setting you may submit low diff shares that will be rejected. This will not hurt performance, the miner will just be submitting more shares. It's the rate of low diff rejects and the effective hash rate that will determine if the target was set correctly.

If the target diff is reduced by 10% you should expect 10% rejects for low diff. The effective hash rate will remain the same and the lost hash rate will increase.

If there are no rejects, or less than 10%, it means the lower diff shares are being accepted. You would then see the effective hash rate increase.

This kind of test can be run for a long time, or for as long as it takes to draw a conclusion. It can be done with any multiplier. Over a long enough time you could focus precisely on the exact acceptable target diff and how close it is to 100%.

Don't go higher than -m 1.0, you start to lose performance.

If you do it please report your results.

end ignore

Edit: I have reason to suspect the target diff is actually too low, which accounts for some of your rejects.

R1: diff .35401, target .31496
R2: diff 1.6286, target .34659
R3: diff .38527, target .36654
R4: stale
R5: diff .52946, target .40335

A32: diff .42385, target .28476

A32 is the lowest diff share accepted, R1, R3 are the 2 shares with the lowest diff and both were rejected.

R5 has a higher diff than the lowest accepted, but also has a higher target.

This is all evidence of incorrect targeting, but targeting too low, causing low diff rejects. I recommend continuing testing without changing the target diff, noting the diff and target of all rejects as well as the lowest accepted shares for a particular target. The ratio should converge to the actual acceptable target.

The ratio is important because the target changes. From the samples available, 1.31 is rejected and 1.48 is accepted. I have not seen this problem mining other coins with stratum. It's either a getwork issue or a coin issue.
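The ratios can be reproduced from the samples listed above with a few lines of Python. The data is copied from this comment; the dictionary layout and the helper name `diff_ratio` are just for the example.

```python
samples = {
    # name: (share diff, target diff, result)
    "R1": (0.35401, 0.31496, "rejected"),
    "R2": (1.6286, 0.34659, "rejected"),
    "R3": (0.38527, 0.36654, "rejected"),
    "R5": (0.52946, 0.40335, "rejected"),
    "A32": (0.42385, 0.28476, "accepted"),
}

def diff_ratio(share_diff, target_diff):
    """How far above the target diff the share diff was."""
    return share_diff / target_diff

ratios = {name: diff_ratio(d, t) for name, (d, t, _) in samples.items()}
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{name}: {ratio:.2f} {samples[name][2]}")
```

Sorting by ratio puts R3, R1 and R5 below A32. R2 sits far above all of them, so a too-low target alone would not seem to account for that reject.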

JayDDee / cpuminer-opt

Share stats for solo mining #246