JayDDee / cpuminer-opt

Optimized multi algo CPU miner

Stats get corrupted when a submitted share is unacknowledged. #216

Closed JayDDee closed 4 years ago

JayDDee commented 4 years ago

The most obvious symptom is that the latency values rise significantly.

The only corrective action at this time is to restart the miner.

The root of the problem is that there is no way to associate a response from the pool with a specific submitted share. This makes it impossible to detect an unacknowledged share. Once that occurs, the share timings are garbage, as is anything derived from those timings.
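To illustrate why one lost reply skews every later timing, here is a minimal sketch. It assumes submit times are queued in order and each reply is matched to the oldest unmatched submit. That model, and the names `submit_ms`, `on_submit`, `on_reply` and `QUEUE_SIZE`, are hypothetical and not taken from the miner's source.

```c
#include <assert.h>

#define QUEUE_SIZE 8   /* hypothetical ring size, large enough for this demo */

static long submit_ms[QUEUE_SIZE];  /* submit timestamps, oldest first */
static unsigned submits = 0;        /* shares submitted so far */
static unsigned replies = 0;        /* replies received so far */

/* Record the time at which a share was submitted. */
void on_submit(long now_ms)
{
    submit_ms[submits % QUEUE_SIZE] = now_ms;
    submits++;
}

/* Match a reply to the oldest unmatched submit and return the latency
   that would be recorded. The reply carries nothing that identifies
   the share, so if an earlier reply was lost, this pairs the reply
   with the wrong (older) submit time. */
long on_reply(long now_ms)
{
    assert(replies < submits);  /* this model requires an outstanding share */
    long latency = now_ms - submit_ms[replies % QUEUE_SIZE];
    replies++;
    return latency;
}
```

For example, with shares submitted at 0, 10000 and 20000 ms, and the reply to the second share lost, a reply arriving at 20100 ms is recorded with a latency of 10100 ms instead of 100 ms. Every later reply is off by one slot in the same way, which would match the rising latency symptom.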

Resetting the stats can fix the problem after it has occurred. This is OK if there are no pending shares, but if the stats are reset while a share submission is pending acceptance, the reset could cause the very problem it's trying to correct.

There's no obvious solution at this time.

JayDDee commented 4 years ago

I found a bug that could cause stats to get out of sync, but the root problem still exists. There is no reliable way to detect a submitted share with no reply, and this will still cause stats corruption.

I will add some checks for mismatches between submits and replies and report discrepancies in the 5-minute summary. No automatic corrective action can be taken because the test could have been done in the window between submitting the share and receiving the reply. Taking action in that case would cause the problem when it didn't really exist.

The only foolproof corrective action is to restart the miner manually.

This is just a stats issue. It has no effect on performance, so letting the miner run with messed up stats causes no other problems.

JayDDee commented 4 years ago

The root problem can be mitigated by comparing the number of shares submitted with replies received (acc + rej) and reporting mismatches. It is fairly certain that stats will be corrupt when a share is submitted with no reply.
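The comparison described above could look something like the following sketch. The struct and function names are hypothetical and not taken from the miner's source. It only reports the difference and takes no corrective action, because a nonzero difference may simply be a share that is still waiting for its reply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical counters; the names are illustrative only. */
struct share_counts {
    unsigned submitted;  /* shares sent to the pool */
    unsigned accepted;   /* replies: accepted */
    unsigned rejected;   /* replies: rejected */
};

/* Submitted shares minus replies received (acc + rej).
   Zero means every submit has been answered. */
int unanswered_shares(const struct share_counts *c)
{
    assert(c != NULL);
    return (int)c->submitted - (int)(c->accepted + c->rejected);
}

/* Print a line if the counts disagree and return the difference.
   Report only: the caller cannot tell a lost reply from one that
   is still in flight, so nothing is reset here. */
int report_mismatch(const struct share_counts *c)
{
    int diff = unanswered_shares(c);
    if (diff != 0)
        printf("Share mismatch: %u submitted, %u replies (%d unanswered)\n",
               c->submitted, c->accepted + c->rejected, diff);
    return diff;
}
```

A difference that persists across several summary periods is a stronger sign of a lost reply than a single nonzero reading.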

Stratum errors can also cause share mismatches, but this is much less likely. It depends on whether there was a pending share when the error occurred and whether it resulted in lost data. Stratum errors are currently reported in real time.

Missing replies and stratum errors will be added to the periodic summary report.

JayDDee commented 4 years ago

v3.9.9.1. Stratum errors, missing replies, and solved blocks are not added to the summary report. Given their very low incidence in a stable environment and the adequate reporting of incidents, I feel it would be unnecessary duplication.