produce invalid share after reconnect

lastmirage commented 5 years ago

When cpuminer is disconnected from the pool and reconnects immediately, it produces invalid shares until it gets a new job.

I tested with x16r algo.

I investigated with a packet log and found that after the reconnect, cpuminer gets exactly the same stratum packets from the pool server as before, except for a different xnonce1 in the mining.subscribe response.

JayDDee commented 5 years ago

Have you tried with TPruvot's cpuminer-multi? Do you know why you were disconnecting? I'm not sure what you're saying about your packet analysis. Are you saying cpuminer responded differently to the same job data after the reconnect?

If the job data hasn't changed cpuminer should be using the same data, whether old or new it's the same???

If cpuminer is using stale data (a big if since you say the data was identical) I would expect a stale job reject, not an invalid share. This suggests cpuminer was using neither the previous job data nor the new data but something invalid.

The stratum code hasn't changed in a long time and was taken from cpuminer-multi. I find it hard to believe that cpuminer would corrupt the data as a result of a stratum disconnect. If anything it should discard the previous data and look for a new job immediately on reconnect.

This would be an issue with all algos, not just x16 because they all use the same protocol. This makes it very unlikely such a bug would go unnoticed until now.

I'm suspecting a pool problem until I get a clearer picture.

lastmirage commented 5 years ago

I didn't try TPruvot's cpuminer-multi because it does not support x16r. I work with my local pool, and I found this problem when restarting the pool. At first I suspected the pool, but I don't think it is the cause.

After some more investigation, I found the following:

After reconnect, the pool updates xnonce1 and resends it to the miner in the mining.subscribe response.
After reconnect, the miner reports results computed with the old xnonce1, not the latest one. The miner uses the xnonce1 from the first connection.
Once the miner gets a new job after the reconnect, it reports correct results with the new xnonce1.

So I think the miner does not use the new xnonce1 until a new job arrives.

JayDDee commented 5 years ago

cpuminer-multi does support x16r... https://github.com/tpruvot/cpuminer-multi/commit/4073fd4aa2b8f78058fa295806af47cab8b87826 Please test it.

Why is the miner disconnecting? It's important to help understand the problem. Please post the exact error messages displayed by the miner.

I will investigate when I have more information.

lastmirage commented 5 years ago

cpuminer-multi's Windows release binary does not support x16r, so I have to compile it. I will try this when I have a build environment.

To test this problem, I made the pool disconnect the miner when it reports its first share. This forced disconnection happens only once per pool run, so I can reproduce the buggy situation.

The miner does not show any error message, because it thinks everything is fine.

As I mentioned above, the main problem is this:

After reconnection, the pool updates the miner's xnonce1, but the miner does not use the updated xnonce1.

Below is the stratum packet dump. + is the first connection, - is the reconnection.

+[2019-05-28 18:27:16] > {"id": 1, "method": "mining.subscribe", "params": ["cpuminer-opt/3.9.0.1"]}
-[2019-05-28 18:27:24] > {"id": 1, "method": "mining.subscribe", "params": ["cpuminer-opt/3.9.0.1", "0100000000000000"]}

+[2019-05-28 18:27:16] < {"id":1,"result":[[["mining.set_difficulty","0.1"],["mining.notify","0100000000000000"]],"0a1f07ba",4]}
-[2019-05-28 18:27:24] < {"id":1,"result":[[["mining.set_difficulty","0.1"],["mining.notify","0200000000000000"]],"0a1f07bb",4]}

+[2019-05-28 18:27:16] > {"id": 2, "method": "mining.authorize", "params": ["test.w_01", "x"]}
-[2019-05-28 18:27:24] > {"id": 2, "method": "mining.authorize", "params": ["test.w_01", "x"]}

+[2019-05-28 18:27:16] > {"id": 3, "method": "mining.extranonce.subscribe", "params": []}
-[2019-05-28 18:27:24] > {"id": 3, "method": "mining.extranonce.subscribe", "params": []}

+[2019-05-28 18:27:16] < {"id":null,"method":"mining.set_difficulty","params":[0.1]}
-[2019-05-28 18:27:24] < {"id":null,"method":"mining.set_difficulty","params":[0.1]}

+[2019-05-28 18:27:16] < {"id":null,"method":"mining.notify","params":["1","ddca3d6fecfbe3f421d894a3ec0791937883bfd211fff63a428fefa0000004f6","01000000010000000000000000000000000000000000000000000000000000000000000000ffffffff260375670404f3feec5c08","135b68747470733a2f2f706f6f6c73616c6f6e5d00000000020000000000000000266a24aa21a9ede2f61c3f71d1defd3fa999dfa36953755c690689799962b48bebd836974e8cf90088526a740000001976a9140917a374e4805876aa036db513ccf4ce08789ebc88ac00000000",[],"40000000","1e0c5ed0","5cecff06",true]}
-[2019-05-28 18:27:24] < {"id":null,"method":"mining.notify","params":["1","ddca3d6fecfbe3f421d894a3ec0791937883bfd211fff63a428fefa0000004f6","01000000010000000000000000000000000000000000000000000000000000000000000000ffffffff260375670404f3feec5c08","135b68747470733a2f2f706f6f6c73616c6f6e5d00000000020000000000000000266a24aa21a9ede2f61c3f71d1defd3fa999dfa36953755c690689799962b48bebd836974e8cf90088526a740000001976a9140917a374e4805876aa036db513ccf4ce08789ebc88ac00000000",[],"40000000","1e0c5ed0","5cecff06",true]}

+[2019-05-28 18:27:24] > {"method": "mining.submit", "params": ["test.w_01", "1", "08000000", "5cecff06", "480fae2a"], "id":4}
-[2019-05-28 18:27:28] > {"method": "mining.submit", "params": ["test.w_01", "1", "0d000000", "5cecff06", "5c09acea"], "id":4}

+Hash: 000006b4911c7ddfcd7258bfc8ae0c6e9f66cc9836653d15e24b58cc7ede067d
-Hash: 0000086dae5f0225b62d90f6bb6448d2b4f6027de3600bd71d89a92d7d57cbf0

As you can see, the pool updates xnonce1 after the reconnect, from "0a1f07ba" to "0a1f07bb".

But the miner still uses "0a1f07ba" for hashing and builds its result with it. The pool has the new xnonce1, so the reported result is marked as an invalid share.
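The xnonce1 change can be read directly out of the two mining.subscribe responses. The following is a minimal Python sketch, not cpuminer code; it assumes the usual stratum v1 reply layout of [subscriptions, extranonce1, extranonce2_size], and the helper name parse_subscribe is made up for illustration.

```python
import json

# The two mining.subscribe responses copied from the dump above.
first = '{"id":1,"result":[[["mining.set_difficulty","0.1"],["mining.notify","0100000000000000"]],"0a1f07ba",4]}'
second = '{"id":1,"result":[[["mining.set_difficulty","0.1"],["mining.notify","0200000000000000"]],"0a1f07bb",4]}'

def parse_subscribe(line):
    """Return (session_id, xnonce1, xnonce2_size) from a mining.subscribe reply.

    Hypothetical helper; assumes result = [subscriptions, xnonce1, xnonce2_size].
    """
    subscriptions, xnonce1, xnonce2_size = json.loads(line)["result"]
    # Each subscription is a [name, id] pair, so the list converts to a dict.
    session_id = dict(subscriptions).get("mining.notify")
    return session_id, xnonce1, xnonce2_size

before = parse_subscribe(first)
after = parse_subscribe(second)
print(before)
print(after)
print("xnonce1 changed:", before[1] != after[1])
```

Everything else in the handshake is identical between the two connections, which is why the difference is easy to miss in a raw dump.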

JayDDee commented 5 years ago

I think you misunderstand. The pool doesn't send the nonce, the miner does. The miner loops hashing nonces in sequence (as you see above the nonces are consecutive) until it finds a solution. Then it sends the nonce to the pool with the solution and the pool verifies the solution is valid for that nonce and replies to the miner.

The reject is the reply from the pool reporting that the solution was not valid for the nonce the miner sent. The miner sends the same nonce it used to calculate the solution, there is no way for them to mismatch. It has nothing to do with the disconnection, the hashing thread knows nothing about the connection.

I think you see a problem in the wrong place. The nonce increment just means the "-" nonce is from the next iteration of the hash loop, as it should be.

When the miner experiences connection problems it displays various messages including whether it disconnects. Where are those messages? I've asked for them.

I don't like having to ask the same questions repeatedly. You also fail to consider the pool side of the problem. It's sending a reject notice to the miner. Find out why.

Until you can convince me that you have ruled out a pool issue I will not investigate further.

lastmirage commented 5 years ago

The pool sends xnonce1, and the miner reports xnonce2 and nonce via the mining.submit packet.

xnonce1 and xnonce2 are used to build the coinbase; nonce is used to build the block header.

The miner uses the wrong xnonce1, so the pool rejects the share.
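To show why a stale xnonce1 matters, here is a minimal Python sketch of the coinbase step, assuming the standard stratum v1 construction: coinbase = coinb1 + xnonce1 + xnonce2 + coinb2, hashed with double SHA-256 and folded with the merkle branches (empty in this dump). The coinb1 and coinb2 values are the ones from the mining.notify packet, split into fields here for readability; the function names are illustrative. The final x16r header hash is not computed, because Python's standard library has no x16r.

```python
import hashlib

def sha256d(data: bytes) -> bytes:
    """Double SHA-256, as used for transaction ids and merkle nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

# coinb1 from the dump: everything before the extranonce bytes.
COINB1 = "".join([
    "01000000",              # transaction version
    "01",                    # input count
    "00" * 32,               # null previous output hash (coinbase input)
    "ffffffff",              # previous output index
    "26",                    # input script length (38 bytes)
    "0375670404f3feec5c08",  # script start; trailing 08 pushes the 8 extranonce bytes
])

# coinb2 from the dump: everything after the extranonce bytes.
COINB2 = "".join([
    "135b68747470733a2f2f706f6f6c73616c6f6e5d",  # pool tag push, end of input script
    "00000000",                                  # input sequence
    "02",                                        # output count
    "00" * 8, "26",
    "6a24aa21a9ede2f61c3f71d1defd3fa999dfa36953755c690689799962b48bebd836974e8cf9",
    "0088526a74000000", "19",
    "76a9140917a374e4805876aa036db513ccf4ce08789ebc88ac",
    "00000000",                                  # lock time
])

def merkle_root(xnonce1: str, xnonce2: str, branches=()) -> str:
    """Build the coinbase and fold it with the merkle branches (hex in, hex out)."""
    coinbase = bytes.fromhex(COINB1 + xnonce1 + xnonce2 + COINB2)
    root = sha256d(coinbase)
    for branch in branches:
        root = sha256d(root + bytes.fromhex(branch))
    return root.hex()

# Same xnonce2 as the rejected submit, with the old and the new xnonce1.
old = merkle_root("0a1f07ba", "0d000000")
new = merkle_root("0a1f07bb", "0d000000")
print("old xnonce1:", old)
print("new xnonce1:", new)
print("roots differ:", old != new)
```

Because the merkle root goes into the block header, the miner and the pool end up hashing different headers whenever they disagree on xnonce1, even if the job, xnonce2 and nonce all match.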

[2019-05-28 18:27:28] > {"method": "mining.submit", "params": ["test.w_01", "1", "0d000000", "5cecff06", "5c09acea"], "id":4}
[2019-05-28 18:27:28] < {"id":4,"result":false}
[2019-05-28 18:27:28] Rejected 1/2 (50.0%), diff 0.000463, 391.78 kH/s
[2019-05-28 18:27:28] stratum_recv_line failed

Above is the log of the disconnection. I use testnet, so diff is very low. As I mentioned, I modified my local pool source to disconnect the miner when it reports its first share.

The problem is that the miner does not invalidate xnonce1 when it reconnects to the pool.
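The behaviour described here can be modelled in a few lines. This is a hypothetical Python sketch, not cpuminer-opt's stratum code: the Session and Work classes and the refresh_on_subscribe flag are invented for illustration, and the assumption that work is only rebuilt when the job id changes is a guess at the cause, not something confirmed in this thread.

```python
from dataclasses import dataclass

@dataclass
class Work:
    job_id: str
    xnonce1: str  # extranonce1 baked into the coinbase of this work unit

class Session:
    """Hypothetical model of a miner's stratum state (not cpuminer-opt code)."""

    def __init__(self, refresh_on_subscribe: bool):
        self.refresh_on_subscribe = refresh_on_subscribe
        self.xnonce1 = None
        self.job_id = None
        self.work = None

    def on_subscribe(self, xnonce1: str) -> None:
        self.xnonce1 = xnonce1
        # Optional fix: rebuild existing work as soon as xnonce1 changes.
        if self.refresh_on_subscribe and self.job_id is not None:
            self.work = Work(self.job_id, self.xnonce1)

    def on_notify(self, job_id: str) -> None:
        # Assumption: work is only rebuilt when the job id is new.
        is_new_job = job_id != self.job_id
        self.job_id = job_id
        if is_new_job or self.work is None:
            self.work = Work(job_id, self.xnonce1)

def replay(session: Session) -> str:
    """Replay the dump: connect, get job "1", reconnect, get job "1" again."""
    session.on_subscribe("0a1f07ba")
    session.on_notify("1")
    session.on_subscribe("0a1f07bb")  # reconnect: pool hands out a new xnonce1
    session.on_notify("1")            # same job id as before
    return session.work.xnonce1

stale = replay(Session(refresh_on_subscribe=False))
fresh = replay(Session(refresh_on_subscribe=True))
print("without refresh:", stale)
print("with refresh:   ", fresh)
```

In this model the work unit keeps the first connection's xnonce1 until a job with a different id arrives, which matches the observation that shares become valid again after the next new job.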

You can verify this by calculating the hashes by hand from the dumped data. All data is the same except xnonce1, xnonce2 and nonce.

If you use (-xnonce1, -xnonce2, -nonce) to generate the hash, you get the wrong hash. But if you use (+xnonce1, -xnonce2, -nonce), you get the correct hash.

JayDDee commented 5 years ago

It's a race condition. This can happen but your pool is not handling it well. It should not report invalid shares, it should detect them as stale. It is normal to submit a couple of stale shares after a stratum connection problem.

I could have figured this out a lot sooner with a proper problem description instead of posting a packet dump. Things you should have reported in the first post are the miner output (DUH!) And that you were the pool admin and were testing disconnects. And you still haven't tested with a different miner.

Go away.

JayDDee commented 5 years ago

The bottom line is cpuminer is not fault tolerant software and you are doing fault insertion testing. This is not valid testing for the product. On that basis alone this issue can be closed. However there is also a very good technical reason not to try to handle this situation. cpuminer is intended to be as fast as possible. If it has spent the effort to calculate a valid hash it will be submitted regardless of whether it may be stale. The only way cpuminer will not submit a share is if the hash fails verification or the submission attempt fails due to network disruption. The only downside is the optics of possibly seeing a reject. I would hate to have a false positive by neglecting to submit a share that would have been accepted.

It's closed.

JayDDee / cpuminer-opt

produce invalid share after reconnect #180