ethereum / go-ethereum

Go implementation of the Ethereum protocol
https://geth.ethereum.org
GNU Lesser General Public License v3.0

For some blocks, batch RPC returns only some replies, or the replies have null/mismatched fields #23132

Open wizzard0 opened 3 years ago

wizzard0 commented 3 years ago

System information

Geth version: 1.10.1, 1.10.4
OS & Version: Win10, Ubuntu 18, Ubuntu 20

Expected behaviour

For batch calls over WebSocket, the replies to all requests are complete and consistent for every block.

Actual behaviour

Steps to reproduce the behaviour

  1. Subscribe to newHeads
  2. For each block header, submit a batch request (BatchCallContext on the rpc.Client) containing one eth_getTransactionReceipt or debug_traceTransaction call per transaction.
  3. Optional: For Golang RPC client, add logging of raw resp.Result in rpc/client.go right before json.Unmarshal (https://github.com/ethereum/go-ethereum/blob/master/rpc/client.go#L391)
  4. Wait about 100-300 blocks
  5. Observe nulls/garbage in the resp.Result bytes instead of the expected replies, and/or a JSON unmarshal error
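
The check in steps 2 and 5 can be sketched without a node. This is a minimal, standard-library-only illustration, not the go-ethereum client code: `buildReceiptBatch` and `checkBatchReplies` are hypothetical helper names, the request IDs are simply slice indices (an assumption), and the raw reply in `main` is simulated rather than captured from Geth.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcRequest and rpcResponse mirror the JSON-RPC 2.0 wire format.
type rpcRequest struct {
	JSONRPC string        `json:"jsonrpc"`
	ID      int           `json:"id"`
	Method  string        `json:"method"`
	Params  []interface{} `json:"params"`
}

type rpcResponse struct {
	ID     int             `json:"id"`
	Result json.RawMessage `json:"result"`
	Error  json.RawMessage `json:"error"`
}

// buildReceiptBatch (hypothetical helper) creates one eth_getTransactionReceipt
// request per transaction hash, using the slice index as the request ID.
func buildReceiptBatch(txHashes []string) []rpcRequest {
	batch := make([]rpcRequest, len(txHashes))
	for i, h := range txHashes {
		batch[i] = rpcRequest{
			JSONRPC: "2.0",
			ID:      i,
			Method:  "eth_getTransactionReceipt",
			Params:  []interface{}{h},
		}
	}
	return batch
}

// isNull reports whether a raw JSON value is absent or the literal null.
func isNull(raw json.RawMessage) bool {
	return len(raw) == 0 || string(raw) == "null"
}

// checkBatchReplies (hypothetical helper) parses a raw batch reply and returns
// the request IDs that got no reply at all, and the IDs whose reply carries
// neither a result nor an error.
func checkBatchReplies(batch []rpcRequest, raw []byte) (missing, null []int, err error) {
	var replies []rpcResponse
	if err = json.Unmarshal(raw, &replies); err != nil {
		return nil, nil, err
	}
	byID := make(map[int]rpcResponse, len(replies))
	for _, r := range replies {
		byID[r.ID] = r
	}
	for _, req := range batch {
		r, ok := byID[req.ID]
		switch {
		case !ok:
			missing = append(missing, req.ID)
		case isNull(r.Result) && isNull(r.Error):
			null = append(null, req.ID)
		}
	}
	return missing, null, nil
}

func main() {
	batch := buildReceiptBatch([]string{"0x01", "0x02", "0x03"})
	// Simulated reply: ID 1 is absent and ID 2 carries a null result.
	raw := []byte(`[{"id":0,"result":{"blockHash":"0xbbb"}},{"id":2,"result":null}]`)
	missing, null, err := checkBatchReplies(batch, raw)
	if err != nil {
		fmt.Println("unmarshal error:", err)
		return
	}
	fmt.Println("missing:", missing, "null:", null)
}
```

Matching replies by ID rather than by position keeps the check valid even if the server returns the batch entries in a different order.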

    Backtrace

    N/A


ligi commented 3 years ago

Which network are you on?

wizzard0 commented 3 years ago

mainnet

karalabe commented 3 years ago

Our hunch is that these might be mini one-block "reorgs", i.e. a new block becoming canonical in place of an old block at the same height. That could look like a data race: a header gets announced and you request its receipts, but by then another header has been swapped in, so some of the transactions might be "undone".

Could you check whether, for the txs that do get sent back, the inclusion block hash matches the header you requested? Could you also extend your tester so that when you see a null, it double-checks whether a new header became canonical?
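
A rough sketch of that check, under stated assumptions: `classifyReply` and the status strings are hypothetical names, and the caller is assumed to have already fetched `canonicalHash` (the hash the node currently reports at the announced height, for example via eth_getBlockByNumber) after seeing the reply.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// receipt holds only the receipt fields needed for the comparison.
type receipt struct {
	TransactionHash string `json:"transactionHash"`
	BlockHash       string `json:"blockHash"`
}

// Hypothetical status labels for one reply in the batch.
const (
	statusOK              = "ok"
	statusMismatch        = "block hash mismatch"
	statusNullReorg       = "null, header was replaced"
	statusNullUnexplained = "null, header still canonical"
)

// classifyReply (hypothetical helper) compares one raw receipt reply against
// the header hash the receipts were requested for. canonicalHash is the hash
// the node reports at the same height when re-queried after the reply.
func classifyReply(raw json.RawMessage, requestedHash, canonicalHash string) (string, error) {
	if len(raw) == 0 || string(raw) == "null" {
		if !strings.EqualFold(canonicalHash, requestedHash) {
			// A different block is canonical now: consistent with a reorg.
			return statusNullReorg, nil
		}
		// Same block is still canonical: a reorg does not explain the null.
		return statusNullUnexplained, nil
	}
	var r receipt
	if err := json.Unmarshal(raw, &r); err != nil {
		return "", err
	}
	if !strings.EqualFold(r.BlockHash, requestedHash) {
		return statusMismatch, nil
	}
	return statusOK, nil
}

func main() {
	status, err := classifyReply(json.RawMessage(`null`), "0xbbb", "0xbbb")
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(status)
}
```

Only the "null, header still canonical" case would point away from the reorg explanation.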


If you have a snippet you could give us to repro exactly, that would help understand it better.

wizzard0 commented 3 years ago

Yeah, the header hash matches; reorgs were the first thing I suspected.

But I receive header 0xaaa, then a notification for header 0xbbb, request receipts for 0xbbb, get like half of them (randomly), all referring to block 0xbbb, then receive a notification for header 0xccc, which references 0xbbb as its parent, and nothing else.

If those were reorgs, then I assume I should have received more header notifications?
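
The reasoning above (each new header should name the previous one as its parent) can be written down as a small continuity check. This is only a sketch with hypothetical names (`header`, `headTracker`, `observe`); the struct is a stand-in for the fields of a newHeads notification, not the go-ethereum header type.

```go
package main

import "fmt"

// header is a stand-in for the fields of a newHeads notification.
type header struct {
	Number     uint64
	Hash       string
	ParentHash string
}

// headTracker (hypothetical) remembers the most recently announced header.
type headTracker struct {
	last *header
}

// observe records h and reports whether it fails to extend the previously
// announced header, which would indicate a reorg or a missed notification.
func (ht *headTracker) observe(h header) bool {
	prev := ht.last
	ht.last = &h
	if prev == nil {
		return false // first header, nothing to compare against
	}
	return h.ParentHash != prev.Hash
}

func main() {
	var tracker headTracker
	heads := []header{
		{Number: 1, Hash: "0xaaa", ParentHash: "0x999"},
		{Number: 2, Hash: "0xbbb", ParentHash: "0xaaa"},
		{Number: 3, Hash: "0xccc", ParentHash: "0xbbb"},
	}
	for _, h := range heads {
		fmt.Println(h.Hash, "discontinuity:", tracker.observe(h))
	}
}
```

With the sequence described here (0xaaa, 0xbbb, then 0xccc with 0xbbb as parent) the check never fires, which matches the observation that no competing header was announced.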

By tester, do you mean the client code, or should I try to patch the node itself?

pinkiebell commented 3 years ago

I observe similar behaviour with a node running in light mode. Block hashes, the transaction list, etc. are all fine, but if I invoke eth_getTransactionByHash... I sometimes receive the tx object with input being null. This is not easy to reproduce, though. Retrying the call may return the transaction's '0x...' data, or the field may stay 'null' until the node is restarted.

It must have something to do with error handling/caching in the LES layer, and maybe bad peers.

Edit: Btw, I didn't use batch calls in my example.