Open attila-lendvai opened 1 week ago
sometimes i get an error when trying to resend the tx:
{
"code": 500,
"message": "can't resend transaction"
}
the log says:
"time"="2024-09-15 20:53:46.847235" "level"="debug" "logger"="node/api/post_transaction" "msg"="resend transaction failed" "tx_hash"="[...]" "error"="transaction hash changed"
not sure if that helps.
might be connected to https://github.com/ethersphere/bee/issues/4740
$ curl -s -X GET localhost:1701/transactions | jq
{
"pendingTransactions": [
{
"transactionHash": "0x4d0aa925071fcd87f2d14d277f37bff7f429894c51bfb8f458f96ff14ca129f8",
"to": "0xdbf3ea6f5bee45c02255b2c26a16f300502f68da",
"nonce": 73,
"gasPrice": "4000000005",
"gasLimit": 65000,
"gasTipBoost": 0,
"gasTipCap": "1999999999",
"gasFeeCap": "4000000005",
"data": "0x095ea7b3000000000000000000000000781c6d1f0eae6f1da1f604c6cdccdb8b76428ba700000000000000000000000000000000000000000000000012751bf40f450000",
"created": "2024-09-16T17:03:16+02:00",
"description": "Approve tokens for stake deposit operations",
"value": "0"
},
{
"transactionHash": "0x883c9dfe6e94ac4febaf409b4e2ffa45283b92eaac1890d0750a1f09d0cfabb9",
"to": "0x0124b7263475e8d9e3a913a79c0d72de1f5885e4",
"nonce": 73,
"gasPrice": "6200000003",
"gasLimit": 21000,
"gasTipBoost": 0,
"gasTipCap": "2199999998",
"gasFeeCap": "6200000003",
"data": "0x",
"created": "2024-09-17T00:17:36+02:00",
"description": "Approve tokens for stake deposit operations (cancellation)",
"value": "0"
},
{
"transactionHash": "0xa3f46e0b83374e9448e26dcbe7165aef25768d369667a05109ecb254d835f3c1",
"to": "0xdbf3ea6f5bee45c02255b2c26a16f300502f68da",
"nonce": 74,
"gasPrice": "2258160007",
"gasLimit": 65000,
"gasTipBoost": 0,
"gasTipCap": "1129080000",
"gasFeeCap": "2258160007",
"data": "0x095ea7b3000000000000000000000000781c6d1f0eae6f1da1f604c6cdccdb8b76428ba700000000000000000000000000000000000000000000000012751bf40f450000",
"created": "2024-09-17T00:15:45+02:00",
"description": "Approve tokens for stake deposit operations",
"value": "0"
}
]
}
i shut down the node, and this is what i see in the log on the way out; i.e. the unstake operation remains pending all the way until i initiate the shutdown. then i see the errors below:
"time"="2024-09-18 13:42:59.523355" "level"="info" "logger"="node" "msg"="received interrupt signal"
"time"="2024-09-18 13:42:59.523598" "level"="info" "logger"="node" "msg"="shutting down..."
"time"="2024-09-18 13:42:59.524330" "level"="debug" "logger"="node/api/delete_withdraw_all_stake" "msg"="withdraw stake failed" "error"="context canceled: Post \"https://rpc.gnosis.gateway.fm/\": context canceled"
"time"="2024-09-18 13:42:59.524396" "level"="error" "logger"="node/api/delete_withdraw_all_stake" "msg"="withdraw stake failed"
"time"="2024-09-18 13:42:59.524541" "level"="info" "logger"="node/api" "msg"="debug api access" "ip"="127.0.0.1" "method"="DELETE" "host"="localhost:1701" "uri"="/stake" "proto"="HTTP/1.1" "status"=500 "size"=48 "duration"="37h27m13.721793011s" "user-agent"="curl/7.85.0"
"time"="2024-09-18 13:42:59.525049" "level"="debug" "logger"="node/listener" "msg"="shutting down event listener"
"time"="2024-09-18 13:42:59.526187" "level"="info" "logger"="node/pusher" "msg"="pusher shutting down"
"time"="2024-09-18 13:42:59.634874" "level"="error" "logger"="node/transaction" "msg"="waiting for pending transaction failed" "error"="monitor closed" "sender_address"="0x0124b7263475E8d9e3A913A79c0D72de1F5885e4" "tx"="0x883c9dfe6e94ac4febaf409b4e2ffa45283b92eaac1890d0750a1f09d0cfabb9"
"time"="2024-09-18 13:42:59.634895" "level"="error" "logger"="node/transaction" "msg"="waiting for pending transaction failed" "error"="monitor closed" "sender_address"="0x0124b7263475E8d9e3A913A79c0D72de1F5885e4" "tx"="0xa3f46e0b83374e9448e26dcbe7165aef25768d369667a05109ecb254d835f3c1"
"time"="2024-09-18 13:42:59.634938" "level"="error" "logger"="node/transaction" "msg"="waiting for pending transaction failed" "error"="monitor closed" "sender_address"="0x0124b7263475E8d9e3A913A79c0D72de1F5885e4" "tx"="0x4d0aa925071fcd87f2d14d277f37bff7f429894c51bfb8f458f96ff14ca129f8"
"time"="2024-09-18 13:42:59.933857" "level"="info" "logger"="node" "msg"="node shutdown"
after the restart, only the nonce 74 transaction is left:
$ curl -s -X GET localhost:1701/transactions | jq
{
"pendingTransactions": [
{
"transactionHash": "0xa3f46e0b83374e9448e26dcbe7165aef25768d369667a05109ecb254d835f3c1",
"to": "0xdbf3ea6f5bee45c02255b2c26a16f300502f68da",
"nonce": 74,
"gasPrice": "2258160007",
"gasLimit": 65000,
"gasTipBoost": 0,
"gasTipCap": "1129080000",
"gasFeeCap": "2258160007",
"data": "0x095ea7b3000000000000000000000000781c6d1f0eae6f1da1f604c6cdccdb8b76428ba700000000000000000000000000000000000000000000000012751bf40f450000",
"created": "2024-09-17T00:15:45+02:00",
"description": "Approve tokens for stake deposit operations",
"value": "0"
}
]
}
strangely enough the nonce 73 tx'es are not in the blockchain as per https://gnosisscan.io/ but they are gone from the node's local state.
reading the staked value works again at this point.
my node's eth address has no transactions for the last 175 days. the nonce of the last tx is 59 in the explorer.
how come that bee is creating tx'es with nonce as high as 74?
i initiated unstake once again, it created a tx with nonce 75 (which is pending forever as expected):
{
"pendingTransactions": [
{
"transactionHash": "0x7946d7d75b110a25c3325acd319ab289a17c1171eb5ce4d94ff1fb04f292714c",
"to": "0xdbf3ea6f5bee45c02255b2c26a16f300502f68da",
"nonce": 75,
"gasPrice": "3300000001",
"gasLimit": 65000,
"gasTipBoost": 0,
"gasTipCap": "1649999997",
"gasFeeCap": "3300000001",
"data": "0x095ea7b3000000000000000000000000781c6d1f0eae6f1da1f604c6cdccdb8b76428ba700000000000000000000000000000000000000000000000012751bf40f450000",
"created": "2024-09-18T13:50:03+02:00",
"description": "Approve tokens for stake deposit operations",
"value": "0"
},
{
"transactionHash": "0xa3f46e0b83374e9448e26dcbe7165aef25768d369667a05109ecb254d835f3c1",
"to": "0xdbf3ea6f5bee45c02255b2c26a16f300502f68da",
"nonce": 74,
"gasPrice": "2258160007",
"gasLimit": 65000,
"gasTipBoost": 0,
"gasTipCap": "1129080000",
"gasFeeCap": "2258160007",
"data": "0x095ea7b3000000000000000000000000781c6d1f0eae6f1da1f604c6cdccdb8b76428ba700000000000000000000000000000000000000000000000012751bf40f450000",
"created": "2024-09-17T00:15:45+02:00",
"description": "Approve tokens for stake deposit operations",
"value": "0"
}
]
}
ok, i've managed to "fix" this by disabling the nonce cache (?) of the node, i.e. always using nonce value returned by the blockchain backend:
func (t *transactionService) nextNonce(ctx context.Context) (uint64, error) {
onchainNonce, err := t.backend.PendingNonceAt(ctx, t.sender)
t.logger.Debug("nextNonce", "t.backend.PendingNonceAt returned onchainNonce", onchainNonce)
if err != nil {
return 0, err
}
// var nonce uint64
// err = t.store.Get(t.nonceKey(), &nonce)
// if err != nil {
// // If no nonce was found locally used whatever we get from the backend.
// if errors.Is(err, storage.ErrNotFound) {
// return onchainNonce, nil
// }
// return 0, err
// }
// // If the nonce onchain is larger than what we have there were external
// // transactions and we need to update our nonce.
// if onchainNonce > nonce {
// return onchainNonce, nil
// }
return onchainNonce, nil
}
note that i'm doing all this without any deep understanding of what's really going on here... but either way, with this patch my node managed to unstake within seconds.
also note that i suspect my nodes have pretty much stopped earning rewards months ago (i don't have monitoring set up), and i suspect that it was also due to this.
for now i still have a few nodes that have this same issue. i'd like to update them, too, so, please let me know (the sooner the better) if i should patch in some extra logging and send them to you to debug this.
i've updated one of the problematic nodes to v2.2.0, and to my surprise it managed to stake without a hitch.
this means that either my hypothesis is wrong about the out-of-sync local nonce state, or this has been addressed/fixed since v2.1.0.
i started a problematic node, initiated the unstake (it hangs).
the pprof output after that:
i have a theory why this happens:
if a tx is created that never gets incorporated (e.g. due to insufficient fee, or the node getting shut down at the wrong moment for a longer period), then the tx eventually times out (?). and in that case the local nonce state is not corrected (i have not verified this by looking at the codebase).
Summary
i initiate unstaking using
curl -s -X DELETE localhost:1708/stake | jq
. after 20+ minutes the curl call is still pending.i cannot find this txHash in https://gnosisscan.io/ i'm not sure whether that is to be expected or not.
republishing doesn't change much:
i tried using 3 different gnosis endpoints with the same results:
Expected behavior
unstaking finishes in less than a minute.
Actual behavior
the unstaking tx is not getting incorporated into the blockchain, because the nonce of the generated tx is not in sync with the blockchain state.
the logs didn't help much, not even on debug level. all i see is this:
Steps to reproduce
i didn't do anything special, just followed the release notes to unstake before updating to the 2.2.0 release.
for now i still have a few nodes in a state like this.