Testworld-2-0-archive-node: archive process fails with "message too big"

EmrePiconbello commented 1 year ago

Preliminary Checks

[X] This issue is not a duplicate. Before opening a new issue, please search existing issues: https://github.com/MinaProtocol/mina/issues
[X] This issue is not a question, feature request, RFC, or anything other than a bug report. Please post those things in GitHub Discussions: https://github.com/MinaProtocol/mina/discussions

Description

2023-10-20 23:46:16 UTC [Error] Exception while handling RPC server request from "": $error error: "(src/connection.ml.Handshake_error.Handshake_error((Reading_header_failed(monitor.ml.Error(\"Rpc_transport: message too small or too big\"((Message_size 246454534799363)(Max_message_size 104857600))async_rpc/src/rpc_transport.ml:45:14)(\"Raised at Base__Error.raise in file \\"src/error.ml\\" (inlined), line 8, characters 14-30\"\"Called from Core_kernel__Error.failwiths in file \\"src/error.ml\\", line 5, characters 2-50\"\"Called from Async_rpc__Rpc_transport.Unix_reader.read_forever.loop in file \\"async_rpc/src/rpc_transport.ml\\", line 78, characters 8-52\"\"Called from Async_unix__Reader0.Internal.read_one_chunk_at_a_time.(fun).loop.(fun) in file \\"src/reader0.ml\\", line 533, characters 27-61\"\"Called from Async_kernel__Job_queue.run_job in file \\"src/job_queue.ml\\" (inlined), line 128, characters 2-5\"\"Called from Async_kernel__Job_queue.run_jobs in file \\"src/job_queue.ml\\", line 169, characters 6-47\")))))"
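For context on the error text: the log shows the RPC server failing while reading a message header (Reading_header_failed) and rejecting it because the decoded Message_size is far above Max_message_size (104857600 bytes, i.e. 100 MiB). A minimal Python sketch of that kind of length-prefix check is below. The function names are hypothetical, and the 8-byte little-endian header and the exact bounds are assumptions modeled on the log text, not Async_rpc's actual implementation.

```python
import struct

# Limit reported in the log as Max_message_size: 100 MiB.
MAX_MESSAGE_SIZE = 104_857_600


def encode_header(size: int) -> bytes:
    """Pack a message length as an 8-byte little-endian signed integer (assumed framing)."""
    return struct.pack("<q", size)


def read_frame_length(header: bytes, max_size: int = MAX_MESSAGE_SIZE) -> int:
    """Decode an 8-byte length prefix and reject lengths outside [0, max_size].

    Hypothetical sketch: byte order and bounds are assumptions, not Async_rpc code.
    """
    if len(header) != 8:
        raise ValueError("length prefix must be exactly 8 bytes")
    (size,) = struct.unpack("<q", header)
    if size < 0 or size > max_size:
        raise ValueError(
            f"message too small or too big "
            f"(Message_size {size})(Max_message_size {max_size})"
        )
    return size


# The size from the report, turned back into the header bytes that would
# decode to it under the little-endian assumption above.
REPORTED_SIZE = 246_454_534_799_363
print(encode_header(REPORTED_SIZE).hex(" "))  # prints: 03 00 00 2b 26 e0 00 00
```

Calling `read_frame_length(encode_header(REPORTED_SIZE))` raises `ValueError` with the same Message_size/Max_message_size pair as the log, while an ordinary length such as 1024 passes through unchanged.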

Steps to Reproduce

1. Run the archive node.
2. Follow the logs.
3. The error appears every few hours.
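To put a number on "every few hours", the matching log lines can be filtered and the time between occurrences measured. This is a hypothetical Python sketch that assumes each archive log line starts with a `YYYY-MM-DD HH:MM:SS UTC [Error]` prefix like the line quoted in the description; logs in any other format would need different parsing.

```python
import re
from datetime import datetime

# Assumed line prefix, taken from the log line quoted in the description.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC \[Error\]")
# Error text from the report.
NEEDLE = "Rpc_transport: message too small or too big"


def transport_error_times(lines):
    """Return the timestamps of [Error] lines that contain the transport error."""
    times = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and NEEDLE in line:
            times.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S"))
    return times


def gaps_between(times):
    """Return the intervals between consecutive occurrences, in hours."""
    return [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]
```

For example, `gaps_between(transport_error_times(open("archive.log")))` would list the hours between errors; `archive.log` is a placeholder for wherever the archive process output is saved.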

...

Expected Result

The archive node archives every block without missing any.

Actual Result

The archive misses blocks, which leaves it incomplete and requires a recovery.
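A first step toward such a recovery could be finding which heights are missing. The hypothetical helper below works on a list of block heights assumed to have already been read from the archive database (the query and schema are not shown and are assumptions). It reports only heights with no archived block at all, so it cannot detect a missing block at a height where a block from another fork was stored, nor blocks missing above the highest archived height.

```python
def missing_height_ranges(heights):
    """Return inclusive (start, end) ranges of heights with no archived block.

    Duplicate heights are allowed, since an archive can hold several blocks
    at one height from competing forks; they are collapsed before comparing.
    """
    present = sorted(set(heights))
    gaps = []
    for lo, hi in zip(present, present[1:]):
        if hi - lo > 1:
            # Every height strictly between lo and hi is absent.
            gaps.append((lo + 1, hi - 1))
    return gaps
```

For instance, heights `[1, 2, 2, 3, 6, 7, 9]` yield `[(4, 5), (8, 8)]`.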

How frequently do you see this issue?

Frequently

What is the impact of this issue on your ability to run a node?

High

Status

Global number of accounts:                     200716
Block height:                                  921
Max observed block height:                     921
Max observed unvalidated block height:         921
Local uptime:                                  2d17h50m34s
Ledger Merkle root:                            jxPEXnnaDKoE9xuGTbQcyUAcHsCjERtCqvAk7wxRafuyRdKt3Ei
Protocol state hash:                           3NKMdU2rkK62KyXdzj3ddHvLG4VRfNXcnpi13NGoit1ve4w9ZT7D
Chain id:                                      332c8cc05ba8de9efc23a011f57015d8c9ec96fac81d5d3f7a06969faf4bce92
Git SHA-1:                                     55b78189c46e1811b8bdb78864cfa95409aeb96a
Configuration directory:                       /root/.mina-config
Peers:                                         158
User_commands sent:                            0
SNARK worker:                                  None
SNARK work fee:                                100000000
Sync status:                                   Synced
Catchup status:
        Finished:  559
Block producers running:                       1 (B62qm35gEuL3F4dLEUpnXTW3tyCbmrBAyY5bmrMggFFdpNLLEbE8khc)
Coinbase receiver:                             Block producer
Best tip consensus time:                       epoch=0, slot=1314
Best tip global slot (across all hard-forks):  1314
Next block will be produced in:                in 2.323d for slot: 2430 slot-since-genesis: 2430 (Generated from consensus at slot: 104 slot-since-genesis: 104)
Consensus time now:                            epoch=0, slot=1315
Consensus mechanism:                           proof_of_stake
Consensus configuration:
        Delta:                     0
        k:                         290
        Slots per epoch:           7140
        Slot duration:             3m
        Epoch duration:            14d21h
        Chain start timestamp:     2023-10-17 16:01:01.000000Z
        Acceptable network delay:  3m
Addresses and ports:
        External IP:
        Bind IP:        0.0.0.0
        Libp2p PeerID:  12D3KooWMPxTu24mCpi3TwmkU4fJk7a8TQ4agFZeTHQRi8KCc3nj
        Libp2p port:    8302
        Client port:    8301

Metrics:
        block_production_delay:             7 (1 0 0 0 0 0 0)
        transaction_pool_diff_received:     2
        transaction_pool_diff_broadcasted:  0
        transactions_added_to_pool:         4554
        transaction_pool_size:              2
        snark_pool_diff_received:           315
        snark_pool_diff_broadcasted:        0
        pending_snark_work:                 0
        snark_pool_size:                    5605

Additional information

No response

psteckler commented 1 year ago

Is this error log from the daemon or from the archive process?

EmrePiconbello commented 1 year ago

From the archive process.

EmrePiconbello commented 10 months ago

To follow up: because of other unusual behavior with Postgres, we moved Postgres to a remote, isolated instance. Since the move, the other issue we reported has not happened again. However, the remote Postgres was causing this error, and after some conversation with Gareth we also learned that a remote Postgres causes the node to lag behind the chain. Under these circumstances, I am very worried about the sustainability of the archive node.
