jinseon2416 opened this issue 1 month ago
May I ask, @jinseon2416: was the node bootstrapped from a snapshot?
Hmm... this is unusual. We haven't seen any other reports of this.
Can you please: (i) stop the fullnode; (ii) restart the fullnode and collect the logs (e.g., write them to a file); (iii) let the node run for a few minutes (e.g., 5 minutes?); and (iv) upload the entire log file of the node. This will hopefully show us the error the node is running into.
Note: it's necessary to have the entire log file from the reboot so that we can help debug!
Hi there. Here to report another instance of this. I have been in contact with @p1xel32 on Discord. I tried fast sync, then moved on to bootstrapping. I contacted the Polkachu team and they can't update the snapshots for bootstrapping either. The BwareLabs snapshot is also lagging behind schedule. I've attached the full log since initially pulling the Docker image. log.txt
@m229abd We have identified several users experiencing similar issues. For those who re-synced using the default configuration, the problem was resolved. If you're encountering similar synchronization and connectivity issues, especially after bootstrapping your nodes, re-syncing with the default configuration might help.
You can find the default config here: https://github.com/aptos-labs/aptos-core/blob/main/config/src/config/test_data/public_full_node.yaml
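For context, "re-syncing with the default configuration" essentially means removing any state_sync overrides from your fullnode.yaml so the node falls back to fast sync. Written out explicitly, that default behavior looks roughly like this (mode names are from https://aptos.dev/guides/state-sync; exact defaults may differ by release, so treat this as a sketch):

state_sync:
  state_sync_driver:
    # Fast sync: download the latest on-chain states rather than replaying history
    bootstrapping_mode: DownloadLatestStates
    # After bootstrapping, execute transactions or apply outputs as they arrive
    continuous_syncing_mode: ExecuteTransactionsOrApplyOutputs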
@JoshLind I believe the original issue is here: https://github.com/aptos-labs/aptos-core/issues/13553
We performed bootstrapping using the Aptos Foundation backup storage snapshots: https://aptos.dev/nodes/full-node/aptos-db-restore
What we need now is an archival mainnet node. When running with the default configuration, the full transaction history cannot be stored and executed, so we use the following state sync settings: https://aptos.dev/guides/state-sync
Are there any other alternatives?
state_sync:
  state_sync_driver:
    bootstrapping_mode: ExecuteTransactionsFromGenesis
    continuous_syncing_mode: ExecuteTransactions
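For completeness: since this is meant to be an archival node, the pruners also need to be disabled so the full history is retained. A sketch of that section, assuming the pruner key names documented for aptos-core (worth verifying against your node version):

storage:
  storage_pruner_config:
    # Disable all pruners so transaction history and state snapshots are kept forever
    ledger_pruner_config:
      enable: false
    state_merkle_pruner_config:
      enable: false
    epoch_snapshot_pruner_config:
      enable: false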
We are sharing the logs that appear when data synchronization stops. The stall seems to occur when the DataStreamNotificationTimeout (5s) is exceeded; see the timeout sketch after these logs. Are there any recommendations?
2024-06-21T02:51:09.284649Z [sync-driver-1] WARN state-sync/state-sync-driver/src/continuous_syncer.rs:195 Resetting the currently active data stream due to too many timeouts!
2024-06-21T02:51:09.284684Z [sync-driver-1] INFO state-sync/state-sync-driver/src/utils.rs:247 {"message":"Terminating the data stream with ID: 1, notification and feedback: None","name":"driver"}
2024-06-21T02:51:09.284718Z [sync-driver-1] WARN state-sync/state-sync-driver/src/driver.rs:631 {"error":{"CriticalDataStreamTimeout":"12"},"message":"Error found when driving progress of the continuous syncer!","name":"driver"}
2024-06-21T02:51:09.284782Z [stream-serv-9] INFO state-sync/data-streaming-service/src/streaming_service.rs:222 {"event":"success","message":"Terminating the data stream with ID: 1. Notification and feedback: None","name":"handle_terminate_request","stream_id":1}
2024-06-21T02:51:09.291146Z [stream-serv-9] INFO state-sync/data-streaming-service/src/streaming_service.rs:297 {"event":"success","message":"Stream created for request: StreamRequestMessage { stream_request: ContinuouslyStreamTransactionsOrOutputs(ContinuouslyStreamTransactionsOrOutputsRequest { known_version: 992582227, known_epoch: 7451, include_events: false, target: None }), response_sender: Sender { complete: false } }","name":"handle_stream_request","stream_id":2}
2024-06-21T02:51:09.336301Z [stream-serv-1] INFO state-sync/data-streaming-service/src/streaming_service.rs:371 {"event":"success","message":"Data stream initialized.","name":"initialize_stream","stream_id":2}
2024-06-21T02:51:14.036718Z [data-client-9] INFO state-sync/aptos-data-client/src/client.rs:1190 {"event":"priority_and_regular_peers","message":"Number of priority peers: 6. Number of regular peers: 0","name":"peer_states"}
2024-06-21T02:51:14.036834Z [data-client-9] INFO state-sync/aptos-data-client/src/client.rs:1223 {"event":"priority_peer_categories","message":"Number of connected peers by priority: {HighPriority: 6}","name":"peer_states"}
2024-06-21T02:51:14.137172Z [data-client-9] INFO state-sync/aptos-data-client/src/poller.rs:444 {"event":"aggregate_summary","message":"Global data summary: AdvertisedData { Highest epoch ending ledger info, epoch: Some(7450). Highest synced ledger info, epoch: Some(7451), version: Some(992584475). Lowest transaction version: Some(842581461), Lowest transaction output version: Some(842581461), Lowest states version: Some(912584460) }, OptimalChunkSizes { epoch_chunk_size: 200, state_chunk_size: 4000, transaction_chunk_size: 2000, transaction_output_chunk_size: 1000 }","name":"peer_states"}
2024-06-21T02:51:14.292451Z [sync-driver-1] WARN state-sync/state-sync-driver/src/driver.rs:631 {"error":{"DataStreamNotificationTimeout":"5s"},"message":"Error found when driving progress of the continuous syncer!","name":"driver"}
2024-06-21T02:51:19.034046Z [data-client-14] INFO state-sync/aptos-data-client/src/client.rs:1190 {"event":"priority_and_regular_peers","message":"Number of priority peers: 6. Number of regular peers: 0","name":"peer_states"}
2024-06-21T02:51:19.034076Z [data-client-14] INFO state-sync/aptos-data-client/src/client.rs:1223 {"event":"priority_peer_categories","message":"Number of connected peers by priority: {HighPriority: 6}","name":"peer_states"}
2024-06-21T02:51:19.293671Z [sync-driver-1] WARN state-sync/state-sync-driver/src/driver.rs:631 {"error":{"DataStreamNotificationTimeout":"5s"},"message":"Error found when driving progress of the continuous syncer!","name":"driver"}
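For what it's worth, the two values in these logs appear to map to state sync driver settings: DataStreamNotificationTimeout corresponds to max_stream_wait_time_ms (default 5000 ms) and CriticalDataStreamTimeout to max_num_stream_timeouts (default 12). A sketch of loosening them, assuming those field names from aptos-core's state sync config (verify against your release before relying on this):

state_sync:
  state_sync_driver:
    bootstrapping_mode: ExecuteTransactionsFromGenesis
    continuous_syncing_mode: ExecuteTransactions
    # Allow each data stream notification twice as long before timing out (the "5s" above)
    max_stream_wait_time_ms: 10000
    # Tolerate more consecutive timeouts before resetting the stream (the "12" above)
    max_num_stream_timeouts: 24

Note this only masks slow peers; it doesn't address why the stream stalls in the first place.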
The same synchronization interruption is occurring in the latest version v1.15.2.
2024-06-28T04:31:30.602374Z [sync-driver-14] WARN state-sync/state-sync-driver/src/bootstrapper.rs:593 Resetting the currently active data stream due to too many timeouts!
2024-06-28T04:31:30.602398Z [sync-driver-14] INFO state-sync/state-sync-driver/src/utils.rs:247 {"message":"Terminating the data stream with ID: 24, notification and feedback: None","name":"driver"}
2024-06-28T04:31:30.602440Z [sync-driver-14] WARN state-sync/state-sync-driver/src/driver.rs:657 {"error":{"CriticalDataStreamTimeout":"12"},"message":"Error found when checking the bootstrapper progress!","name":"driver"}
2024-06-28T04:34:30.649098Z [stream-serv-5] INFO state-sync/data-streaming-service/src/streaming_service.rs:297 {"event":"success","message":"Stream created for request: StreamRequestMessage { stream_request: GetAllTransactionsOrOutputs(GetAllTransactionsOrOutputsRequest { start_version: 1001178912, end_version: 1001288966, proof_version: 1003283292, include_events: false }), response_sender: Sender { complete: false } }","name":"handle_stream_request","stream_id":28}
2024-06-28T04:34:30.655387Z [stream-serv-0] INFO state-sync/data-streaming-service/src/streaming_service.rs:371 {"event":"success","message":"Data stream initialized.","name":"initialize_stream","stream_id":28}
2024-06-28T04:34:32.103636Z [network-Pub-0] INFO network/framework/src/peer_manager/mod.rs:223 Current connected peers {"network_context":{"role":"full_node","network_id":"Public","peer_id":"6f7a88ac44b154da932976a5358dddb478a19066b879f8ca322d45d7ddadc46c"},"peers":"[(371dfdee9fbecaa358683936c5946aa28393067fa60e1e185c4250da461e15cf, /dns/vfn.aptos.dsrvlabs.net/tcp/6182/noise-ik/0xdf0eb85221ed6f116dd5caa1e7461d2ec2e3c6be4b252ac90246b3384e46d96d/handshake/0, outbound), (c41e4eded759ab4a31e999bbef6d14361fd94bea59db8310703b1668af98a23d, /dns/aptos-vfn-en.marblex.io/tcp/6182/noise-ik/0xdac7e141cc34fe3f06f40f54e3b932eeacd5b81631e18c0775a84f03959d953c/handshake/0, outbound), (2fa9d01fb4a1c2fdac5f45419b98213604bc215817d4b394ab372e5d59d2753c, /dns/aptos-staked.prod-eks-ap-northeast-2.staked.cloud/tcp/6182/noise-ik/0xbcfd67cf425a8633620fd848c9446a69d34c601d2d73bb864c86584f53012f66/handshake/0, outbound), (0a6eb54c274da96778a6702740a763f6aa7b02eef3d1efc2623d28289d185118, /ip4/47.91.19.61/tcp/6182/noise-ik/0x7d9348a4e93a17dbb73b511807710415ebe2a6c2c36ab404e2d63174407e625b/handshake/0, outbound), (568fdb6acf26aae2a84419108ff13baa3ebf133844ef18e23a9f47b5af16b698, /dns/node1.cloud-b.mainnet.aptoslabs.com/tcp/6182/noise-ik/0x003cc2ed36e7d486539ac2c411b48d962f1ef17d884c3a7109cad43f16bd5008/handshake/0, outbound), (947c6451ad9a8659cba4e60a6da41086eb66e73fc07936bd3b7044873b5e094d, /dns/vfn.mainnet.aptos.despread.io/tcp/6182/noise-ik/0xd4eabfa79a26a59f5701397277556c4f1343e4892bfecf13ef9401c1650f317a/handshake/0, outbound)]"}
2024-06-28T04:34:33.107022Z [data-client-12] INFO state-sync/aptos-data-client/src/poller.rs:444 {"event":"aggregate_summary","message":"Global data summary: AdvertisedData { Highest epoch ending ledger info, epoch: Some(7535). Highest synced ledger info, epoch: Some(7536), version: Some(1003302786). Lowest transaction version: Some(853298253), Lowest transaction output version: Some(853298253), Lowest states version: Some(923302743) }, OptimalChunkSizes { epoch_chunk_size: 200, state_chunk_size: 4000, transaction_chunk_size: 2000, transaction_output_chunk_size: 1000 }","name":"peer_states"}
2024-06-28T04:34:34.003254Z [data-client-12] INFO state-sync/aptos-data-client/src/latency_monitor.rs:239 {"event":"waiting_for_catchup","message":"Waiting for the node to catch up to the latest version before starting the latency monitor.","name":"latency_monitor"}
2024-06-28T04:34:35.003223Z [data-client-12] INFO state-sync/aptos-data-client/src/client.rs:1190 {"event":"priority_and_regular_peers","message":"Number of priority peers: 6. Number of regular peers: 0","name":"peer_states"}
2024-06-28T04:34:35.003247Z [data-client-12] INFO state-sync/aptos-data-client/src/client.rs:1223 {"event":"priority_peer_categories","message":"Number of connected peers by priority: {HighPriority: 6}","name":"peer_states"}
2024-06-28T04:34:35.650437Z [sync-driver-0] WARN state-sync/state-sync-driver/src/driver.rs:657 {"error":{"DataStreamNotificationTimeout":"5s"},"message":"Error found when checking the bootstrapper progress!","name":"driver"}
2024-06-28T04:34:40.003748Z [data-client-12] INFO state-sync/aptos-data-client/src/client.rs:1190 {"event":"priority_and_regular_peers","message":"Number of priority peers: 6. Number of regular peers: 0","name":"peer_states"}
2024-06-28T04:34:40.003774Z [data-client-12] INFO state-sync/aptos-data-client/src/client.rs:1223 {"event":"priority_peer_categories","message":"Number of connected peers by priority: {HighPriority: 6}","name":"peer_states"}
2024-06-28T04:34:40.651480Z [sync-driver-0] WARN state-sync/state-sync-driver/src/driver.rs:657 {"error":{"DataStreamNotificationTimeout":"5s"},"message":"Error found when checking the bootstrapper progress!","name":"driver"}
@jinseon2416
Also, if you simply synchronize without querying the RPC API, the same symptom does not appear.
@p1xel32
@m229abd,
Looking at the logs, I see these errors:
Errors: [InvalidResponse("Unexpected error encountered: invalid value: integer `5`, expected variant index 0 <= i < 4"), InvalidResponse("Unexpected error encountered: invalid value: integer `5`, expected variant index 0 <= i < 4")],
This tells me that your node is running a version that is incompatible with its peers on the rest of the network. When it makes calls to receive new data (after fast syncing), it is unable to parse the responses because of the message version incompatibility. Can you please update to the latest mainnet version and retry? e.g., currently v1.15.2.
I suspect this is completely unrelated to the issue faced by @jinseon2416.
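If it helps anyone else landing here: updating generally just means pulling the release image and restarting the node. A hypothetical docker-compose excerpt (the image repository and tag format here are assumptions based on the aptos-labs Docker releases; adjust to your own setup):

services:
  fullnode:
    # Pin to an explicit release tag rather than a floating tag like "latest"
    image: aptoslabs/validator:aptos-node-v1.15.2
    restart: unless-stopped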
Sorry for not keeping you informed, but we did update shortly after v1.15.2 came out, and it did fix the synchronization. Thank you for following up on the issue.
🐛 Bug
We are continuously collecting block information through the RPC API while synchronizing an archival fullnode. After recently upgrading to the latest Aptos version, v1.13.1, synchronization stops frequently. I downgraded to 1.12.1 and the same symptom does not occur. Also, if you simply synchronize without querying the RPC API, the same symptom does not appear.
When synchronization stalls, only the following log is output and no synchronization-related logs appear. If I restart the aptos process, the sync proceeds normally for a while and then stops again.
Stack trace/error message
Expected Behavior
It is necessary to determine what differences between versions 1.12.1 and 1.13.1 are causing the synchronization stoppage.
System information
Additional context
I encountered an issue after upgrading to a new mainnet release. The same symptom does not appear in version 1.12.1.