Neptune-Crypto / neptune-core

anonymous peer-to-peer cash
Apache License 2.0

node crash 2023-10-27 #67

Closed. aszepieniec closed this issue 11 months ago

aszepieniec commented 11 months ago

My node crashed on 2023-10-27. This was the tail of the log:

2023-10-27T18:52:51.365480989Z  INFO ThreadId(02) neptune_core::connect_to_peers: Established incoming TCP connection with 167.94.145.59:39700
2023-10-27T18:52:52.365545083Z ERROR ThreadId(03) neptune_core::main_loop: Got error: Connection reset by peer (os error 104)
2023-10-27T18:52:52.380220408Z  INFO ThreadId(03) neptune_core::connect_to_peers: Established incoming TCP connection with 167.94.145.59:38792
2023-10-27T18:52:52.380331862Z ERROR ThreadId(02) neptune_core::main_loop: Got error: frame size too big
2023-10-27T18:52:52.406999133Z  INFO ThreadId(02) neptune_core::connect_to_peers: Established incoming TCP connection with 167.94.145.59:40178
2023-10-27T18:52:53.407199313Z ERROR ThreadId(03) neptune_core::main_loop: Got error: frame size too big
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 107, kind: NotConnected, message: "Transport endpoint is not connected" }', /home/alan/neptune-core/src/main_loop.rs:1040:59
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'tokio-runtime-worker' panicked at 'Failed to read from main loop: channel closed', /home/alan/neptune-core/src/peer_loop.rs:973:35
thread 'tokio-runtime-worker' panicked at 'Failed to read from main loop: channel closed', /home/alan/neptune-core/src/peer_loop.rs:973:35
2023-10-27T18:52:53.438370695Z ERROR ThreadId(03) neptune_core::connect_to_peers: Peer thread (incoming) for 12.88.153.42:54738 panicked. Invoking close connection callback
2023-10-27T18:52:53.438411515Z DEBUG ThreadId(03) neptune_core::connect_to_peers: Fetched peer info standing for 12.88.153.42:54738
2023-10-27T18:52:53.438444206Z DEBUG ThreadId(03) neptune_core::connect_to_peers: Stored peer info standing for 12.88.153.42:54738
2023-10-27T18:52:53.438476337Z  INFO ThreadId(03) neptune_core::mine_loop: Mining thread got message from main
thread 'tokio-runtime-worker' panicked at 'Error in mining thread: Miner failed to read from watch channel

Caused by:
    channel closed', /home/alan/neptune-core/src/lib.rs:177:14
2023-10-27T18:52:53.439212884Z ERROR ThreadId(02) neptune_core::connect_to_peers: Peer thread (incoming) for 51.15.139.238:38364 panicked. Invoking close connection callback
2023-10-27T18:52:53.439246885Z DEBUG ThreadId(02) neptune_core::connect_to_peers: Fetched peer info standing for 51.15.139.238:38364
2023-10-27T18:52:53.439276125Z DEBUG ThreadId(02) neptune_core::connect_to_peers: Stored peer info standing for 51.15.139.238:38364

Sword-Smith commented 11 months ago

The panic at main_loop.rs:1040 seems to be caused by the line let peer_address = stream.peer_addr().unwrap(); in the following block:

// Handle incoming connections from peer
Ok((stream, peer_address)) = self.incoming_peer_listener.accept() => {
    let state = self.global_state.clone();
    let main_to_peer_broadcast_rx_clone: broadcast::Receiver<MainToPeerThread> = self.main_to_peer_broadcast_tx.subscribe();
    let peer_thread_to_main_tx_clone: mpsc::Sender<PeerThreadToMain> = self.peer_thread_to_main_tx.clone();
    // let peer_address = stream.peer_addr().unwrap();
    let own_handshake_data: HandshakeData = state.get_own_handshakedata().await;
    let incoming_peer_thread_handle = tokio::spawn(async move {
        match answer_peer_wrapper(
            stream,
            state,
            peer_address,
            main_to_peer_broadcast_rx_clone,
            peer_thread_to_main_tx_clone,
            own_handshake_data,
        ).await {
            Ok(()) => (),
            Err(err) => error!("Got error: {:?}", err),
        }
    });
    main_loop_state.thread_handles.push(incoming_peer_thread_handle);
    main_loop_state.thread_handles.retain(|th| !th.is_finished());
}

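If the connection is established and then immediately broken, stream.peer_addr() returns an Err rather than an address (here NotConnected, os error 107, as in the panic message), and the unwrap() on it panics the main loop. We can just take peer_address from the return value of accept() instead.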
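
To illustrate the idea, here is a minimal sketch using the blocking std::net types rather than the tokio listener that neptune-core uses. The function names accept_with_address and peer_address_or_none are hypothetical and do not exist in the codebase. The point is only that accept() already returns the remote address, and that a later peer_addr() call can fail and should not be unwrapped.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener, TcpStream};

/// Accept one connection and take the peer address from `accept()` itself.
/// (Hypothetical helper, not part of neptune-core.)
fn accept_with_address(listener: &TcpListener) -> io::Result<(TcpStream, SocketAddr)> {
    // `accept()` yields the remote address together with the stream, so no
    // separate `peer_addr()` call is needed at this point.
    let (stream, peer_address) = listener.accept()?;
    Ok((stream, peer_address))
}

/// If the address has to be queried from the stream later on, treat failure
/// as a recoverable condition instead of unwrapping.
/// (Hypothetical helper, not part of neptune-core.)
fn peer_address_or_none(stream: &TcpStream) -> Option<SocketAddr> {
    match stream.peer_addr() {
        Ok(address) => Some(address),
        Err(err) => {
            // A peer that has already disconnected can surface here,
            // for example as `NotConnected`.
            eprintln!("could not read peer address: {err}");
            None
        }
    }
}
```

With this shape, a peer that connects and drops immediately only costs a log line for that one connection rather than a panic in the loop that accepts all connections.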

aszepieniec commented 11 months ago

And, upon restarting, the node crashes again with the log ending in this:

2023-11-01T11:29:07.221667976Z DEBUG ThreadId(01) neptune_core::models::state::wallet::wallet_state: Block has 0 removal records
2023-11-01T11:29:07.221707216Z DEBUG ThreadId(01) neptune_core::models::state::wallet::wallet_state: Transaction has 0 inputs
2023-11-01T11:29:07.221759008Z DEBUG ThreadId(01) neptune_core::models::state::wallet::wallet_state: Number of mutated membership proofs: 0
2023-11-01T11:29:07.429363168Z DEBUG ThreadId(01) neptune_core::models::state::wallet::wallet_state: Number of unspent UTXOs: 2656
2023-11-01T11:29:07.586939683Z DEBUG ThreadId(01) neptune_core::main_loop: Flushed all databases
2023-11-01T11:29:07.587078226Z DEBUG ThreadId(01) neptune_core::main_loop: Timer: block-synchronization job
2023-11-01T11:29:07.587094596Z  INFO ThreadId(01) neptune_core::main_loop: Running sync
2023-11-01T11:29:07.587110247Z  WARN ThreadId(01) neptune_core::main_loop: Could not read current block. Aborting block synchronization

I had to kill the process with kill -9.

Sword-Smith commented 11 months ago

Commit a0cd20e0bc09d66315c93bc90e4ee2c2c9f55929 should fix the original issue. Could you open a new issue for the other problem?