cBournhonesque / lightyear

A networking library to make multiplayer games for the Bevy game engine
https://cbournhonesque.github.io/lightyear/book
Apache License 2.0
499 stars 51 forks source link

Attempting a graceful shutdown is strange #704

Open simbleau opened 5 days ago

simbleau commented 5 days ago

I've wired up the server so that Ctrl+C will trigger a graceful shutdown, which means:

That plugin looks like this:

pub struct ShutdownPlugin;

impl Plugin for ShutdownPlugin {
    fn build(&self, app: &mut App) {
        let shutdown_flag = Arc::new(AtomicBool::new(false));
        ctrlc::set_handler({
            let shutdown_flag = shutdown_flag.clone();
            move || {
                info!("Ctrl-C detected, initiating shutdown...");
                shutdown_flag.store(true, Ordering::SeqCst);
            }
        })
        .expect("Error setting Ctrl-C handler");

        app.insert_resource(CtrlCPressed(shutdown_flag))
            .add_systems(Update, monitor_shutdown)
            .add_systems(OnEnter(ServerState::Exiting), exit_routine);
    }
}

#[derive(Resource, Deref, DerefMut)]
struct CtrlCPressed(Arc<AtomicBool>);

fn monitor_shutdown(
    ctrlc_pressed: Res<CtrlCPressed>,
    mut next_state: ResMut<NextState<ServerState>>,
) {
    if ctrlc_pressed.load(Ordering::SeqCst) {
        next_state.set(ServerState::Exiting);
    }
}

fn exit_routine(
    mut manager: ResMut<ServerConnections>,
    query_players: Query<(&PlayerId, &PlayerData)>,
) {
    info!("Stopping listening...");
    manager.stop().unwrap_or_else(|e| {
        error!("Failed to stop server: {e:?}");
    });
    info!("Disconnecting all players...");
    let mut players = vec![];
    for (player_id, player_data) in query_players.iter() {
        info!("Disconnecting {}", player_id.0);
        players.push(player_data.to_owned());
        manager.disconnect(player_id.0).unwrap_or_else(|e| {
            error!("Failed to disconnect {}: {e:?}", player_id.0);
        });
    }
    info!("TODO: Synchronize with database");
    std::process::exit(0);
}

My output is strange, am I doing this wrong?

2024-11-23T20:23:50.207058Z  INFO woot_server::shutdown::plugin: Ctrl-C detected, initiating shutdown...
2024-11-23T20:23:50.208833Z  INFO woot_server::shutdown::plugin: Stopping listening...

2024-11-23T20:23:50.209069Z ERROR woot_server::shutdown::plugin: Failed to stop server: Transport(Channel("sending into a closed channel"))

2024-11-23T20:23:50.209080Z  INFO woot_server::shutdown::plugin: Disconnecting all players...
2024-11-23T20:23:50.209090Z  INFO woot_server::shutdown::plugin: Disconnecting Netcode(11413710467502064786)

2024-11-23T20:23:50.208970Z ERROR lightyear::connection::netcode::server: server failed to send disconnect packet: unable to send message to client: channel closed
2024-11-23T20:23:50.208994Z ERROR lightyear::connection::netcode::server: server failed to send disconnect packet: unable to send message to client: channel closed
2024-11-23T20:23:50.209014Z ERROR lightyear::connection::netcode::server: server failed to send disconnect packet: unable to send message to client: channel closed
2024-11-23T20:23:50.209033Z ERROR lightyear::connection::netcode::server: server failed to send disconnect packet: unable to send message to client: channel closed

2024-11-23T20:23:50.209099Z  INFO woot_server::shutdown::plugin: TODO: Synchronize with database
Nul-led commented 5 days ago

Looks like timing issue, the server fully terminates the connection before it can gracefully close it

simbleau commented 4 days ago

Looks like timing issue, the server fully terminates the connection before it can gracefully close it

I don't understand your feedback.

Do you mean I have to disconnect all clients before stopping?

Do you mean OnEnter(State) is not the right time?

Nul-led commented 4 days ago

Looks like timing issue, the server fully terminates the connection before it can gracefully close it

I don't understand your feedback.

Do you mean I have to disconnect all clients before stopping?

Do you mean OnEnter(State) is not the right time?

Ah no, i just thought this probably is a fault in lightyears disconnect logic where if you call disconnect() it just terminates the client connecting including the sender/receiver channel for packets before its able to send the disconnect packet, @cBournhonesque probably knows more about what might be happening here

Nul-led commented 4 days ago

Actually on second thought server stop very likely means automatically disconnect all clients so you probably dont need to disconnect them manually

simbleau commented 4 days ago

Actually on second thought server stop very likely means automatically disconnect all clients so you probably dont need to disconnect them manually

It's ServerConnections.stop which is documented as just flipping is_listening to false

Looks like timing issue, the server fully terminates the connection before it can gracefully close it

I don't understand your feedback. Do you mean I have to disconnect all clients before stopping? Do you mean OnEnter(State) is not the right time?

Ah no, i just thought this probably is a fault in lightyears disconnect logic where if you call disconnect() it just terminates the client connecting including the sender/receiver channel for packets before its able to send the disconnect packet, @cBournhonesque probably knows more about what might be happening here

This would make sense. Hoping Charles can chime in.

cBournhonesque commented 3 days ago

I think calling ServerConnections.stop() would indeed work; it calls stop() for each of the NetServer, which in turn send disconnecting messages to each of the client.

I wouldn't worry too much about the error messages (even though it would be great if they were fixed). For example to make sure that the disconnection is received correctly I send 10 disconnect packages in a row. It's possible that the first one is received very quickly and the connection is somehow stopped, and then when the others are received you just get server failed to send disconnect packet: unable to send message to client: channel closed

Nul-led commented 3 days ago

@cBournhonesque maybe generally it makes sense to have this as a warn instead of an error. In the specific case of sending disconnect packets after close / termination its probably not needed at all though.