8xFF / atm0s-media-server

Decentralized, Global-Scale Media Server written in Rust (WebRTC/Whip/Whep/Rtmp/Sip)
https://8xff.github.io/media-docs/
MIT License

bug: server crash on endpoint destroyed #440

Open giangndm opened 6 days ago

giangndm commented 6 days ago

Description

It is very hard to reproduce, but it has happened to me:

10/11/2024 2:24:08 PMthread '<unnamed>' panicked at /home/runner/work/8xFF-decentralized-media-server/8xFF-decentralized-media-server/packages/media_core/src/cluster/room/metadata.rs:358:9:
10/11/2024 2:24:08 PMassertion `left == right` failed: Peers not empty
10/11/2024 2:24:08 PM  left: 1
10/11/2024 2:24:08 PM right: 0
10/11/2024 2:24:08 PMstack backtrace:
10/11/2024 2:24:08 PM2024-10-11T07:24:08.679900Z  INFO atm0s_media_server::server::media: on req 50 res from worker 0
10/11/2024 2:24:08 PM   0: rust_begin_unwind
10/11/2024 2:24:08 PM   1: core::panicking::panic_fmt
10/11/2024 2:24:08 PM   2: core::panicking::assert_failed_inner
10/11/2024 2:24:08 PM   3: core::panicking::assert_failed
10/11/2024 2:24:08 PM   4: <media_server_core::cluster::room::metadata::RoomMetadata<Endpoint> as core::ops::drop::Drop>::drop
10/11/2024 2:24:08 PM   5: core::ptr::drop_in_place<media_server_core::cluster::room::metadata::RoomMetadata<media_server_runner::worker::MediaClusterEndpoint>>
10/11/2024 2:24:08 PM   6: core::ptr::drop_in_place<media_server_core::cluster::room::ClusterRoom<media_server_runner::worker::MediaClusterEndpoint>>
10/11/2024 2:24:08 PM   7: sans_io_runtime::task::group::TaskGroup<In,Out,T,_>::remove_task
10/11/2024 2:24:08 PM   8: <media_server_core::cluster::MediaCluster<Endpoint> as sans_io_runtime::task::switcher::TaskSwitcherChild<media_server_core::cluster::Output<Endpoint>>>::pop_output
10/11/2024 2:24:08 PM   9: media_server_runner::worker::MediaServerWorker<ES>::pop_output
10/11/2024 2:24:08 PM  10: <atm0s_media_server::server::media::runtime_worker::MediaRuntimeWorker<ES> as sans_io_runtime::worker::WorkerInner<media_server_runner::worker::Owner,atm0s_media_server::server::media::runtime_worker::ExtIn,atm0s_media_server::server::media::runtime_worker::ExtOut,atm0s_media_server::server::media::runtime_worker::Channel,atm0s_sdn_network::worker::SdnWorkerBusEvent<media_server_runner::worker::UserData,media_server_runner::worker::SC,media_server_runner::worker::SE,(),()>,atm0s_media_server::server::media::runtime_worker::ICfg<ES>,()>>::pop_output
10/11/2024 2:24:08 PM  11: sans_io_runtime::worker::Worker<Owner,ExtIn,ExtOut,ChannelId,Event,Inner,ICfg,SCfg,B,_>::pop_inner

Expected Behavior

Don't crash

Actual Behavior

Crash

giangndm commented 6 days ago

It seems there is a case where the cluster's room_meta pops out a Destroy event, but while that event is still in the queue the cluster puts in another join event. This pushes room_meta into an edge case.
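A minimal sketch of the suspected race, to make the ordering concrete. Everything here is a simplified, hypothetical stand-in (`RoomMeta`, `Output`, `on_join`, `on_leave` are illustrative names, not the real types in `media_core`), and it assumes the Destroy event is queued eagerly when the last peer leaves:

```rust
use std::collections::VecDeque;

// Hypothetical, simplified stand-ins; not the real atm0s-media-server types.
#[derive(Debug, PartialEq)]
enum Output {
    Destroy,
}

#[derive(Default)]
struct RoomMeta {
    peers: Vec<u32>,
    queue: VecDeque<Output>,
}

impl RoomMeta {
    fn on_join(&mut self, peer: u32) {
        self.peers.push(peer);
    }

    fn on_leave(&mut self, peer: u32) {
        self.peers.retain(|p| *p != peer);
        if self.peers.is_empty() {
            // Assumption: Destroy is queued eagerly, so it can go stale
            // before the owner pops it.
            self.queue.push_back(Output::Destroy);
        }
    }

    fn pop_output(&mut self) -> Option<Output> {
        self.queue.pop_front()
    }
}

/// Returns how many peers are still in the room when the owner sees Destroy.
fn peers_at_destroy() -> usize {
    let mut room = RoomMeta::default();
    room.on_join(1);
    room.on_leave(1); // room is empty, Destroy is queued
    room.on_join(2); // a new join arrives before the queue is drained
    assert_eq!(room.pop_output(), Some(Output::Destroy));
    // The owner would now drop a room that still has a peer.
    room.peers.len()
}

fn main() {
    println!("peers at destroy: {}", peers_at_destroy());
}
```

In this model the owner receives Destroy while one peer is still present, which would match the `left: 1, right: 0` assertion in the panic above if the real code behaves the same way.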

Solution: maybe tasks shouldn't push a Destroy event into their output queue; instead, each task would store its state as Destroying.
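A rough sketch of what that idea could look like, under the assumption that the task decides whether to emit Destroy only at the moment `pop_output` is called. The `State` enum and all method names are hypothetical and only illustrate the approach; they are not the project's actual API:

```rust
// Hypothetical, simplified stand-ins; not the real atm0s-media-server types.
#[derive(Debug, PartialEq)]
enum Output {
    Destroy,
}

#[derive(Debug, PartialEq)]
enum State {
    Active,
    Destroying,
    Destroyed,
}

struct RoomMeta {
    peers: Vec<u32>,
    state: State,
}

impl RoomMeta {
    fn new() -> Self {
        Self { peers: Vec::new(), state: State::Active }
    }

    fn on_join(&mut self, peer: u32) {
        match self.state {
            // Assumption: the owner removes the task after Destroy,
            // so a late join is simply ignored here.
            State::Destroyed => return,
            // A join while Destroying cancels the pending destroy.
            State::Destroying => self.state = State::Active,
            State::Active => {}
        }
        self.peers.push(peer);
    }

    fn on_leave(&mut self, peer: u32) {
        self.peers.retain(|p| *p != peer);
        if self.peers.is_empty() && self.state == State::Active {
            // Record intent only; nothing is queued yet.
            self.state = State::Destroying;
        }
    }

    fn pop_output(&mut self) -> Option<Output> {
        if self.state == State::Destroying {
            // Destroy is produced lazily, so it cannot be stale.
            self.state = State::Destroyed;
            Some(Output::Destroy)
        } else {
            None
        }
    }
}

/// Replays leave -> join -> pop, then leave -> pop.
fn demo() -> (Option<Output>, Option<Output>, usize) {
    let mut room = RoomMeta::new();
    room.on_join(1);
    room.on_leave(1); // state becomes Destroying
    room.on_join(2); // cancels the pending destroy
    let first = room.pop_output();
    room.on_leave(2);
    let second = room.pop_output();
    (first, second, room.peers.len())
}

fn main() {
    println!("{:?}", demo());
}
```

With this shape, a join that arrives after the last leave simply flips the state back to Active, so Destroy is only ever observed when the peer list is empty at pop time.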

giangndm commented 4 days ago

More crash cases: crash1.log, crash2.log