Luminarys / synapse

Synapse BitTorrent Daemon
https://synapse-bt.org
ISC License

Unwrapping None while attempting a clean shutdown #186

Open · evanrichter opened this issue 4 years ago

evanrichter commented 4 years ago

After ctrl-c, I get caught on this unwrap: https://github.com/Luminarys/synapse/blob/f89a9c44e5c0c7e3fbf3c00e2ba954f00e09003f/src/rpc/processor.rs#L620

Stack trace below. Any ideas on how to track down this bug? Or is it just an unwrap that should be an `if let Some(r) = ...`?

05/24/20 21:50:51 [synapse::init:128] I: Shutting down cleanly. Interrupt again to shut down immediately.
thread 'rpc' panicked at 'called `Option::unwrap()` on a `None` value', src/rpc/processor.rs:624:33
stack backtrace:
   0: backtrace::backtrace::libunwind::trace
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/libunwind.rs:86
   1: backtrace::backtrace::trace_unsynchronized
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/mod.rs:66
   2: std::sys_common::backtrace::_print_fmt
             at src/libstd/sys_common/backtrace.rs:78
   3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
             at src/libstd/sys_common/backtrace.rs:59
   4: core::fmt::write
             at src/libcore/fmt/mod.rs:1069
   5: std::io::Write::write_fmt
             at src/libstd/io/mod.rs:1504
   6: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:62
   7: std::sys_common::backtrace::print
             at src/libstd/sys_common/backtrace.rs:49
   8: std::panicking::default_hook::{{closure}}
             at src/libstd/panicking.rs:198
   9: std::panicking::default_hook
             at src/libstd/panicking.rs:218
  10: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:511
  11: rust_begin_unwind
             at src/libstd/panicking.rs:419
  12: core::panicking::panic_fmt
             at src/libcore/panicking.rs:111
  13: core::panicking::panic
             at src/libcore/panicking.rs:54
  14: core::option::Option<T>::unwrap
             at /rustc/b2e36e6c2d229126b59e892c9147fbb68115d292/src/libcore/macros/mod.rs:34
  15: synapse::rpc::processor::Processor::handle_ctl
             at src/rpc/processor.rs:624
  16: synapse::rpc::RPC::handle_ctl
             at src/rpc/mod.rs:259
  17: synapse::rpc::RPC::run
             at src/rpc/mod.rs:239
  18: synapse::rpc::RPC::start::{{closure}}
             at src/rpc/mod.rs:207
  19: synapse::handle::Handle<I,O>::run::{{closure}}
             at src/handle.rs:39
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
05/24/20 21:50:52 [synapse::init:41] E: Unclean shutdown detected, terminating
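For reference, the `if let` alternative mentioned above could look roughly like this. This is only a sketch: `resources` and its `u32` value type are stand-ins, not synapse's actual resource map.

```rust
use std::collections::HashMap;

/// Sketch: replace the panicking unwrap with an `if let` that logs the
/// missing ID instead. `resources` and the `u32` value type are stand-ins
/// for synapse's actual resource map.
fn remove_resource(resources: &mut HashMap<String, u32>, id: &str) -> bool {
    if let Some(_r) = resources.remove(id) {
        // Normal path: the resource existed and was removed.
        true
    } else {
        // Previously `resources.remove(id).unwrap()` would panic here.
        eprintln!("DBG: id not found: {:?}", id);
        false
    }
}
```

Whether swallowing the missing ID is acceptable, or whether it should stay a hard error, is exactly what's debated below.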

Luminarys commented 4 years ago

I think it's useful to keep that unwrap there, or at least to report an error if it's handled via `if let`.

Tracking this down isn't simple, since it requires examining every codepath that adds a resource to the RPC handler and then thinking about how a shutdown could trigger an out-of-order delete before the add (or just a failure to add). Is this a bug you see occurring frequently?

evanrichter commented 4 years ago

It happens every time currently. It must be serialized in a cache file then? I'll see about printing more information in the `None` case.

Luminarys commented 4 years ago

That helps with tracking things down, then. Maybe try printing the ID, and then match over that ID in all of the rpc_id-generating functions to see where it's coming from.
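The suggested tracing could be sketched like this. All names here (`IdTracer`, `IdOrigin`, `record`) are hypothetical, invented for illustration; the idea is just to tag each generated RPC ID with its origin so a later "id not found" can be traced back.

```rust
use std::collections::HashMap;

/// Which generator produced an RPC ID (illustrative categories only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum IdOrigin {
    Torrent,
    File,
    Tracker,
}

/// Hypothetical tracer: each rpc_id-generating function calls `record`
/// with its own tag, so a missing ID can be matched back to its source.
struct IdTracer {
    origins: HashMap<String, IdOrigin>,
}

impl IdTracer {
    fn new() -> Self {
        IdTracer { origins: HashMap::new() }
    }

    // Called from each rpc_id-generating function with its own tag.
    fn record(&mut self, id: &str, origin: IdOrigin) {
        self.origins.insert(id.to_string(), origin);
    }

    // When the unwrap would have fired, report where the ID came from.
    fn origin_of(&self, id: &str) -> Option<IdOrigin> {
        self.origins.get(id).copied()
    }
}
```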

evanrichter commented 4 years ago

OK, I have more information now that I'm printing the missing IDs:

05/25/20 21:51:36 [synapse::init:128] I: Shutting down cleanly. Interrupt again to shut down immediately.
DBG: id not found: "D3EB14FAF8EADF95AB1E00C588F5A9E47A384CEC"
DBG: self.resources.len() 416
DBG: id not found: "7041EEC6BA6F687BDE3F2641D50097F1E2B7F0AE"
DBG: self.resources.len() 416
DBG: id not found: "2966614EDB1EF630E251664D52BDBA7EB6A522DD"
DBG: self.resources.len() 416
DBG: id not found: "26D3E02B33079FE14825D6BC5773B210FB589876"
DBG: self.resources.len() 216
DBG: id not found: "2DE4731620D5E44F55DA77849A069EEC21AEC575"
DBG: self.resources.len() 216
DBG: id not found: "3B2A920A1EF7664A08AEB8299AFE449832DCC527"
DBG: self.resources.len() 216
DBG: id not found: "5B15FFC79AED3C26B650747C6BFC6B1E99523AE0"
DBG: self.resources.len() 31
DBG: id not found: "DA8AFC017D7A1BDB0F128259991DCC78FB3073CF"
DBG: self.resources.len() 31
DBG: id not found: "07098728BD8EB60DAD3773D0EA2BBF4C6D5D9A5A"
DBG: self.resources.len() 31
DBG: id not found: "A006F7F78A5CAA5EBB13561454DF7C29692F14FD"
DBG: self.resources.len() 22
DBG: id not found: "06C3E555F55C295948E788DC6F55274EBCFA0045"
DBG: self.resources.len() 22
DBG: id not found: "D71C9FBB0BDEBAF020371C9BD48616E6F8E7C60F"
DBG: self.resources.len() 22
05/25/20 21:51:39 [synapse::init:45] I: Shutdown complete

Looks like there's always 3 failures in a row. The first failure in each set of 3 is a valid torrent that I have seeding. I don't know what the other two are.

Also, I get a different set of IDs each run. This seems like a race condition of some sort, where perhaps another thread has already dumped the same resource.

evanrichter commented 4 years ago

Here's the call that is sending the `Removed` request: https://github.com/Luminarys/synapse/blob/edbd189dd02fa94b8363ca5bef15a0875105312c/src/torrent/mod.rs#L1636

Luminarys commented 4 years ago

Looking into this, it seems like either a file or a tracker wasn't properly added. Can you add some tracing to the file_rpc_id/tracker_rpc_id functions and see if they're called with those hashes? My guess is that it's files, since there's a conditional clause around magnets for it, but looking through that logic I don't see what's wrong. Can you try to identify the torrent that causes this and give some basic information about its state?

evanrichter commented 4 years ago

I added a trace to send_rpc_removal to see what it was requesting to remove, and it is sending a removal request twice. See this trace: this set of 3 was removed successfully, then about 20 requests later it was sent again.

DBG: sending rpc msg: removing ["1B78D8A59F99F137E76CD57DFCCB3263D366FC1C", "78A81F7C24D41FFEF8553B57BA071C98523AE4A2", "1CD1D6C1B28D6F4A7E1BBDF6DB28074353EA89C0"]
<snip>
DBG: sending rpc msg: removing ["1B78D8A59F99F137E76CD57DFCCB3263D366FC1C", "78A81F7C24D41FFEF8553B57BA071C98523AE4A2", "1CD1D6C1B28D6F4A7E1BBDF6DB28074353EA89C0"]
DBG: id not found: "1B78D8A59F99F137E76CD57DFCCB3263D366FC1C"
DBG: self.resources.len() 138
DBG: id not found: "78A81F7C24D41FFEF8553B57BA071C98523AE4A2"
DBG: self.resources.len() 138
DBG: id not found: "1CD1D6C1B28D6F4A7E1BBDF6DB28074353EA89C0"
DBG: self.resources.len() 138
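One way to make the removal path tolerant of this, sketched below, would be to remember which IDs have already had a removal sent and drop duplicates. `RemovalGuard` and `filter_new` are invented names, not part of synapse; this masks the double-send rather than explaining it.

```rust
use std::collections::HashSet;

/// Hypothetical guard: remembers IDs whose removal was already sent, so a
/// second send_rpc_removal for the same set becomes a no-op.
struct RemovalGuard {
    sent: HashSet<String>,
}

impl RemovalGuard {
    fn new() -> Self {
        RemovalGuard { sent: HashSet::new() }
    }

    /// Returns only the IDs that have not yet been sent for removal.
    /// `HashSet::insert` returns false for duplicates, filtering them out.
    fn filter_new<'a>(&mut self, ids: &'a [String]) -> Vec<&'a String> {
        ids.iter().filter(|id| self.sent.insert((*id).clone())).collect()
    }
}
```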

Luminarys commented 4 years ago

I wonder how it's possible that this is being dropped twice, since that's the only place this code is called from. I'll have to do some more thinking, but I recommend you add some more tracing around the object to get more context on how this is happening.
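The "tracing around the object" suggestion could be sketched with a `Drop` impl that logs a backtrace on teardown, so a second drop of the same logical resource shows which codepath triggered it. `DropTracer` and the `DROPS` counter are hypothetical, not synapse code.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Count of drops observed, so repeated teardowns of "the same" logical
// resource stand out in tests or logs.
static DROPS: AtomicU32 = AtomicU32::new(0);

/// Hypothetical wrapper: logs a backtrace every time it is dropped,
/// exposing the codepath behind an unexpected second teardown.
struct DropTracer {
    label: String,
}

impl Drop for DropTracer {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
        // Backtrace frames appear only when RUST_BACKTRACE=1 is set.
        eprintln!(
            "DBG: dropping {} at:\n{}",
            self.label,
            std::backtrace::Backtrace::capture()
        );
    }
}
```

Attaching such a wrapper to whatever owns the torrent's RPC resources would print one backtrace per teardown, making a duplicate removal immediately visible in the log.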

evanrichter commented 4 years ago

So I was looking at this again yesterday, and couldn't reproduce the issue. Now I'm really not sure how to proceed!

It could have been induced by a tracker not giving expected input, or a peer, or anything else.

At any rate, I'm not having unclean shutdown issues anymore.