actix / actix-net

A collection of lower-level libraries for composable network services.
https://actix.rs
Apache License 2.0
713 stars 349 forks source link

unix-fd listener should not remove sockets on deregister() #364

Open ensc opened 3 years ago

ensc commented 3 years ago

actix-server runs https://github.com/actix/actix-net/blob/605ec251432786b169490aac60679388aa902d95/actix-server/src/socket.rs#L74-L84

This deregister function can be called as part of flow control as response to Pause command https://github.com/actix/actix-net/blob/605ec251432786b169490aac60679388aa902d95/actix-server/src/accept.rs#L286-L292

When resuming operations, the socket file is gone and clients can not connect to the server anymore.

The cleanup should happen only when connection is closed finally (Stop command?)

fakeshadow commented 3 years ago

https://github.com/actix/actix-net/issues/97

ensc commented 3 years ago

My use case for unix fds are CI tests: Instead of listening on tcp sockets and implementing ways to find free port numbers, I use unix sockets. Then, I can call the server like

curl --unix-socket /tmp/foo http://server.example.com

While doing tests, somehow the Pause event is triggered (I am not sure why; perhaps because I use SIGSTOP on the server between test steps), and things stop to work after

  1. curl --unix-socket /tmpfoo --> ok

  2. actix-net removes /tmp/foo

  3. curl --unix-socket /tmpfoo --> fails because file does not exist anymore

My workaround for now is to let actix-net listen on a temporary socket and rename it later to the real one.

hommeabeil commented 3 years ago

I got the same issue with a really simple server which listen on unix socket. Form it it when we go through the backpressure method, which deregister the socket. Here is a backtrace taken with gdb

#0  std::fs::remove_file (path=0x7ffff7787846) at /usr/src/debug/libstd-rs/1.49.0-r0/rustc-1.49.0-src/library/std/src/fs.rs:1529
#1  0x00005555558b0e0d in <actix_server::socket::SocketListener as mio::event_imp::Evented>::deregister (self=0x7fffec000d80, poll=0x7ffff7788e30)
    at /usr/src/debug/sx-sts/0.1.0-r0/cargo_home/bitbake/actix-server-1.0.4/src/socket.rs:137
#2  0x00005555558fac89 in mio::poll::Poll::deregister (self=0x7ffff7788e30, handle=0x7fffec000d80)
    at /usr/src/debug/sx-sts/0.1.0-r0/cargo_home/bitbake/mio-0.6.23/src/poll.rs:910
#3  0x00005555558a80bd in actix_server::accept::Accept::backpressure (self=0x7ffff7788e30, on=true)
    at /usr/src/debug/sx-sts/0.1.0-r0/cargo_home/bitbake/actix-server-1.0.4/src/accept.rs:383
#4  0x00005555558a86f5 in actix_server::accept::Accept::accept_one (self=0x7ffff7788e30, msg=...)
    at /usr/src/debug/sx-sts/0.1.0-r0/cargo_home/bitbake/actix-server-1.0.4/src/accept.rs:437
#5  0x00005555558a979b in actix_server::accept::Accept::accept (self=0x7ffff7788e30, token=0)
    at /usr/src/debug/sx-sts/0.1.0-r0/cargo_home/bitbake/actix-server-1.0.4/src/accept.rs:475
#6  0x00005555558a62ca in actix_server::accept::Accept::poll (self=0x7ffff7788e30)
    at /usr/src/debug/sx-sts/0.1.0-r0/cargo_home/bitbake/actix-server-1.0.4/src/accept.rs:257
#7  0x00005555558a5071 in actix_server::accept::Accept::start::{{closure}} ()
    at /usr/src/debug/sx-sts/0.1.0-r0/cargo_home/bitbake/actix-server-1.0.4/src/accept.rs:170
#8  0x00005555558cb093 in std::sys_common::backtrace::__rust_begin_short_backtrace (f=...)
    at /usr/src/debug/libstd-rs/1.49.0-r0/rustc-1.49.0-src/library/std/src/sys_common/backtrace.rs:125
#9  0x00005555558ddb95 in std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}} ()
    at /usr/src/debug/libstd-rs/1.49.0-r0/rustc-1.49.0-src/library/std/src/thread/mod.rs:474
#10 0x00005555558a9e85 in <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once (self=..., _args=())
    at /usr/src/debug/libstd-rs/1.49.0-r0/rustc-1.49.0-src/library/std/src/panic.rs:322
#11 0x00005555558cdb33 in std::panicking::try::do_call (data=0x7ffff77894c8 "\000")
    at /usr/src/debug/libstd-rs/1.49.0-r0/rustc-1.49.0-src/library/std/src/panicking.rs:381
#12 0x00005555558d8cad in __rust_try ()
#13 0x00005555558cc4d4 in std::panicking::try (f=...) at /usr/src/debug/libstd-rs/1.49.0-r0/rustc-1.49.0-src/library/std/src/panicking.rs:345
#14 0x00005555558aa275 in std::panic::catch_unwind (f=...) at /usr/src/debug/libstd-rs/1.49.0-r0/rustc-1.49.0-src/library/std/src/panic.rs:396
#15 0x00005555558dd95e in std::thread::Builder::spawn_unchecked::{{closure}} ()
    at /usr/src/debug/libstd-rs/1.49.0-r0/rustc-1.49.0-src/library/std/src/thread/mod.rs:473
#16 0x000055555589a6fe in core::ops::function::FnOnce::call_once{{vtable-shim}} ()
    at /usr/src/debug/libstd-rs/1.49.0-r0/rustc-1.49.0-src/library/core/src/ops/function.rs:227
#17 0x0000555555dc728a in std::sys::unix::thread::Thread::new::thread_start ()
#18 0x00007ffff7edee24 in start_thread (arg=<optimized out>) at pthread_create.c:477
#19 0x00007ffff7cc9e9f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

What can we do about it ? Is simply removing the fs::remove_file should work ? My guess is we will be leaking the file in a clean exit ?