Closed ysbaddaden closed 1 month ago
Urgh, supporting fork
is painful. I must close the timerfds for each fiber (easy) but then I must mutate all pending events to use the new fd :weary:
I took care to use EPOLLEXCLUSIVE (available since 2017). I'm not sure we need EPOLLONESHOT since we register the fd once then dequeue readers/writers one by one. Now, there may still be races, especially in a MT environment, but there's little we can do without blocking the other readers/writers (there's no telling if a reader/writer will keep reading/writing).
I didn't know the close
quirk. That explains why we unregister the file descriptors before we actually close. Now, if a fd is in multiple epoll instances, oops, this is likely already broken today :fearful: this is working today: we remove the events and the fd from all event bases that referenced the IO object.
The close quirk needs some experimental verification:
Q6 Will the close of an fd cause it to be removed from all epoll sets automatically?
A6 Yes.
The one thing to take care of would be to resume the cached event/fibers from all epoll instances (tricky).
I also found an issue with EPOLLEXCLUSIVE: we can't modify, so either we don't use it or we must del/add when changing the set of events.
The close quirk
Could be they fixed it at some point, there has been a bunch of years since that post was written after all..
Another man page, a longer answer:
Yes, but be aware of the following point. A file descriptor is a reference to an open file description (see open(2)). Whenever a descriptor is duplicated via dup(2), dup2(2), fcntl(2) F_DUPFD, or fork(2), a new file descriptor referring to the same open file de- scription is created. An open file description continues to exist until all file descriptors referring to it have been closed. A file descriptor is removed from an epoll set only after all the file descriptors referring to the underlying open file description have been closed (or before if the descriptor is explicitly removed using epoll_ctl() EPOLL_CTL_DEL). This means that even after a file descriptor that is part of an epoll set has been closed, events may be reported for that file descriptor if other file de- scriptors referring to the same underlying file description remain open.
https://man.freebsd.org/cgi/man.cgi?query=epoll&apropos=0&sektion=0&manpath=SuSE
The preview_mt
issues are caused by thread B closing a fd
while thread A is waiting; since each thread has its own eventloop this is creating issues, I end up not cleaning up the fd
from thread A' epoll instance.
I'm trying to think of an efficient way to iterate the event loops on (something lighter than ThreadLocalValue
), but may resort to iterate all EL for starters (and check if it fixes the MT issues that fixes the issue).
Perhaps it is possible to have a lookup table fd -> loop (or probably - a bit less contentious in mt scenario - have a list in the FileDescriptor (well no, but you perhaps better understand what I mean compared to other choices of words) that keeps track of what epolls uses it). But it is an icky problem :(
It looks like it's starting to work... except for the interpreter that now hangs forever :sob:
Using a release compiler (with libevent), the interpreter segfaults :sob:
Building a compiler with epoll (as CI does) in non release, the interpreter enters a kind of infinite loop (100% CPU, silent strace
) after printing the first spec dot (.
). The last strace logs are:
epoll_create1(EPOLL_CLOEXEC) = 13
eventfd2(0, EFD_CLOEXEC) = 14
timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC) = 15
epoll_ctl(13, EPOLL_CTL_ADD, 14, {EPOLLIN, {u32=1958253120, u64=140490338505280}}) = 0
epoll_ctl(13, EPOLL_CTL_ADD, 15, {EPOLLIN, {u32=1958253072, u64=140490338505232}}) = 0
write(11, "\33[", 2
rite(11, "32", 232) = 2
write(11, "m", 1m) = 1
write(11, ".", 1.) = 1
write(11, "\33[", 2
rite(11, "0", 10) = 1
write(11, "m", 1m) = 1
It's creating the eventloop instance (epoll, eventfd, timerfd) then writes a colored .
then nothing.
Any pointers to debug the interpreter itself?
I can't get tracing to work in the interpreted code, so I hacked myself into Crystal.trace
and I see an infinite list of sched.event_loop
traces :raised_eyebrow:
I only checked whether @events
was empty and didn't check if @timers
was, too :facepalm:
CI is finally green and the implementation can be reviewed!
how to roll out
Either opt-in with a flag... or we're bold and make it the default + a flag to opt-out. That will depend on how stable/performant it will get until the next release.
Moving back to draft. There are some fixes in #14829 and... I got great ideas from the Go implementation that would allow to skip most of the synchronizations and take full advantage of epoll/kqueue (see https://github.com/ysbaddaden/execution_context/issues/30).
Superseded by #14959
Implements a custom event loop on top of
epoll
for Linux (and Solaris?), following RFC #7. It's not particularly optimized yet (e.g. it still allocates), but seems to be working.The EventQueue and Timers types aren't the best solutions (performance wise) but they work and abstract their own logic (keep a list of events per fd / keep a sorted list of timers). We should be able to improve (e.g. linked lists -> ordered linked lists -> skiplists, ...)
ISSUES
(not needed with one timerfd per eventloop, see below);#after_fork
must close & nullify everyFiber#timerfd
& update pending events to use the new fdmust reset mutex after fork before exec (preview mt);must "close" fd in all eventloops (preview mt);LibC bindings foraarch-linux-musl
are wrong;investigate segfault runs on CI;support:preview_mt
(broken CI);can't modify a registered fd with EPOLLEXCLUSIVE set(disabled for now);BUG: a timerfd file descriptor errored or got closed!
on CI;epoll_ctl(EPOLL_CTL_MOD): No such file or directory (RuntimeError)
on CI;Unhandled exception: pthread_mutex_unlock: Operation not permitted (RuntimeError)
on CI and againinfinite loop in interpreter on CI (primitives spec);OPTIMIZE
epoll_event.data
should point to Node instead of fd, so we could skip searches;eventfd
instead ofpipe2
to interrupt the loop;timerfd
per eventloop should be enough (wait until next timer => we can't enqueue another timer on that thread anymore => for MT a parallel enqueue will interruptepoll_wait
); we still need atimerfd
becauseepoll_wait
only has millisecond precision; we also need a list of timers to know the next timer, and which timers are ready.TODO
Fiber#cancel_timeout
(distinct PR);:ev_epoll
flag instead of forcing it for Linux (not yet: I want CI test runs);