anuragsoni / poll

Portable OCaml interface to macOS/Linux/Windows native IO event notification mechanisms
https://anuragsoni.github.io/poll/poll
MIT License
31 stars 1 forks source link

Needs for EPOLLET #13

Closed craff closed 1 year ago

craff commented 1 year ago

I am trying poll for simple_httpd and failed.

One of the reason is I have several OCaml domains polling on the same socket to accept connection (which works very well with select)

For this to work, I need the EPOLLET flag. For other sockets, I want level trigerred poll because I do not want to read all available data untile read/write anser AGAIN.

So I would need access to the EPOLLET flag in Poll.set

Clearly the question is : is there an equivalent flag on the other platform. But the other platform could ignore the flag ?

craff commented 1 year ago

Looking at the code, I also see you use ONESHOT by default. I could manage that on read/write socket, but better not on socket where I wait for accept. But taking this in acount does not solve my problem.

So the best would be to have access to both flags for epoll.

craff commented 1 year ago

I managed to have simple_httpd working with polly (which is a direct interface only supporting linux epoll) but not with your software. From what I tried, I think there is something I don't understand, because even without EPOLLET and with ONSHOT everywhere it should work.

If you want to have a look, simple_httpd has 3 branches :

The code using select/poll/polly is in src Simple_httpd_domain.ml in only one function.

Doing make, does some test that never ends with poll while working with the 2 others.

anuragsoni commented 1 year ago

Clearly the question is : is there an equivalent flag on the other platform. But the other platform could ignore the flag ?

Kqueue supports edge triggered events as well so at-least on BSD like systems an edge triggered behavior could be supported. At the moment, I pick one shot as the lowest common denominator as on windows I don't have access to the EPOLLET flag in the wepoll library.

From what I tried, I think there is something I don't understand, because even without EPOLLET and with ONSHOT everywhere it should work.

I'd expect something workable with ONESHOT events as well. I'll try and take a look at your project soon to see what's going on with poll. Thanks for preparing a reproduction case!

anuragsoni commented 1 year ago

I need to spend more time on this to confirm, but I wouldn't be surprised if the problem is related to the epoll bindings in this library trying to be "smart" by keeping track of the existing state of each file descriptor's flags that are registered with it. To do this the library maintains internal state in a hash table (https://github.com/anuragsoni/poll/blob/main/src/epoll_poll.ml#L66), and if a fd isn't unregistered via Poll.set poll fd Poll.Event.none it will lead to issues because this internal hash table will never remove the entry recorded for the fd. Similarly sharing the poll descriptor among multiple threads is most likely going to cause issues without edge triggered modes, so in my multicore explorations I've ended up with a workflow of assigning a unique poll descriptor per domain (https://github.com/anuragsoni/sandbox/blob/c8e3c122510a1f826f9786a7aca3027ab6d4ef4a/ocaml/async_io/src/local_run_queue.ml). None of this is to say that this library can't be updated to help resolve this paper cuts though! I am not tied to having wepoll support in the library so I'd be open to removing that and focus on kqueue and epoll only if that ends up allowing a better api for bad/linux. Similarly, the internal state used to support set style use so the end-users don't need to maintain their own state to figure out whether to call add vs update can be revisited. This probably shows my bias towards kqueue as there is no need for a different call for add vs modify vs delete when it comes to updating the kernel state for the readiness interest, but it ends up being less confusing of an api for the end-users we could adapt the interface to better match what epoll does and have the kqueue implementation match that.

craff commented 1 year ago

Remark: I use personnal branch for http-cookie and ocaml-ssl. PR's are submitted, but if you want to try now, have a look at .github/worflow to see the correct opam pin command.

craff commented 1 year ago

I need to spend more time on this to confirm, but I wouldn't be surprised if the problem is related to the epoll bindings in this library trying to be "smart" by keeping track of the existing state of each file descriptor's flags that are registered with it.

That is something which is actually recommended on the man of epoll to solve some frequent problem. So this is not chocking.

To do this the library maintains internal state in a hash table (https://github.com/anuragsoni/poll/blob/main/src/epoll_poll.ml#L66), and if a fd isn't unregistered via Poll.set poll fd Poll.Event.none it will lead to issues because this internal hash table will never remove the entry recorded for the fd.

Similarly sharing the poll descriptor among multiple threads is most likely going to cause issues without edge triggered modes, so in my multicore explorations I've ended up with a workflow of assigning a unique poll descriptor per domain (https://github.com/anuragsoni/sandbox/blob/c8e3c122510a1f826f9786a7aca3027ab6d4ef4a/ocaml/async_io/src/local_run_queue.ml).

I don't do that and do not remember the use case for this that I saw somewhere. What I do is a socket on which I listen to accept connection is set in one distinct epoll on each domain. The problem is to avoid the risk of accepting the socket twice in two different domain. What I understood from the documentation is that this require EPOLLET, but I am not sure.

None of this is to say that this library can't be updated to help resolve this paper cuts though! I am not tied to having wepoll support in the library so I'd be open to removing that and focus on kqueue and epoll only if that ends up allowing a better api for bad/linux.

I think using GADT you could easily have option specific to each backend, without any possible confusion. This way you can have the best interface on each backend and keep a common denominator. Moreover, the user can write code specific for each backend. I would do in backend.ml

type _ t = ..
type epoll = [ `Epoll ]
type wepoll = [ `Wepoll ]
type kqueue = [ `Kquueu ]
type _ t += Kqueue : kqueue t | Epoll : epoll t | Wepoll : wepoll t

Then you could have flag of type _ Event.t (with the correct variance ...) for individual flag with some flag of type [Epoll |Kqueue] Event.t if there are common to two backend.

Then epoll would also be parametrised by the backend and set type would only allow the option for a specific backend, that the user can build by pattern matching on the backend type.

The only difficulty is to get variance and subtyping right. It you want I can give it a try.

But I will also try to monitor your hashtbl to see if the problem is related to that.

craff commented 1 year ago

A note on epoll: I think add is O(1) amortized (hashtbl ?) while upd is O(1) in all cases. This is why I don't think remove should be automatic. A good way to manage connection to a server is to add at accept, remove at close and upd during the treatment.

I think poll should allow this scheme when the backend is epoll. A way to do it is to have a flag "keep".

I looked at libevent, but the fact I can not write the mainloop myself rules it out for me. Your solution is probably the best lightweight solution but it needs more tests.

craff commented 1 year ago

When you tell me you view point, about the above messages, I will try to create clear separate issue and start to do pull request. I already added on for tests.

anuragsoni commented 1 year ago

This is why I don't think remove should be automatic. A good way to manage connection to a server is to add at accept, remove at close and upd during the treatment.

Remove isn't automatic even in the current state. Once an event is fired, the internal state for a fd for the epoll backend isn't removed till Poll.set poll fd Poll.Event.none is called. The problem arises if this operation doesn't take place for closed fds as then this state is never cleared. The intention (should be documented) is that since events are registered as one shot, the user on seeing a file descriptor in a poll event list should notice that its ready for I/O, and then keep using it till seeing EAGAIN or EOF. If they encounter EAGAIN they register interest again, and on EOF/closing they need to remember to set none so epoll's internal state is cleared. The current poll interface means that the Kqueue backend doesn't require this internal book-keeping, and whenever a FD is closed the fd is also removed from the kqueue descriptor by the kernel. For Epoll the fact that poll is managing an internal state can lead to some surprises since there are no ways to guarantee that the internal state gets cleared so its all based on users remembering to do that.

anuragsoni commented 1 year ago

I looked at libevent, but the fact I can not write the mainloop myself rules it out for me. Your solution is probably the best lightweight solution but it needs more tests.

I'm not 100% sure about your use-case, and I'm not sure you've seen these yet, but there are pretty nice OCaml bindings for libev available as well https://github.com/rgrinberg/lev which also cover more use-cases like timers/signals etc. It could also be of interest for your use-case!

craff commented 1 year ago

Remove isn't automatic even in the current state. Once an event is fired, the internal state for a fd for the epoll backend isn't removed till Poll.set poll fd Poll.Event.none is called.

I understood this, but I don't think it is the best choice. I suggest

The problem arises if this operation doesn't take place for closed fds as then this state is never cleared.

This is always a problem, you have to clear information about individual connection in many application. I don't think this library should care. The user should just need a way to remove a socket from the hashtbl even after the socket was closed.

Maybe some exception should trigger a check on all sockets and removal (if wait raises a Unix_error check all sockets with fstat, do a try Unix.close s; None with e -> Some e on all wrong sockets an reraise an exception containing the original exception and the list of closed socket with the optional exception generated by close. This way the library helps the user but no information is lost for the user to debug.

The intention (should be documented) is that since events are registered as one shot, the user on seeing a file descriptor in a poll event list should notice that its ready for I/O, and then keep using it till seeing EAGAIN or EOF. If they encounter EAGAIN they register interest again, and on EOF/closing they need to remember to set none so epoll's internal state is cleared.

This means you can not use epoll if you can't set oneshot/epollet manually for all (most?) application. See my suggestion for GADT.

The current poll interface means that the Kqueue backend doesn't require this internal book-keeping, and whenever a FD is closed the fd is also removed from the kqueue descriptor by the kernel. For Epoll the fact that poll is managing an internal state can lead to some surprises since there are no ways to guarantee that the internal state gets cleared so its all based on users remembering to do that.

With a careful implementation, we can prove that both state are always in sync. This is hard but possible, especially that epoll man page describe situations where it recommends keeping your own state.

craff commented 1 year ago

I'm not 100% sure about your use-case, and I'm not sure you've seen these yet, but there are pretty nice OCaml bindings for libev available as well https://github.com/rgrinberg/lev which also cover more use-cases like timers/signals etc. It could also be of interest for your use-case!

Thanks for the pointer... opam search libevent does not report libev! (only conf-libev). And indeed could work for me. I still think your implementation offers some advantages (one layer less).

Another problem for me (more general) is a lack of solution that use what is available to build the best possible mutex/semaphore/condition on which you can wait together with sockets. It is a pity that libevent does no provide this, even if you can implement it yourself .... with the high risk of not getting the semantics you want! Currently I wait with a low timeout (around 30ms) and then check all mutexes.

Typically in my system I have a mutex around session information available on all thread/domain to treat the case of several connections for the same http client. Ideally I need to wait on this mutex (if it is locked) in the same time as I wait on sockets. And for instance, I think implementing mutex/condition above epoll is clearly possible, but tricky to make sure all corner case work and we get the best possible efficiency. So it should be provided by the library. I did not find this (may be I missed it) on libev/libevent/poll for ocaml nor libevent for C.

anuragsoni commented 1 year ago
  • Poll.set poll fd Poll.Event.keep remove all event but keep in the epoll list (upd on epoll)

I don't really see what this adds over the current interface. For both one shot and edge trigger events once an event has surfaced via wait you won't see it again. In the case of one shot you won't see it till you register it again, in the case of edge trigger you won't see it again till the fd state changes. In all possible cases if you want the behavior of keep you simply avoid called none.

The user should just need a way to remove a socket from the hashtbl even after the socket was closed.

Closing a fd before de-registering it with epoll is a bad idea, so I'd rather only keep an interface that encourages at-least using EPOLL_CTL_DEL before closing a fd. https://idea.popcount.org/2017-03-20-epoll-is-fundamentally-broken-22/ has more details.

You often don't want oneshot nor epollet with non blocking io as you may even want to go to wait before reading/writing all (if your application is a "try to be fair" scheduler).

This should work with one shot without any issues. If you re-arm an even for a fd that initially surfaced via epoll_wait but without actually reading/writing from it, it should still show up in the next epoll wait.

craff commented 1 year ago
  • Poll.set poll fd Poll.Event.keep remove all event but keep in the epoll list (upd on epoll)

I don't really see what this adds over the current interface.

It replaces in many cases (like mine) one add + del by one upd. And I think upd is more efficient than both add and del sepatly.

The user should just need a way to remove a socket from the hashtbl even after the socket was closed.

Closing a fd before de-registering it with epoll is a bad idea

Sure, this is always a bug. But you want the user to easily diagnose the bug. This is why I want to report the exception generated by close to the user, If automatic removal is a good idea.

You often don't want oneshot nor epollet with non blocking io as you may even want to go to wait before reading/writing all (if your application is a "try to be fair" scheduler).

This should work with one shot without any issues. If you re-arm an even for a fd that initially surfaced via epoll_wait but without actually reading/writing from it, it should still show up in the next epoll wait.

I don't want to pay the price of rearm. I at least want to be able to time and experiment both solutions. In that case, you replace one add + del, with nothing.

There are also all the other flags of epoll (hup, rdhup, rdband, ...) which are sometimes needed.

Again, GADT would allow fine control on each platform of all options and you still did not say if you like the idea?

I am proposing to do a PR for this at least to support all epoll flags. And also I would look how to warant sync between epoll internal state and your state. But I won't do a PR if there is no chance you would accept it ;-)

craff commented 1 year ago

And by the way, as the user may want to attach information to sockets, it would be nice to propose user data in your internal hashtable.

Do you want me to create 2-3 issues with specific goals, we discuss on each issue... and then I do a PR ?

anuragsoni commented 1 year ago

There are also all the other flags of epoll (hup, rdhup, rdband, ...) which are sometimes needed.

Again, GADT would allow fine control on each platform of all options and you still did not say if you like the idea?

I am proposing to do a PR for this at least to support all epoll flags. And also I would look how to warant sync between epoll internal state and your state. But I won't do a PR if there is no chance you would accept it ;-)

I'll admit that feature parity with the entire set of options available within epoll, kqueue or windows/IOCP/wepoll was not my intention. The current state of the library is a minimal set of features that are available on all platforms that can feasibly provide an I/o event notification view on the various platforms, and nothing more than that. Most of this started as experiements in using effects for writing direct style single threaded event loops (https://github.com/anuragsoni/sandbox/tree/main/ocaml/effect_poll), or use it as one small piece within a multi-threaded task system (https://github.com/anuragsoni/sandbox/tree/main/ocaml/async_io) and I admit this is a very opinionated set of interface that won't cover very general use-cases. I'd wager this might be a roadblock with most abstractions over various platform libraries as for better or worse most of these platforms have very different feature-sets when it comes to the I/o event notifications and its hard to come up with a consistent api that works well everywhere but still allows access to the platform exclusive interfaces/flags.

I'd suggest that if you know for sure that linux is the only platform you'd like to support then go directly to epoll and use all the features a specific epoll interface (like polly or other direct c bindings to epoll) will provide as your application will be able to benefit from it instead of having to be constrained by a generic interface like this library. If you'd like to explore targeting more than linux (say macOS or BSD) but with a different api that exposes more of the internals, I also have kqueue bindings available (https://github.com/anuragsoni/kqueue-ml) that are very low level and expose the underlying kqueue sys calls as-is, which you can then wrap yourself in a layer that fits your application best.

anuragsoni commented 1 year ago

I'll also mention that if you are writing multi-threaded servers, you might also benefit from the abstractions provided by https://github.com/ocaml-multicore/eio since it also comes with a lot of other concurrency primitives one might need in such applications. They also have a more modern linux I/o stack via io-uring that can be of interest if you know you are exclusively going to target linux.

craff commented 1 year ago

I am aware of eio and decided not to use for two reasons:

Still I would like to support linux and bsd and why not windows.

I am looking for simple_httpd to find the best compromise in

So I found you code quite nice, I at minimum need the epollet/onshot combination, but probably a few other thing. And I add rather have access to all feature of epoll and kqueue to experiment.

I currently have

So I would be happy to collaborate with you on poll, but I need to fix at least the missing epollet. I also think that if you maintain a socket hashtbl it is better to open it to the user for its own data and keep the socket in it until they are closed (which is an easy invariant to maintain ?).

anuragsoni commented 1 year ago

I'd probably recommend taking the codebase in poll as a starting point then and adapting it your needs by removing things you don't need and modifying it specifically for what's needed for your use-case. I mostly say this as I'm not sure about how much time I'll be able to allocate to evolve the library beyond my intended goal for it which is the current state where it fills the role I need it for as a small piece in a different exploration where the current api suffices. I only intend for this library to be used in more generic I/o libraries where fine-grained control over the exact flags used for epoll vs kqueue (vs something) else isn't as important.

I no longer use OCaml at my day-job, and at the moment most of the time I do get to spend on OCaml is spent on other projects, and almost exclusively with janestreet libraries since they help me make progress faster by being able to leverage their well tested high quality libraries. I mostly say this to level-set expectations about my own time commitment, and I think you'll be able to make progress in a better manner and definitely a lot faster without being bottle-necked on poll. The library is fairly small and should be easy-ish to vendor/port to live within simple_httpd. Most of the complication lies in the build setup that uses janestreet's ppx_optcomp for conditional compilation, but there could be simpler ways of achieving that if you are writing a more targeted wrapper around kqueue/epoll.

craff commented 1 year ago

Thanks for your comment ... I am still thinking, but I might fork your library then. If I can keep it separated from simple_httpd I will be happy.

I think I will got for an oneshot/epollet strategy, maintening a queue of socket with available data. This way, I get O(1) scheduler I (this is the only way, and bench from libevent show that this is possible).

This will give

But before doing that I have to think about the implementation of mutex on top of file descriptor, (writing one byte when I unlock, reading that byte when I lock) and compare if the benefit of being able to wait on a mutex with select/poll is not counterbalanced by using socket instead of atomic memory operation.

Cheers, Christophe

PS: feel free to close this issue as you please.

anuragsoni commented 1 year ago

Thanks for your comment ... I am still thinking, but I might fork your library then. If I can keep it separated from simple_httpd I will be happy.

That sounds like a solid plan! I'm going to close this issue for now since I don't anticipate making changes directly within poll. Goodluck with your explorations!