AndreasVoellmy / ghc-base-arv


I.poll cannot be shared by timeout manager and IO managers on kqueue platform #3

Open kazu-yamamoto opened 11 years ago

kazu-yamamoto commented 11 years ago

I think I understand why my kqueue implementation does not work well at the moment.

In the current architecture, I.poll is called by both the timeout manager and the IO managers. This is fine for epoll, but it does not work for kqueue, because kevent() handles both registration and polling. In other words, kevent() does what epoll_ctl() and epoll_wait() do at the same time.

So, on the kqueue platform, if the timeout manager calls I.poll, events requested by threadWait are consumed. I could emulate the epoll_ctl()/epoll_wait() split with two kinds of kevent() calls: one registers events on the fly and the other just polls. But I don't know whether this works for Poll.
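
For illustration, here is how that split could look at the C level (just a sketch, not code from my patch): a kevent() call with an empty eventlist behaves like epoll_ctl(), and one with an empty changelist behaves like epoll_wait().

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>

    /* Registration only (like epoll_ctl): nevents == 0,
     * so the call submits the change and returns immediately. */
    void register_read(int kq, int fd) {
        struct kevent change;
        EV_SET(&change, fd, EVFILT_READ, EV_ADD, 0, 0, NULL);
        kevent(kq, &change, 1, NULL, 0, NULL);
    }

    /* Polling only (like epoll_wait): nchanges == 0,
     * so the call just waits for pending events. */
    int poll_events(int kq, struct kevent *events, int maxevents) {
        return kevent(kq, NULL, 0, events, maxevents, NULL);
    }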

Another option is to prepare a special I.poll for the timeout manager.

Which do you prefer?

kazu-yamamoto commented 11 years ago

This observation may be wrong, since the timeout manager and the IO managers use their own local data.

kazu-yamamoto commented 11 years ago

Oh! SM.registerFd does not wake up its IO manager. That's why.

The old IO manager registers pending events when it is woken up on the kqueue platform. The new IO manager tries to behave the same way but is never woken up.

Please consider this and let me know which solution you prefer.

kazu-yamamoto commented 11 years ago

So, the current parallel IO manager does not need two pipes (four descriptors) at all. I think this is the way to go. For Poll, we should keep one pipe so that the manager can be woken up.
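
For clarity, the wake-up pipe I mean works roughly like this (a C sketch with made-up names, not the actual backend code): the read end of a pipe is registered with the poller, and any thread can force a blocked poll to return by writing one byte to the write end.

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>
    #include <unistd.h>

    static int wake_fds[2];   /* [0] = read end, [1] = write end */

    /* Register the pipe's read end so the poller watches it. */
    void setup_wakeup(int kq) {
        struct kevent kev;
        pipe(wake_fds);
        EV_SET(&kev, wake_fds[0], EVFILT_READ, EV_ADD, 0, 0, NULL);
        kevent(kq, &kev, 1, NULL, 0, NULL);
    }

    /* Any thread can interrupt a blocked kevent()/poll() call this way. */
    void wakeup_manager(void) {
        write(wake_fds[1], "x", 1);
    }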

What do you think?

kazu-yamamoto commented 11 years ago

I confirmed that my Kqueue code works if "wakeManager" is inserted into SM.registerFd.

kazu-yamamoto commented 11 years ago

Here is a snapshot of "top -H" on FreeBSD, running mighty +RTS -N4 and weighttp -t 2:

  PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
95464 kazu      43    0   170M 85088K CPU4    4   0:08 29.39% mighty{mighty}
95464 kazu      41    0   170M 85088K uwait  10   0:08 28.56% mighty{mighty}
95464 kazu      42    0   170M 85088K uwait   2   0:08 28.56% mighty{mighty}
95464 kazu      41    0   170M 85088K kqread  5   0:08 28.56% mighty{mighty}
95464 kazu      42    0   170M 85088K CPU10  10   0:08 27.98% mighty{mighty}
95464 kazu      41    0   170M 85088K kqread  8   0:08 27.59% mighty{mighty}
95464 kazu      41    0   170M 85088K uwait   1   0:08 27.59% mighty{mighty}
95464 kazu      36    0   170M 85088K uwait   8   0:03 17.58% mighty{mighty}
95479 kazu      26    0 57616K  7064K kqread 11   0:03  9.96% weighttp{weighttp
95464 kazu      22    0   170M 85088K kqread  2   0:05  9.77% mighty{mighty}
95479 kazu      26    0 57616K  7064K kqread  6   0:03  9.67% weighttp{weigh
AndreasVoellmy commented 11 years ago

As you noticed, each SequentialManager has its own poll backend instance, so there should be no interference between them (and also between the timer manager and any sequential managers).

Good catch with the missing wake-up! I assumed that the kqueue backend was designed in the same way as the epoll backend. With the epoll backend, we don't need to wake up the manager when a new event is registered or unregistered: modifyFd immediately calls epoll_ctl to register the fd. A call to epoll_wait may be in progress (i.e. blocked in the kernel), and it will be woken up by the kernel if the newly registered fd has activity. The kqueue backend, however, appears to use the same design as the poll backend; that is, a call to modifyFd does not actually register the event with kqueue, it just records the change in the event list, and the new events are registered at the next kqueue poll call.

There are two possible solutions: (1) keep the current kqueue design and wake up the manager after registerFd etc., or (2) change the design to match the epoll backend. I think (2) will result in a more efficient backend, but it will take a bit more work. Let me know which approach you would rather follow.

AndreasVoellmy commented 11 years ago

What is the significance of the snapshot of "top -H" that you posted? What should I notice there?

AndreasVoellmy commented 11 years ago

I read through the kqueue paper (http://people.freebsd.org/~jlemon/papers/kqueue.pdf) and the man pages. Here is how I see the tradeoff between the wake and no-wake designs:

Using kevent & wake design: each registration performs a system call to write to the wake fd. What happens next depends on the state of the IO polling loop.

If the IO polling loop is not blocked (i.e. it has yielded and is still on the capability's run queue), then when it gets to run again it will collect all the registrations that have occurred and call kevent once, registering the new kevents and checking all existing kevents for activity. This is the best case.

If the IO polling loop is in a blocking kevent call (it has released the capability and another OS thread is running the capability), then the write to the wake fd causes the kernel to schedule the blocked kevent polling thread, context switching to it. That thread then waits to grab the Haskell capability, the Haskell scheduler does its work (balancing work, etc.), and then it executes the Haskell code for the IO poll loop, which very often just results in another call to kevent. By the time this kevent executes, other Haskell threads may have registered more fds, so all of those changes are added to the kqueue in this single kevent operation. It is not clear how much this optimization is worth, or whether more than one fd has typically been registered by the time the polling Haskell thread calls kevent. It seems pretty clear that in this case a lot more work is performed.

Using kevent & no-wake design: each registration performs a kevent system call to register the kevent. The subsequent behavior does not depend on the state of the IO loop (in contrast to the situation above): if there is no activity on the newly registered file, nothing more happens. The downside is that kevent is called once per registered fd. The only way this method could be worse is if the cost of a kevent system call with a single event is higher than the cost of the same number of writes to the wake fd plus the single call to kevent with multiple kevents (and that is without even considering the cost of repeatedly waking up the blocked IO loop thread).
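
To make the comparison concrete, here is a rough C sketch of the per-registration work in the two designs (illustrative only; the pending array stands in for the Haskell-side state kept in the wake design):

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>
    #include <unistd.h>

    /* Wake design: record the change locally, then poke the manager;
     * the manager later submits all pending changes in one kevent() call. */
    void register_wake(struct kevent *pending, int *npending,
                       int wake_write_fd, int fd) {
        EV_SET(&pending[(*npending)++], fd, EVFILT_READ, EV_ADD, 0, 0, NULL);
        write(wake_write_fd, "x", 1);     /* extra syscall per registration */
    }

    /* No-wake design: register with the kernel immediately;
     * the blocked kevent() in the manager is left undisturbed. */
    void register_nowake(int kq, int fd) {
        struct kevent kev;
        EV_SET(&kev, fd, EVFILT_READ, EV_ADD, 0, 0, NULL);
        kevent(kq, &kev, 1, NULL, 0, NULL);   /* one syscall per fd */
    }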

AndreasVoellmy commented 11 years ago

Event Registration Options for kqueue

If a kevent is registered with EV_ONESHOT, then once it is delivered, the corresponding knote is deleted from the system. Subsequent registrations of the same kevent will require creating a new knote (and linking it onto the various kqueue data structures). According to the kqueue paper, creating new knotes is an expensive operation (relative to other kqueue operations).

Note that the old IO manager design (i.e. the one in GHC HEAD) calls unregisterFd_ after the Haskell callback is fired, which causes the kevent to be (eventually) removed and deleted. So the old IO manager is certainly no better than using EV_ONESHOT.

However, we may be able to do even better than EV_ONESHOT. For example, the kqueue man page on my FreeBSD 9.0 machine describes an EV_DISPATCH flag. This flag disables the event source after the event is delivered via a kevent call, but it does not delete the knote. Subsequent registrations of the same kevent therefore do not need to create new knotes. This seems like it should be much faster than EV_ONESHOT.

Unfortunately, this flag does not seem to be present universally on BSD systems. For example, on my Mac OS X machine, the EV_DISPATCH flag is not mentioned in the man page. It does mention EV_DISABLE, which we can use to accomplish the same thing: we would call kevent() with EV_DISABLE when processing the callback (while holding the lock on the callback table in Haskell). So we would incur an extra system call, and we would have to make that call while holding the callback table lock, so it might not outperform EV_ONESHOT. It is worth experimenting with both approaches.
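
Roughly, the two re-arm strategies would look like this at the C level (just a sketch, not the backend code):

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>

    /* EV_DISPATCH variant (where available): the knote is kept but the
     * source is disabled after delivery; re-registering the same kevent
     * re-arms it without creating a new knote. */
    void register_dispatch(int kq, int fd) {
        struct kevent kev;
        EV_SET(&kev, fd, EVFILT_READ, EV_ADD | EV_DISPATCH, 0, 0, NULL);
        kevent(kq, &kev, 1, NULL, 0, NULL);
    }

    /* EV_DISABLE variant: explicitly disable the source while processing
     * the callback, at the cost of one extra system call. */
    void disable_after_delivery(int kq, int fd) {
        struct kevent kev;
        EV_SET(&kev, fd, EVFILT_READ, EV_DISABLE, 0, 0, NULL);
        kevent(kq, &kev, 1, NULL, 0, NULL);
    }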

AndreasVoellmy commented 11 years ago

One more point regarding the wake vs. no-wake designs: the no-wake design has the additional advantage that it requires less state. In the wake design, the backend has to keep an MVar holding the pending changes since the last kevent() call; the no-wake design doesn't have to repeatedly mutate this variable. I think this alone probably makes the no-wake design a big win, because I believe any write to an MVar causes a system call to grab the lock on the mutex.

AndreasVoellmy commented 11 years ago

It appears that Mac OS X also supports EV_DISPATCH. When I look into /usr/include/sys/event.h, I see these definitions, which include EV_DISPATCH:

    /* actions */
    #define EV_ADD       0x0001   /* add event to kq (implies enable) */
    #define EV_DELETE    0x0002   /* delete event from kq */
    #define EV_ENABLE    0x0004   /* enable event */
    #define EV_DISABLE   0x0008   /* disable event (not reported) */
    #define EV_RECEIPT   0x0040   /* force EV_ERROR on success, data == 0 */

    /* flags */
    #define EV_ONESHOT   0x0010   /* only report one occurrence */
    #define EV_CLEAR     0x0020   /* clear event state after reporting */
    #define EV_DISPATCH  0x0080   /* disable event after reporting */

    #define EV_SYSFLAGS  0xF000   /* reserved by system */
    #define EV_FLAG0     0x1000   /* filter-specific flag */
    #define EV_FLAG1     0x2000   /* filter-specific flag */

    /* returned values */
    #define EV_EOF       0x8000   /* EOF detected */
    #define EV_ERROR     0x4000   /* error, data contains errno */

AndreasVoellmy commented 11 years ago

OK, you can ignore my earlier comment that EV_DISPATCH is not supported on Mac OS X. It seems to be there (I just tested it with a small program on OS X Lion), even though it isn't mentioned in the man page.

kazu-yamamoto commented 11 years ago

About "top -H": I just want to tell my kqueue implementation is working and using multiple cores. "-H" is the option to show OS threads as well as processes.

kazu-yamamoto commented 11 years ago

As I mentioned in the other issue, I would go with the no-wake kqueue design.

kazu-yamamoto commented 11 years ago

EV_DISPATCH is a good idea. But how do we delete the knote when it becomes unnecessary (e.g. when the connection is closed)?

AndreasVoellmy commented 11 years ago

knotes are always deleted when the file descriptor is closed. In addition, GHC.Event.Thread.closeFdWith calls unregisterFd, which in turn calls into the backend to delete the event. And closeFdWith is called by functions like Network.Socket.close.

Do you think that is good enough?


kazu-yamamoto commented 11 years ago

Yes!

I'm now implementing the no-wake kqueue.

kazu-yamamoto commented 11 years ago

I quickly implemented the no-wake kqueue:

https://github.com/kazu-yamamoto/ghc-base-arv/commit/3d2eca2279da470c6fe624e2bf9285d6c8e9b08d

But this code does not work at the moment. I will debug it next week.

kazu-yamamoto commented 11 years ago

Ah, I made a mistake in testing. This code works well, but it is slower than the wake design.

wakeup: around 20,000 req/s
no-wakeup: around 8,000 req/s

I will try to understand why this happens next week.

kazu-yamamoto commented 11 years ago

Ah. Specifying "-qa" for no-wakeup kqueue results in 20,000 req/s. :-)