Alogani / NimGo

Asynchronous Library Inspired by Go's goroutines, for Nim
MIT License
37 stars 2 forks source link

Redesign the design of the selector pool to make it work in windows. #39

Open Alogani opened 4 months ago

Alogani commented 4 months ago

This is the last major work before work on multithreading (#21) could begin.

The goal is to create a portable API to query asynchronous events. This will likely significantly redesign the public API of the nimgo/eventdispatcher library, and hopefully the API won't require many changes after this update.

Alogani commented 4 months ago

https://www.ulduzsoft.com/2014/01/practical-difference-between-epoll-and-windows-io-completion-ports-iocp/

Always so simple with windows...

Alogani commented 4 months ago

Hi @Araq,

I need your review. I'm working on providing a common API for windows async (IOCP) and unix async (std/selectors) to allow future transformation of NimGo into our M:N goal (without having to refactor every high level API later).

I came up with that kind of signature for private API of selector:

proc select*(selector: Selector, timeoutMs: int): tuple[fd: AsyncFd, customData: pointer]
proc registerReadFile*(selector: Selector, fd: AsyncFd, customData: pointer,
                       buffer: pointer, size: int, bytesRead: ptr int,
                       cancellation: out proc())

## select function doesn't use windows' select, but `getQueuedCompletionStatus`,
##    which return only one completed event at a time.

Which could be used like that :

proc readFile(fd: AsyncFd, buffer: var string, size: int, timeoutMs = -1) =
  ## This is only an example usage:
  buffer = newString(size)
  # Here we pass coroutine by simplicity, but we could pass a more complex object
  var cancellableCoroutine = toOneShot((getCurrentCoroutine())
  var customData = cast[pointer](cancellableCoroutine)
  var bytesRead: int
  var cancellationCb: proc()
  registerReadFile(ActiveSelector, fd, customData, addr(result[0]), size, addr(bytesRead), cancellationCb)
  if timeoutMs != -1:
    registerTimer(cancellationCb, cancellableCoroutine, timeoutMs) ## -> this could be executed in a different thread
  suspend() ## After suspend, we could be in another thread
  ## We could be either resumed due to timer cancellation or read operation was done
  buffer.setLen(bytesRead)

It might not be very useful, but here is my draft of the implementation of registerReadFile for windows (I don't show you the one for select, nothing fancy)

proc registerReadFile*(selector: Selector, fd: AsyncFd, customData: pointer,
                       buffer: pointer, size: int, bytesRead: ptr int,
                       cancellation: out proc()) =
  var customOverlappedPtr = allocShared()
  customOverlappedPtr[] = CustomOverlapped(customData: customData, bytesReadPtr: bytesRead)
  var overlappedPtr = cast[ptr OVERLAPPED](customOverlappedPtr) # this is what std/asyncdispatch do
  let success = bool(readFile(fd.Handle, buffer, size.int32, nil, overlappedPtr))
  if not success:
    let osError = osLastError()
    if osError.int32 == ERROR_HANDLE_EOF:
      # TODO: Add it inside the selector queue
      discard # For now
    else:
      dealloc(customOverlappedPtr)
      raiseOSError(osError)
  cancellation = proc() = cancel(selector, overlappedPtr)

Do you think this is the correct approach ?

Buldram commented 4 months ago

FYI ringabout has done something similar before, check out https://github.com/ringabout/wepoll https://github.com/ringabout/ioselectors and https://github.com/ringabout/httpx.

Alogani commented 4 months ago

Thanks, that's interesting, I have been looking for something similar but the following limitation is a nogo:

https://github.com/ringabout/wepoll/tree/master/src/wepoll

Limitations Only works with sockets.

It also don't seem to support https://learn.microsoft.com/en-us/windows/win32/fileio/cancelio, which I need.

Alogani commented 4 months ago

I think I found an appropriate design.

I'm currently writing the implementation for both IOCP and EPOLL, and it seems possible while maintaining performance.

type IOOperation

proc applyIO*(ioOperation: IOOperation, buffer: pointer): int
proc consume*(): pointer # return userdata, can be used for cancellation

proc select*(selector: Selector, timeoutMs: int): ptr IOOperation
proc readFileAsync*(selector: Selector, fd: AsyncFd, buffer: pointer, size: int, userData: pointer, ioOperation: var IOOperation)

And a cross platform usage

## Fiber 1
proc main() =
  var s = newSelector()
  var fd = AsyncFd(0)
  registerHandle(s, fd)
  var buffer = newString(1024)
  var ioOperation: IOOperation
  readFileAsync(s, fd, addr(buffer[0]), 10, cast[pointer](getCurrentCoroutine()), ioOperation)
  suspend()
  setLen(buffer, newOp.applyIO(addr(buffer[0]))
  echo buffer
resume(newCoroutine(main))

## Fiber 2
while true:
  var io = select(s, -1)
  resume(io.consume())

Of course, because it use raw memory, so harder to debug, but shall fit well into a multithreading design

Araq commented 4 months ago

Do you think this is the correct approach ?

Good question. Unfortunately I don't know. However, in general we should try and avoid .closure callbacks as the hidden state that is passed around is exactly the state that will cause problems for multi-threading.

Alogani commented 4 months ago

Thanks, that's already a beginning

Good question. Unfortunately I don't know

API design is difficult, especially when that involves wrapping low level API, so I am not surprised ! And I bet, you are not much of an async guy ;-)

However, the fact you don't dismiss completly the design is a good sign I am not going the wrong way entirely ! I am confident I will come up with something

avoid .closure callbacks as the hidden state

Agreed, I will avoid that, I realized it cause issues even in single threaded design, and I think I can make state explicit without sacrificing too much performance.

Concerning the advancement

I think it will likely take two weeks (sigh.) before I can write the whole implementation of Async API with most corner cases. So, although you could contribute on some of the multithreading design, you will likely be blocked until I finish this part (so it's up to you if you want to wait). I'll inform you then.

Alogani commented 3 months ago

I come to the news after quite a long time and I excuse myself. Some of my time was taken away by other projects and I am not sure when I could advance more.

In my opinion, there is a problem in how std/selectors works, because by design it uses the blocking call, which doesn't allow to leverage multithreading behaviour. Furthermore, using blocking calls induce an important number of syscalls which can be largely reduced in both linux's select, poll, epoll and kqueue by using non blocking read with EAGAIN. Furthermore, epoll and kqueue have special flags to allow efficient multithreading (like EPOLLET). Here is an interesting article discussing about it : https://eklitzke.org/blocking-io-nonblocking-io-and-epoll. This problem affects any library relying on std/selectors like std/asyncdispatch and guzba/mummy, which is likely to have a strong overhead. It also creates a design flaw in NimGo where registering a file descriptor for both read and write result in a busy loop (I am not sure how asyncdispatch and mummy avoid that, do they ?)

I think the most elegant and efficient way to allow multithreading is to have a dedicated threadpool for IO (syscalls), and dedicated threadpool for CPU tasks. std/asyncdispatch could also benefit from it without change in its public API.

I have begun to write a draft of how a multithreaded selector library could look like here : https://github.com/Alogani/nectar/tree/devel. However, my competence on multithreading are lacking and the presence of multiple performance critical-section don't help (avoiding lock contention).

To the continuation of NimGo project:

hamidb80 commented 3 months ago

very sorry to hear that, but community will appreciate your efforts