MagicStack / uvloop

Ultra fast asyncio event loop.
Apache License 2.0
10.42k stars 544 forks source link

UDPTransport.sendto spends a lot of time validating the address #214

Closed jlaine closed 5 years ago

jlaine commented 5 years ago

I have been trying to optimize the performance of aiortc, an asyncio-based WebRTC implementation. aiortc uses aioice (an Interactive Connectivity Establishment library) for its network access. Using uvloop provides some performance improvement, but not as much as I had hoped.

During my optimization work I noticed that when called with an address, UDPTransport.sendto spends a noticeable amount of time calling into Python code, looking up the socket's family and type properties. Looking at the code, this corresponds to validation checks on the destination address which are run on every single packet (which involves a call to getaddrinfo whose result is ultimately discarded):

https://github.com/MagicStack/uvloop/blob/df0e543a653cd17b800a866b7efa5326352efa16/uvloop/handles/udp.pyx#L146

I am aware that specifying a remote address when calling create_datagram_endpoint would probably work around this issue, but in the case of ICE this is not practical as the remote party may have multiple addresses, which are not known in advance and may change over time.

To assess the impact of the validation code I tried a basic patch which stores the last validated address and skips the validation if sendto is called for the same address again. I then ran the following test:

import asyncio
import socket
import time

import uvloop

class DummyProtocol(asyncio.DatagramProtocol):
    pass

async def run(loop):
    addr = ('127.0.0.1', 1234)
    data = b'M' * 1500
    transport, protocol = await loop.create_datagram_endpoint(
        lambda: DummyProtocol(),
        family=socket.AF_INET)

    start = time.process_time()
    for i in range(1000000):
        transport.sendto(data, addr)
    print('elapsed', time.process_time() - start)

asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
loop = asyncio.get_event_loop()
loop.run_until_complete(run(loop))

On my laptop the results of sending out a million packets:

Questions:

jlaine commented 5 years ago

The trivial patch I used:

diff --git a/uvloop/handles/udp.pxd b/uvloop/handles/udp.pxd
index 4f03650..25a7e61 100644
--- a/uvloop/handles/udp.pxd
+++ b/uvloop/handles/udp.pxd
@@ -3,6 +3,7 @@ cdef class UDPTransport(UVBaseTransport):
         object sock
         UVPoll poll
         object address
+        object last_address
         object buffer

     cdef _init(self, Loop loop, object sock, object r_addr)
diff --git a/uvloop/handles/udp.pyx b/uvloop/handles/udp.pyx
index 81ad198..6ecc489 100644
--- a/uvloop/handles/udp.pyx
+++ b/uvloop/handles/udp.pyx
@@ -6,6 +6,7 @@ cdef class UDPTransport(UVBaseTransport):
         self.sock = None
         self.poll = None
         self.buffer = col_deque()
+        self.last_address = None
         self._has_handle = 0

     cdef _init(self, Loop loop, object sock, object r_addr):
@@ -143,7 +144,7 @@ cdef class UDPTransport(UVBaseTransport):
             raise ValueError(
                 'Invalid address: must be None or {}'.format(self.address))

-        if addr is not None and self.sock.family != socket.AF_UNIX:
+        if addr not in (None, self.last_address) and self.sock.family != socket.AF_UNIX:
             addrinfo = __static_getaddrinfo_pyaddr(
                 addr[0], addr[1],
                 uv.AF_UNSPEC, self.sock.type, self.sock.proto, 0)
@@ -155,6 +156,7 @@ cdef class UDPTransport(UVBaseTransport):
                 raise ValueError(
                     'UDP.sendto(): {!r} socket family mismatch'.format(
                         addr))
+            self.last_address = addr

         if self._conn_lost and self._address:
             if self._conn_lost >= LOG_THRESHOLD_FOR_CONNLOST_WRITES:
1st1 commented 5 years ago

On my laptop the results of sending out a million packets:

  • without the patch: 7.8s
  • with the patch: 3.0s

Great improvement!

Questions:

  • do we need the validation code at all?

Ideally yes. DNS lookup would be blocking and that would be quite hard to debug.

if we do, would you consider a patch which caches the last successfully validated address and skips the validation if sendto is called again with the same address?

Yes, please submit a PR. You can probably add a function with an @lru_cache decorator to keep track of last N checked addresses.

jlaine commented 5 years ago

I also tried something else: simply storing the socket's family, proto and type as int members of UDPTransport. This doesn't skip the call to getaddrinfo but gives a significant performance gain which holds no matter whether it's an address we have already seen or not. The same test runs in 4.3s with this patch.

Regarding lru_cache wouldn't that mean dropping out of C and back into Python?

1st1 commented 5 years ago

I also tried something else: simply storing the socket's family, proto and type as int members of UDPTransport. This doesn't skip the call to getaddrinfo but gives a significant performance gain which holds no matter whether it's an address we have already seen or not. The same test runs in 4.3s with this patch.

Caching family/proto/type is OK too.

Regarding lru_cache wouldn't that mean dropping out of C and back into Python?

lru_cache is implemented in C since 3.6 (IIRC), but yeah, it involves quite a few CPython C API calls. You'd probably want to profile it to check if it's fast enough.

jlaine commented 5 years ago

It looks like lru_cache performance is really not an issue. With the patch I submitted in #215 my test now runs in 3.1s, so almost identical to the "single address cache" version!

1st1 commented 5 years ago

Using uvloop provides some performance improvement, but not as much as I had hoped.

Speaking of which -- when libuv 2.0 is released we'll likely be able to switch to native UDP libuv implementation which should be significantly (up to 10x) faster than what we have now.

jlaine commented 5 years ago

Speaking of which -- when libuv 2.0 is released we'll likely be able to switch to native UDP libuv implementation which should be significantly (up to 10x) faster than what we have now.

I'm very eager to see this in action, and would be happy to help if I can! It doesn't look as though UDP is getting as much love in the asyncio world as TCP is, but that might change with the likes of HTTP/3.

1st1 commented 5 years ago

I'm very eager to see this in action, and would be happy to help if I can!

For the UDP part specifically we're blocked until libuv 2.0 lands. Meanwhile feel free to help with other parts of the project! For instance, I still can't find time to implement loop.sendfile() and start_serving—new asyncio 3.7 features.