Closed jlaine closed 5 years ago
The trivial patch I used:
diff --git a/uvloop/handles/udp.pxd b/uvloop/handles/udp.pxd
index 4f03650..25a7e61 100644
--- a/uvloop/handles/udp.pxd
+++ b/uvloop/handles/udp.pxd
@@ -3,6 +3,7 @@ cdef class UDPTransport(UVBaseTransport):
object sock
UVPoll poll
object address
+ object last_address
object buffer
cdef _init(self, Loop loop, object sock, object r_addr)
diff --git a/uvloop/handles/udp.pyx b/uvloop/handles/udp.pyx
index 81ad198..6ecc489 100644
--- a/uvloop/handles/udp.pyx
+++ b/uvloop/handles/udp.pyx
@@ -6,6 +6,7 @@ cdef class UDPTransport(UVBaseTransport):
self.sock = None
self.poll = None
self.buffer = col_deque()
+ self.last_address = None
self._has_handle = 0
cdef _init(self, Loop loop, object sock, object r_addr):
@@ -143,7 +144,7 @@ cdef class UDPTransport(UVBaseTransport):
raise ValueError(
'Invalid address: must be None or {}'.format(self.address))
- if addr is not None and self.sock.family != socket.AF_UNIX:
+ if addr not in (None, self.last_address) and self.sock.family != socket.AF_UNIX:
addrinfo = __static_getaddrinfo_pyaddr(
addr[0], addr[1],
uv.AF_UNSPEC, self.sock.type, self.sock.proto, 0)
@@ -155,6 +156,7 @@ cdef class UDPTransport(UVBaseTransport):
raise ValueError(
'UDP.sendto(): {!r} socket family mismatch'.format(
addr))
+ self.last_address = addr
if self._conn_lost and self._address:
if self._conn_lost >= LOG_THRESHOLD_FOR_CONNLOST_WRITES:
On my laptop the results of sending out a million packets:
- without the patch: 7.8s
- with the patch: 3.0s
Great improvement!
Questions:
- do we need the validation code at all?
Ideally yes. DNS lookup would be blocking and that would be quite hard to debug.
if we do, would you consider a patch which caches the last successfully validated address and skips the validation if sendto is called again with the same address?
Yes, please submit a PR. You can probably add a function with an @lru_cache
decorator to keep track of last N checked addresses.
I also tried something else: simply storing the socket's family
, proto
and type
as int
members of UDPTransport
. This doesn't skip the call to getaddrinfo
but gives a significant performance gain which holds no matter whether it's an address we have already seen or not. The same test runs in 4.3s with this patch.
Regarding lru_cache wouldn't that mean dropping out of C and back into Python?
I also tried something else: simply storing the socket's family, proto and type as int members of UDPTransport. This doesn't skip the call to getaddrinfo but gives a significant performance gain which holds no matter whether it's an address we have already seen or not. The same test runs in 4.3s with this patch.
Caching family/proto/type is OK too.
Regarding lru_cache wouldn't that mean dropping out of C and back into Python?
lru_cache is implemented in C since 3.6 (IIRC), but yeah, it involves quite a few CPython C API calls. You'd probably want to profile it to check if it's fast enough.
It looks like lru_cache
performance is really not an issue. With the patch I submitted in #215 my test now runs in 3.1s, so almost identical to the "single address cache" version!
Using uvloop provides some performance improvement, but not as much as I had hoped.
Speaking of which -- when libuv 2.0 is released we'll likely be able to switch to native UDP libuv implementation which should be significantly (up to 10x) faster than what we have now.
Speaking of which -- when libuv 2.0 is released we'll likely be able to switch to native UDP libuv implementation which should be significantly (up to 10x) faster than what we have now.
I'm very eager to see this in action, and would be happy to help if I can! It doesn't look as though UDP is getting as much love in the asyncio world as TCP is, but that might change with the likes of HTTP/3.
I'm very eager to see this in action, and would be happy to help if I can!
For the UDP part specifically we're blocked until libuv 2.0 lands. Meanwhile feel free to help with other parts of the project! For instance, I still can't find time to implement loop.sendfile()
and start_serving
—new asyncio 3.7 features.
I have been trying to optimize the performance of
aiortc
, an asyncio-based WebRTC implementation.aiortc
usesaioice
(an Interactive Connectivity Establishment library) for its network access. Usinguvloop
provides some performance improvement, but not as much as I had hoped.During my optimization work I noticed that when called with an address, UDPTransport.sendto spends a noticeable amount of time calling into Python code, looking up the socket's
family
andtype
properties. Looking at the code, this corresponds to validation checks on the destination address which are run on every single packet (which involves a call to getaddrinfo whose result is ultimately discarded):https://github.com/MagicStack/uvloop/blob/df0e543a653cd17b800a866b7efa5326352efa16/uvloop/handles/udp.pyx#L146
I am aware that specifying a remote address when calling
create_datagram_endpoint
would probably work around this issue, but in the case of ICE this is not practical as the remote party may have multiple addresses, which are not known in advance and may change over time.To assess the impact of the validation code I tried a basic patch which stores the last validated address and skips the validation if
sendto
is called for the same address again. I then ran the following test:On my laptop the results of sending out a million packets:
Questions:
sendto
is called again with the same address?