Tribler / tribler

Privacy enhanced BitTorrent client with P2P content discovery
https://www.tribler.org
GNU General Public License v3.0

ProactorEventLoop bug reproduced: UDP server stops accepting datagrams from any clients after a single client disconnects #7972

Closed: kozlovsky closed this issue 2 months ago

kozlovsky commented 2 months ago

The following situation is easy to reproduce on Windows with ProactorEventLoop:

  1. A UDP server is running and waiting for datagrams from clients.
  2. A UDP client sends a datagram and closes its socket before the server can respond.
  3. The UDP server receives the datagram and sends a reply to the departed client. More precisely, the client machine is still reachable, but the client application no longer listens on the port.
  4. After that, the UDP server cannot receive datagrams from ANY client: it stops executing protocol callbacks (connection_made, datagram_received, error_received, and connection_lost) until it is restarted.
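Steps 1-3 can be replayed with plain blocking sockets, outside asyncio, to see what the operating system itself reports to the server once its reply bounces off the closed port. This is a minimal sketch, not part of the original report; the function name `reply_to_departed_client` and the 0.5 s timeout are illustrative choices. On Windows the server socket's next `recvfrom()` is expected to fail with `ConnectionResetError` (WinError 10054, WSAECONNRESET); error 1234 (ERROR_PORT_UNREACHABLE), which the report's client script aims for, is presumably how the same condition surfaces through the overlapped I/O used by ProactorEventLoop. On Linux and macOS an unconnected UDP socket ignores the ICMP "port unreachable" message, so the call simply times out.

```python
import socket


def reply_to_departed_client(host: str = "127.0.0.1", timeout: float = 0.5) -> str:
    """Replay steps 1-3 with blocking sockets (illustrative helper, not from the report).

    Returns "timeout" if the server socket sees nothing further, or
    "ConnectionResetError: ..." if the OS reports the failed reply on it.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        server.bind((host, 0))            # step 1: server waits on an ephemeral port
        server.settimeout(timeout)
        client.sendto(b"Hello World!", server.getsockname())
        data, client_addr = server.recvfrom(1024)
        client.close()                    # step 2: client drops before the reply
        server.sendto(data, client_addr)  # step 3: reply goes to a closed port
        try:
            server.recvfrom(1024)         # step 4: what does the next receive do?
        except ConnectionResetError as exc:
            return f"ConnectionResetError: {exc}"
        except socket.timeout:
            return "timeout"
        return "data"
    finally:
        client.close()                    # closing an already-closed socket is a no-op
        server.close()


if __name__ == "__main__":
    print(reply_to_departed_client())
```

A plain socket that gets `ConnectionResetError` can keep calling `recvfrom()` afterwards; the report's point is that the proactor-based transport does not recover in the same way.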

The following is an example that reproduces the bug:

server.py

import asyncio

class EchoServerProtocol(asyncio.DatagramProtocol):
    def connection_made(self, transport):
        self.transport = transport

    def datagram_received(self, data, addr):
        message = data.decode()
        print('Received %r from %s' % (message, addr))

        # print('Is server transport closing due to client abort?', self.transport.is_closing())

        print('Sending %r to %s' % (message, addr))
        self.transport.sendto(data, addr)

    def error_received(self, exc):
        print(f"Exception caught by 'error_received': {exc}")

async def main():
    print("Starting UDP server")
    loop = asyncio.get_running_loop()

    # loop.set_debug(True)  # This does not affect the occurrence of this issue

    transport, protocol = await loop.create_datagram_endpoint(
        lambda: EchoServerProtocol(), local_addr=('127.0.0.1', 9999))

    try:
        await asyncio.sleep(3600)  # Serve for 1 hour.
    finally:
        transport.close()

asyncio.run(main())

client.py

import asyncio

class EchoClientProtocol(asyncio.DatagramProtocol):
    def __init__(self, message, on_con_lost):
        self.message = message
        self.on_con_lost = on_con_lost
        self.transport = None

    def connection_made(self, transport: asyncio.DatagramTransport):
        self.transport = transport

        print('Sending:', self.message)
        self.transport.sendto(self.message.encode())

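        # Close immediately, before the echo arrives, so that the server's reply
        # hits a closed port and the server raises error 1234 (ERROR_PORT_UNREACHABLE):
        # "No service is operating at the destination network endpoint on the remote system."
        self.transport.abort()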

    def datagram_received(self, data, addr):
        print("Received:", data)

        print("Close the socket")
        self.transport.close()

    def error_received(self, exc):
        print('Error received:', exc)

    def connection_lost(self, exc):
        print("Connection closed")
        self.on_con_lost.set_result(True)

async def main():
    loop = asyncio.get_running_loop()
    # loop.set_debug(True)  # the error disappears if this is uncommented

    on_con_lost = loop.create_future()
    message = "Hello World!"

    transport, protocol = await loop.create_datagram_endpoint(
        lambda: EchoClientProtocol(message, on_con_lost),
        remote_addr=('127.0.0.1', 9999))

    try:
        await on_con_lost
    finally:
        transport.close()

asyncio.run(main())

Run "server.py", then run "client.py" several times: after the server receives the message from the first client, it ignores all subsequent clients and hangs.
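To check the hang without rerunning a client by hand, a small probe can send one datagram at a time and record whether the server echoes it back. This is a sketch under stated assumptions, not part of the original report: `ProbeProtocol`, `probe`, `probe_many`, the `b"ping"` payload and the 1-second timeout are hypothetical names and values; only the address 127.0.0.1:9999 comes from server.py. Run against server.py after client.py has aborted once, it is expected to print only False values on an affected Windows setup, while a healthy server yields True for every probe.

```python
import asyncio
from typing import List


class ProbeProtocol(asyncio.DatagramProtocol):
    """Hypothetical probe: sends one datagram and records whether it is echoed."""

    def __init__(self, payload: bytes, answered: asyncio.Future):
        self.payload = payload
        self.answered = answered

    def connection_made(self, transport):
        transport.sendto(self.payload)

    def datagram_received(self, data, addr):
        if not self.answered.done():
            self.answered.set_result(data == self.payload)

    def error_received(self, exc):
        # e.g. ConnectionRefusedError when nothing listens on the port
        if not self.answered.done():
            self.answered.set_result(False)


async def probe(host: str, port: int, timeout: float) -> bool:
    loop = asyncio.get_running_loop()
    answered = loop.create_future()
    transport, _ = await loop.create_datagram_endpoint(
        lambda: ProbeProtocol(b"ping", answered), remote_addr=(host, port))
    try:
        return await asyncio.wait_for(answered, timeout)
    except asyncio.TimeoutError:
        return False  # silence is the symptom being measured, not a failure
    finally:
        transport.close()


def probe_many(host: str = "127.0.0.1", port: int = 9999,
               count: int = 3, timeout: float = 1.0) -> List[bool]:
    async def run() -> List[bool]:
        return [await probe(host, port, timeout) for _ in range(count)]
    return asyncio.run(run())


if __name__ == "__main__":
    print(probe_many())
```

Each probe uses a fresh endpoint, so a broken transport on the probing side cannot mask the server's state; only the server's behaviour is being sampled.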

See CPython issues #88906, #91227

The issue was fixed three weeks ago in the CPython main branch, and the fix will be included in upcoming releases of Python 3.11 and 3.12.

I backported the fix to Python 3.8 and above; I am preparing the PR now.

kozlovsky commented 2 months ago

The reason for the bug: