cherrypy / cheroot

Cheroot is the high-performance, pure-Python HTTP server used by CherryPy. Docs -->
https://cheroot.cherrypy.dev
BSD 3-Clause "New" or "Revised" License
185 stars 90 forks source link

test_keepalive_conn_management failure on Solaris. #561

Closed kulikjak closed 1 year ago

kulikjak commented 1 year ago

ā“ I'm submitting a ...

šŸž Describe the bug. What is the current behavior?

cheroot/test/test_conn.py::test_keepalive_conn_management test fails on Oracle Solaris 11.4 when executed.

šŸ’” To Reproduce

Clone this repo and run: /usr/bin/python3.7 -m pytest cheroot/test/test_conn.py in the root directory. The same thing happens when running tox and in multiple Python versions (I tested 3.7 and 3.9).

šŸ’” Expected behavior The test suite is clean.

šŸ“‹ Details

=================================== FAILURES ===================================
________________________ test_keepalive_conn_management ________________________

test_client = <cheroot.testing._TestClient object at 0x7fc941f9d850>

    def test_keepalive_conn_management(test_client):
        """Test management of Keep-Alive connections."""
        test_client.server_instance.timeout = 2

        def connection():
            # Initialize a persistent HTTP connection
            http_connection = test_client.get_connection()
            http_connection.auto_open = False
            http_connection.connect()
            return http_connection

        def request(conn, keepalive=True):
            status_line, actual_headers, actual_resp_body = test_client.get(
                '/page3', headers=[('Connection', 'Keep-Alive')],
                http_conn=conn, protocol='HTTP/1.0',
            )
            actual_status = int(status_line[:3])
            assert actual_status == 200
            assert status_line[4:] == 'OK'
            assert actual_resp_body == pov.encode()
            if keepalive:
                assert header_has_value('Connection', 'Keep-Alive', actual_headers)
                assert header_has_value(
                    'Keep-Alive',
                    'timeout={test_client.server_instance.timeout}'.
                    format(**locals()),
                    actual_headers,
                )
            else:
                assert not header_exists('Connection', actual_headers)
                assert not header_exists('Keep-Alive', actual_headers)

        def check_server_idle_conn_count(count, timeout=1.0):
            deadline = time.time() + timeout
            while True:
                n = test_client.server_instance._connections._num_connections
                if n == count:
                    return
                assert time.time() <= deadline, (
                    'idle conn count mismatch, wanted {count}, got {n}'.
                    format(**locals()),
                )

        disconnect_errors = (
            http_client.BadStatusLine,
            http_client.CannotSendRequest,
            http_client.NotConnected,
        )

        # Make a new connection.
        c1 = connection()
        request(c1)
        check_server_idle_conn_count(1)

        # Make a second one.
        c2 = connection()
        request(c2)
        check_server_idle_conn_count(2)

        # Reusing the first connection should still work.
>       request(c1)

c1         = <http.client.HTTPConnection object at 0x7fc941fa9050>
c2         = <http.client.HTTPConnection object at 0x7fc941fa9bd0>
check_server_idle_conn_count = <function test_keepalive_conn_management.<locals>.check_server_idle_conn_count at 0x7fc941fa8440>
connection = <function test_keepalive_conn_management.<locals>.connection at 0x7fc941fa8320>
disconnect_errors = (<class 'http.client.BadStatusLine'>, <class 'http.client.CannotSendRequest'>, <class 'http.client.NotConnected'>)
request    = <function test_keepalive_conn_management.<locals>.request at 0x7fc941fa83b0>
test_client = <cheroot.testing._TestClient object at 0x7fc941f9d850>

cheroot/test/test_conn.py:529: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
cheroot/test/test_conn.py:483: in request
    http_conn=conn, protocol='HTTP/1.0',
        conn       = <http.client.HTTPConnection object at 0x7fc941fa9050>
        keepalive  = True
        test_client = <cheroot.testing._TestClient object at 0x7fc941f9d850>
cheroot/testing.py:113: in _wrapper
    return self.request(uri, method=http_method, **kwargs)
        attr_name  = 'get'
        http_method = 'GET'
        kwargs     = {'headers': [('Connection', 'Keep-Alive'), ('Host', ':::56906')], 'http_conn': <http.client.HTTPConnection object at 0x7fc941fa9050>, 'protocol': 'HTTP/1.0'}
        self       = <cheroot.testing._TestClient object at 0x7fc941f9d850>
        uri        = '/page3'
cheroot/testing.py:107: in request
    protocol=protocol,
        headers    = [('Connection', 'Keep-Alive'), ('Host', ':::56906')]
        http_conn  = <http.client.HTTPConnection object at 0x7fc941fa9050>
        method     = 'GET'
        protocol   = 'HTTP/1.0'
        self       = <cheroot.testing._TestClient object at 0x7fc941f9d850>
        uri        = '/page3'
cheroot/test/webtest.py:532: in openURL
    trap=socket.error,
        args       = ('/page3',)
        kwargs     = {'headers': [('Connection', 'Keep-Alive'), ('Host', ':::56906')], 'host': '::', 'http_conn': <http.client.HTTPConnection object at 0x7fc941fa9050>, 'method': 'GET', ...}
        on_exception = <function openURL.<locals>.on_exception at 0x7fc941fdeb90>
        opener     = functools.partial(<function _open_url_once at 0x7fc942318320>, '/page3', method='GET', headers=[('Connection', 'Keep-A...:56906')], host='::', port=56906, http_conn=<http.client.HTTPConnection object at 0x7fc941fa9050>, protocol='HTTP/1.0')
        raise_subcls = ()
cheroot/test/webtest.py:53: in retry_call
    return func()
        attempt    = 1
        attempts   = range(0, 9)
        cleanup    = <function openURL.<locals>.on_exception at 0x7fc941fdeb90>
        func       = functools.partial(<function _open_url_once at 0x7fc942318320>, '/page3', method='GET', headers=[('Connection', 'Keep-A...:56906')], host='::', port=56906, http_conn=<http.client.HTTPConnection object at 0x7fc941fa9050>, protocol='HTTP/1.0')
        retries    = 9
        trap       = <class 'OSError'>
cheroot/test/webtest.py:562: in _open_url_once
    conn.endheaders()
        body       = None
        conn       = <http.client.HTTPConnection object at 0x7fc941fa9050>
        headers    = [('Connection', 'Keep-Alive'), ('Host', ':::56906')]
        host       = '::'
        http_conn  = <http.client.HTTPConnection object at 0x7fc941fa9050>
        key        = 'Host'
        method     = 'GET'
        port       = 56906
        protocol   = 'HTTP/1.0'
        ssl_context = None
        url        = '/page3'
        value      = ':::56906'
/usr/lib/python3.7/http/client.py:1276: in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
        encode_chunked = False
        message_body = None
        self       = <http.client.HTTPConnection object at 0x7fc941fa9050>
/usr/lib/python3.7/http/client.py:1036: in _send_output
    self.send(msg)
        encode_chunked = False
        message_body = None
        msg        = b'GET /page3 HTTP/1.0\r\nConnection: Keep-Alive\r\nHost: :::56906\r\n\r\n'
        self       = <http.client.HTTPConnection object at 0x7fc941fa9050>
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <http.client.HTTPConnection object at 0x7fc941fa9050>
data = b'GET /page3 HTTP/1.0\r\nConnection: Keep-Alive\r\nHost: :::56906\r\n\r\n'

    def send(self, data):
        """Send `data' to the server.
        ``data`` can be a string object, a bytes object, an array object, a
        file-like object that supports a .read() method, or an iterable object.
        """

        if self.sock is None:
            if self.auto_open:
                self.connect()
            else:
>               raise NotConnected()
E               http.client.NotConnected

data       = b'GET /page3 HTTP/1.0\r\nConnection: Keep-Alive\r\nHost: :::56906\r\n\r\n'
self       = <http.client.HTTPConnection object at 0x7fc941fa9050>

/usr/lib/python3.7/http/client.py:978: NotConnected

From what I can tell, the c1 connection breaks right after c2 = connection() is called - specifically after thehttp_connection.connect() call.

šŸ“‹ Environment

šŸ“‹ Additional context

While not directly related, Cherrypy test suite is clean (tested with 18.8.0).

Our previous/current cheroot version (with clean test suite) is 6.3.2 and hence before https://github.com/cherrypy/cheroot/pull/199.

I am happy to dig deeper into this but at this point I am not sure what might be the issue or where to look or what to try

kulikjak commented 1 year ago

I did more digging and managed to extract this issue into a much smaller example:

# server.py
from cheroot import wsgi

def my_app(environ, start_response):
    print(".")
    status = '200 OK'
    response_headers = [('Content-type','text/plain'), ('Content-length','12')]
    start_response(status, response_headers)
    return [b'Hello world!']

addr = ('0.0.0.0', 8088)
server = wsgi.Server(addr, my_app)
server.start()
# client.py
import http.client

def get_connection():
    http.client.HTTPConnection._http_vsn_str = 'HTTP/1.0'
    http_connection = http.client.HTTPConnection("localhost:8088")
    http_connection.auto_open = False
    http_connection.connect()
    return http_connection

def request(conn, keepalive=True):
    conn.request("GET", "/page3", headers={'Connection': 'Keep-Alive'})
    r1 = conn.getresponse()
    assert r1.status == 200
    assert r1.reason == 'OK'
    print(r1.read())
    print("-------")

conn1 = get_connection()
request(conn1)
request(conn1)

conn2 = get_connection()
request(conn2)
request(conn1)

Running this results in http.client.RemoteDisconnected: Remote end closed connection without response on Solaris (fails in the last request call), but works as expected on Linux. I am still unsure what the cause is; ATM it seems like a platform difference...

kulikjak commented 1 year ago

I found an interesting thing - when I add a short delay (time.sleep(1)) between the last request(conn1) and conn2 = get_connection(), it works every time. It might still fail later if I alternate between request(conn1) and request(conn2) quickly, but again, if I add delays in between those, it works. It seems that there might be some kind of

Similarly, when I add a delay into the test_keepalive_conn_management, it fails much later (not sure why, but that is not important ATM).

kulikjak commented 1 year ago

So, the mystery is solved, and Cheroot is innocent. I've got all the way to _ThreadsafeSelector and found out that the DefaultSelector used on Solaris (DevpollSelector) seems to be the issue; when I change to either SelectSelector or PollSelector, all tests and the above example pass.

I am yet to find what exactly is wrong with the DevpollSelector, but that is not a cheroot issue. Sorry for the "false alarm".