aio-libs / aiohttp

Asynchronous HTTP client/server framework for asyncio and Python
https://docs.aiohttp.org
Other
15.11k stars 2.02k forks source link

Websocket client hanging/blocking? #3291

Closed bradwood closed 1 month ago

bradwood commented 6 years ago

Long story short

ws.receive() appears to be hanging despite data flowing from websocket.

Expected behaviour

ws.receive should output and report something.

Actual behaviour

it just hangs/blocks on the websocket.

Steps to reproduce

The code below is the calling code.

    async def _ws_subscribe(self,
                            session: ClientSession,
                            url: str
                            ) -> Dict:
        """ Fetch data from web socket asynchronously.

        This helper method fetches data from a web socket asynchronously. It is used to
        fetch subscribe to websockets of the SkyQ box.

        Args:
            session (aiohttp.ClientSession): Session to use when fetching the data.
            url (str): WebSocket URL to fetch.

        Returns:
            dict: The body of data returned.

        """
        LOGGER.debug(f"Inside _ws_subscribe()...")

        async with session.ws_connect(url, autoclose=False) as ws:
            while True:
                LOGGER.debug(f"Inside _ws_subscribe() infinite loop pre ws.receive.")
                payload = await ws.receive()
                LOGGER.debug(f"READ SOMETHING! _ws_subscribe() infinite loop post ws.receive")
                LOGGER.debug(f'type = {payload.type}')
                LOGGER.debug(f'payload data = {payload.data}')
                LOGGER.debug(f'payload exception = {ws.exception()}')
                if payload.type == aiohttp.WSMsgType.TEXT:
                    LOGGER.debug('Web-socket data received.')
                    asyncio.create_task(self._handle(payload))
                    # asyncio.ensure_future(self._handle(payload))
                elif payload.type == aiohttp.WSMsgType.BINARY:
                    LOGGER.debug('Web-socket binary data received.')
                elif payload.type == aiohttp.WSMsgType.PING:
                    LOGGER.debug('Web-socket ping received')
                    ws.pong()
                elif payload.type == aiohttp.WSMsgType.PONG:
                    LOGGER.debug('Web-socket pong received')
                else:
                    if payload.type == aiohttp.WSMsgType.CLOSE:
                        await ws.close()
                    elif payload.type == aiohttp.WSMsgType.ERROR:
                        LOGGER.info(f'Error during receive {ws.exception()}')
                    elif payload.type == aiohttp.WSMsgType.CLOSED:
                        pass

                    break

Resulting log:

(pyskyq-4vSEKDfZ) ✔ [brad@bradmac:~/Code/pyskyq] [16-create-status-end-point-property|✚ 5…2] $ pyskyq -vv green
[2018-09-26 09:12:14] DEBUG:pyskyq.cli:Starting SkyQ...
[2018-09-26 09:12:14] DEBUG:pyskyq.skyremote:Initialised SkyRemote object with host=skyq, port=49160
[2018-09-26 09:12:14] DEBUG:pyskyq.status:Initialised Status object object with host=skyq, port=9006
[2018-09-26 09:12:14] DEBUG:asyncio:Using selector: KqueueSelector
[2018-09-26 09:12:14] DEBUG:pyskyq.status:Asyncio event loop thread running...
[2018-09-26 09:12:14] DEBUG:pyskyq.skyq:Initialised SkyQ object with host=skyq.
[2018-09-26 09:12:14] INFO:pyskyq.cli:Script ends here
[2018-09-26 09:12:14] DEBUG:pyskyq.status:Setting up web socket listener on ws://skyq:9006/as/system/status.
[2018-09-26 09:12:14] DEBUG:pyskyq.status:Inside _ws_subscribe()...
[2018-09-26 09:12:14] DEBUG:asyncio:Get address info skyq:9006, type=<SocketKind.SOCK_STREAM: 1>
[2018-09-26 09:12:14] DEBUG:asyncio:Getting address info skyq:9006, type=<SocketKind.SOCK_STREAM: 1> took 15.352ms: [(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('10.0.1.6', 9006))]
[2018-09-26 09:12:14] DEBUG:asyncio:poll 60478.847 ms took 14.299 ms: 1 events
[2018-09-26 09:12:14] DEBUG:asyncio:connect <socket.socket fd=8, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('0.0.0.0', 0)> to ('10.0.1.6', 9006)
[2018-09-26 09:12:14] DEBUG:asyncio:poll 60461.657 ms took 21.045 ms: 1 events
[2018-09-26 09:12:14] DEBUG:asyncio:<socket.socket fd=8, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('10.0.1.30', 50141), raddr=('10.0.1.6', 9006)> connected to 10.0.1.6:9006: (<_SelectorSocketTransport fd=8 read=polling write=<idle, bufsize=0>>, <aiohttp.client_proto.ResponseHandler object at 0x1033c8ac8>)
[2018-09-26 09:12:14] DEBUG:asyncio:poll 60429.970 ms took 7.192 ms: 1 events
[2018-09-26 09:12:14] DEBUG:pyskyq.status:Inside _ws_subscribe() infinite loop pre ws.receive.

However, the socket is open and serving data as can be seen by netcat.

(pyskyq-4vSEKDfZ) ✔ [brad@bradmac:~/Code/pyskyq] [16-create-status-end-point-property|✚ 5…2] $ nc skyq 9006
GET /as/system/status HTTP/1.1
Host: skyq:9006
Accept: */*
Connection: Upgrade
Upgrade: websocket

�~�{
   "camessage" : {
      "reason" : "no message",
      "state" : "unavailable"
   },
   "drmstatus" : {
      "state" : "available"
   },
   "entitlements" : [
      "ANALYTICS",
      "BIGBASIC",
      "ETHAN_APP_1",
      "HD",
      "PDL",
      "SKY_DRM_CE",
      "SKY_DRM_MR",
      "SKY_IPPV",
      "ULTRA+",
      "SKY+",
      "GATEWAYENABLER",
      "SIDELOAD"
   ],
   "epginfobits" : {
      "epginfobits" : "0xFDE5FFC0",
      "mask" : "0x5FFA003F",
      "state" : "available"
   },
   "gatewayservices" : {
      "state" : "available"
   },
   "hdmi" : {
      "2160p10bitCapable" : false,
      "authenticatedHDCP" : "1.x",
      "sinkHDCP" : "1.x",
      "sinkHLG" : false,
      "sinkUHD" : false,
      "state" : "available",
      "uhdConfigured" : false
   },
   "network" : {
      "state" : "available"
   },
   "nssplayback" : {
      "state" : "available"
   },
   "pvr" : {
      "state" : "available"
   },
   "schedule" : {
      "lastdate" : "20181003",
      "state" : "available"
   },
   "servicelist" : {
      "state" : "available"
   },
   "smartcard" : {
      "active" : true,
      "bouquet" : "4101",
      "countryCode" : "GBR",
      "currency" : "GBP",
      "cwe" : true,
      "householdid" : "10947783",
      "paired" : true,
      "state" : "available",
      "subbouquet" : "1",
      "transactionlimit" : 65535,
      "viewingCardNumber" : "725 325 260"
   },
   "swupdate" : {
      "reason" : "IDLE",
      "state" : "unavailable"
   },
   "systemupdates" : {
      "entitlements" : 2,
      "install" : 1,
      "servicegenres" : 1,
      "smartcard" : 1
   },
   "updatetask" : {
      "reason" : "no update",
      "state" : "unavailable"
   }
}
^C(pyskyq-4vSEKDfZ) ✔ [brad@bradmac:~/Code/pyskyq] [16-create-status-end-point-property|✚ 5…2] $

Your environment

(pyskyq-4vSEKDfZ) ✔ [brad@bradmac:~/Code/pyskyq] [16-create-status-end-point-property|✚ 5…2] $ uname -a
Darwin bradmac 17.7.0 Darwin Kernel Version 17.7.0: Thu Jun 21 22:53:14 PDT 2018; root:xnu-4570.71.2~1/RELEASE_X86_64 x86_64
(pyskyq-4vSEKDfZ) ✔ [brad@bradmac:~/Code/pyskyq] [16-create-status-end-point-property|✚ 5…2] $ pip freeze
aiohttp==3.4.4
appnope==0.1.0
arrow==0.12.1
async-timeout==3.0.0
async-upnp-client==0.12.4
attrs==18.2.0
backcall==0.1.0
binaryornot==0.4.4
bleach==2.1.3
certifi==2018.4.16
chardet==3.0.4
click==6.7
cookiecutter==1.6.0
cycler==0.10.0
decorator==4.3.0
Django==2.1.1
entrypoints==0.2.3
ez-setup==0.9
future==0.16.0
html5lib==1.0.1
idna==2.7
idna-ssl==1.1.0
ipykernel==4.8.2
ipython==6.4.0
ipython-genutils==0.2.0
ipywidgets==7.2.1
jedi==0.12.0
Jinja2==2.10
jinja2-time==0.2.0
jsonschema==2.6.0
jupyter==1.0.0
jupyter-client==5.2.3
jupyter-console==5.2.0
jupyter-core==4.4.0
kaggle==1.4.2
kiwisolver==1.0.1
MarkupSafe==1.0
matplotlib==2.2.2
mistune==0.8.3
multidict==4.4.2
nbconvert==5.3.1
nbformat==4.4.0
notebook==5.5.0
numpy==1.14.4
pandocfilters==1.4.2
parso==0.2.1
pexpect==4.6.0
pickleshare==0.7.4
pipenv==2018.7.1
poyo==0.4.1
prompt-toolkit==1.0.15
ptyprocess==0.6.0
Pygments==2.2.0
pyparsing==2.2.0
PyScaffold==3.0.3
python-dateutil==2.7.3
python-didl-lite==1.1.0
pytz==2018.4
pyzmq==17.0.0
qtconsole==4.3.1
requests==2.19.1
Send2Trash==1.5.0
simplegeneric==0.8.1
six==1.11.0
terminado==0.8.1
testpath==0.3.1
tornado==5.0.2
tqdm==4.24.0
traitlets==4.3.2
urllib3==1.22
virtualenv==16.0.0
virtualenv-clone==0.3.0
voluptuous==0.11.5
wcwidth==0.1.7
webencodings==0.5.1
whichcraft==0.4.1
widgetsnbextension==3.2.1
yarl==1.2.6
(pyskyq-4vSEKDfZ) ✔ [brad@bradmac:~/Code/pyskyq] [16-create-status-end-point-property|✚ 5…2] $

update...

Python version is 3.7.0

asvetlov commented 6 years ago

GitMate.io thinks possibly related issues are https://github.com/aio-libs/aiohttp/issues/2200 (background Client websockets hangs on SIGINT), https://github.com/aio-libs/aiohttp/issues/1002 (Websocket response .close() can hang indefinitely), https://github.com/aio-libs/aiohttp/issues/376 (ProactorEventLoop hangs ), https://github.com/aio-libs/aiohttp/issues/3027 (ws_connect hang), and https://github.com/aio-libs/aiohttp/issues/265 (Implement client-side websocket).

bradwood commented 6 years ago

it looks like client_ws.py#L204 is not returning... any ideas?

The server is a SkyQ set-top box, so might not be entirely RFC-compliance implementation of WS. For example, it doesn't honour ping/poings... And there is that weird �~� in the payload...Not sure if that's an opcode or some other quirk...

asvetlov commented 6 years ago

Websocket is a binary protocol �~� encodes a message type and length. Sorry, I cannot help you without your participation in decoding it and comparing to actual payload size.

bradwood commented 6 years ago

�~� is 001111110111111000111111 in binary, which is: 3F7E3F in hex.

those bytes look like this end-to-end...
00111111 01111110 00111111

I am not an expert on th RFC, but what is this saying in terms of type and length?

Do you think the problem is with that? Ie, it not agreeing to the actual payload length, and the thing waiting for an extra frame, for example?

bradwood commented 6 years ago

Interestingly, this web socket cli client, written in go, handles the output fine, including that binary stuff at the top:


ws ws://skyq:9006/as/system/status
< {
   "camessage" : {
      "reason" : "no message",
      "state" : "unavailable"
   },
   "drmstatus" : {
      "state" : "available"
   },
   "entitlements" : [
      "ANALYTICS",
      "BIGBASIC",
      "ETHAN_APP_1",
      "HD",
      "PDL",
      "SKY_DRM_CE",
      "SKY_DRM_MR",
      "SKY_IPPV",
      "ULTRA+",
      "SKY+",
      "GATEWAYENABLER",
      "SIDELOAD"
   ],
   "epginfobits" : {
      "epginfobits" : "0xFDE5FFC0",
      "mask" : "0x5FFA003F",
      "state" : "available"
   },
   "gatewayservices" : {
      "state" : "available"
   },
   "hdmi" : {
      "2160p10bitCapable" : false,
      "authenticatedHDCP" : "NONE",
      "reason" : "HDMI output port is disabled",
      "sinkHDCP" : "NONE",
      "sinkHLG" : false,
      "sinkUHD" : false,
      "state" : "unavailable",
      "uhdConfigured" : false
   },
   "network" : {
      "state" : "available"
   },
   "nssplayback" : {
      "state" : "available"
   },
   "pvr" : {
      "state" : "available"
   },
   "schedule" : {
      "lastdate" : "20181003",
      "state" : "available"
   },
   "servicelist" : {
      "state" : "available"
   },
   "smartcard" : {
      "active" : true,
      "bouquet" : "4101",
      "countryCode" : "GBR",
      "currency" : "GBP",
      "cwe" : true,
      "householdid" : "10947783",
      "paired" : true,
      "state" : "available",
      "subbouquet" : "1",
      "transactionlimit" : 65535,
      "viewingCardNumber" : "725 325 260"
   },
   "swupdate" : {
      "reason" : "IDLE",
      "state" : "unavailable"
   },
   "systemupdates" : {
      "entitlements" : 2,
      "install" : 1,
      "servicegenres" : 1,
      "smartcard" : 1
   },
   "updatetask" : {
      "reason" : "no update",
      "state" : "unavailable"
   }
}

> ^C
Interrupt
✔ [brad@bradmac:~] $```
asvetlov commented 6 years ago

The protocol says: the first 3F byte in WS message is forbidden (reserved for future use). Are you sure that you didn't miss something?

bradwood commented 6 years ago

I assume you are referring to the 2d, 3rd and 4th bit in the base framing protocol. In which case yes, maybe I did get it wrong. It could be a unicode/ASCII issue. I'll take another look.

bradwood commented 6 years ago

I've tried a few other online converters... I'm getting this now: hex = ef bf bd 7e ef bf bd binary = 11101111 10111111 10111101 01111110 11101111 10111111 10111101

If according the RFC 6455

5.2.  Base Framing Protocol

   This wire format for the data transfer part is described by the ABNF
   [RFC5234] given in detail in this section.  (Note that, unlike in
   other sections of this document, the ABNF in this section is
   operating on groups of bits.  The length of each group of bits is
   indicated in a comment.  When encoded on the wire, the most
   significant bit is the leftmost in the ABNF).  A high-level overview
   of the framing is given in the following figure.  In a case of
   conflict between the figure below and the ABNF specified later in
   this section, the figure is authoritative.

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-------+-+-------------+-------------------------------+
     |F|R|R|R| opcode|M| Payload len |    Extended payload length    |
     |I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
     |N|V|V|V|       |S|             |   (if payload len==126/127)   |
     | |1|2|3|       |K|             |                               |
     +-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
     |     Extended payload length continued, if payload len == 127  |
     + - - - - - - - - - - - - - - - +-------------------------------+
     |                               |Masking-key, if MASK set to 1  |
     +-------------------------------+-------------------------------+
     | Masking-key (continued)       |          Payload Data         |
     +-------------------------------- - - - - - - - - - - - - - - - +
     :                     Payload Data continued ...                :
     + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
     |                     Payload Data continued ...                |
     +---------------------------------------------------------------+

   FIN:  1 bit

      Indicates that this is the final fragment in a message.  The first
      fragment MAY also be the final fragment.

   RSV1, RSV2, RSV3:  1 bit each

      MUST be 0 unless an extension is negotiated that defines meanings
      for non-zero values.  If a nonzero value is received and none of
      the negotiated extensions defines the meaning of such a nonzero
      value, the receiving endpoint MUST _Fail the WebSocket
      Connection_.

Then it looks like RSV1 and RSV2 are being set to 1, as you suggest, unless some etxension is being negotiated? I have no idea.

The number of the characters in the JSON doc is 1703

The RFC says this about payload length:

Payload length:  7 bits, 7+16 bits, or 7+64 bits

      The length of the "Payload data", in bytes: if 0-125, that is the
      payload length.  If 126, the following 2 bytes interpreted as a
      16-bit unsigned integer are the payload length.  If 127, the
      following 8 bytes interpreted as a 64-bit unsigned integer (the
      most significant bit MUST be 0) are the payload length.  Multibyte
      length quantities are expressed in network byte order.  Note that
      in all cases, the minimal number of bytes MUST be used to encode
      the length, for example, the length of a 124-byte-long string
      can't be encoded as the sequence 126, 0, 124.  The payload length
      is the length of the "Extension data" + the length of the
      "Application data".  The length of the "Extension data" may be
      zero, in which case the payload length is the length of the
      "Application data".

So, given all that, where do we go next? If the server is brain-dead and not RFC compliant, then I need to work around it as other client's clearly can do... Can you offer any suggestions?

asvetlov commented 6 years ago

Still looks suspicious. The only known extension is WebSocket compression which utilizes RSV1. I don't know what extension uses RSV2. Moreover, opcode FF is reserved too and should be not used.

Unless you don't know what the server sends there is no way to "fix" a client.

bradwood commented 6 years ago

@asvetlov I tried the websockets library and it works... So I'm going to go with that, as I cannot spend more time on debugging this library with you...

aoihttp -- doesn't work

import asyncio
import logging
import sys

import aiohttp

LOGGER = logging.getLogger(__name__)
logformat = "[%(asctime)s] %(levelname)s:%(name)s:%(message)s"
logging.basicConfig(level=logging.DEBUG, stream=sys.stdout,
                    format=logformat, datefmt="%Y-%m-%d %H:%M:%S")

async def main():
    async with aiohttp.ClientSession() as session:
        async with session.ws_connect('http://skyq:9006/as/system/status') as ws:
            payload = await ws.receive()
            print(payload.text)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())

websockets -- works

import asyncio
import websockets
import logging
import sys

LOGGER = logging.getLogger(__name__)
logformat = "[%(asctime)s] %(levelname)s:%(name)s:%(message)s"
logging.basicConfig(level=logging.DEBUG, stream=sys.stdout,
                    format=logformat, datefmt="%Y-%m-%d %H:%M:%S")

async def hello():
    async with websockets.connect(
            'ws://skyq:9006/as/system/status') as websocket:

        payload = await websocket.recv()
        print(f"{payload}")

asyncio.get_event_loop().run_until_complete(hello())

Fee free to close this ticket if you wish to, but you might want to keep it open as it does appear to be a bug... Or at least a situation which does not obey Postel's law.

I appreciate your help in trying to address this.

bradwood commented 6 years ago

Still looks suspicious. The only known extension is WebSocket compression which utilizes RSV1. I don't know what extension uses RSV2. Moreover, opcode FF is reserved too and should be not used.

Hi @asvetlov

Unless you don't know what the server sends there is no way to "fix" a client.

I think you meant "unless, you do know what a server sends, you cannot fix a client."

In which case, I completely disagree with this point.

You are familiar with the Robustness Principle?

"Be conservative in what you send, and liberal in what you receive."

Regardless of MUSTs and SHOULDs and MUST NOTs, in the RFC, I think you are interpreting and expecting entities to comply with the RFC completely... Clearly they do not, and many many internet servers and clients out there, of various types, have horrendous bastardised half-implementations of protocols.

A good library/protocol implementation must be forgiving of these to the extent that it can. See RFC1122. It is completely possible to write a protocol client to be forgiving of (some) protocol violations without having access to the other participant in the communication.

I feel this is not the case here, you seem to be taking the hard-line view that if a host is not 100% compliant then too bad for it, we will not talk to it. This is too bad. It's like saying: 'I only speak the Queen's English, and if you come talking to me in some weird dialect of English, I won't talk to you.'

Clearly this is not a good approach if the whole point of the library is to facilitate communication.

My suggestion, therefore, is to consider where you can relax stringent requirements of the protocol (maybe via method kwargs, that turn on the "relaxed" handling), and see if you can make things talk even when the other side is not behaving properly.

Aside from this critique, I'd like to thank you and the contributors from aio-libs for all your work for the community. 👍 🥇

bradwood commented 6 years ago

@asvetlov Just an update on this... I figured out that my (non-compliant) server requires an uppercase Upgrade: header before it will work. Just thought I'd mention it in case the issue of Title Case headers is being considered by you guys... This is a real case of a protocol violation from a poorly implemented server which causes the websocket client not to work..

Dreamsorcerer commented 2 months ago

Current releases send the upgrade header in that format, so unless there's a method to reproduce the original issue, there's not much we can do here.