Closed spikefishjohn closed 7 months ago
Not sure what I did to strike through the OS but that is all correct.
It does look a little odd, and seems a little obvious which makes me wonder why it hasn't been noticed before.
Do you think you could write a test in a PR to reproduce it? Then we could also try your suggestion of removing that check and seeing if any other tests break, which would be a good indication if it's the wrong change to make.
I think the code is expecting the client to close the connection, but we can't always rely on that. Even if a client says it is going to close, it may not.
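One generic way to avoid depending on the peer is to wait a bounded time for its EOF and then close anyway. The sketch below uses only stdlib asyncio streams. The helper name close_after_grace, the grace value, and the demo are made up for illustration; this is not aiohttp code, and it is not necessarily what any fix in aiohttp does.

```python
import asyncio


async def close_after_grace(
    reader: asyncio.StreamReader,
    writer: asyncio.StreamWriter,
    grace: float,
) -> bool:
    """Wait up to `grace` seconds for the peer's EOF, then close regardless.

    Returns True if the peer closed first, False if we gave up waiting.
    (Hypothetical helper, for illustration only.)
    """
    try:
        # read() with no size returns only once the peer sends EOF.
        await asyncio.wait_for(reader.read(), timeout=grace)
        peer_closed = True
    except asyncio.TimeoutError:
        peer_closed = False
    # Close our side either way so the socket cannot linger forever.
    writer.close()
    await writer.wait_closed()
    return peer_closed


async def demo() -> bool:
    loop = asyncio.get_running_loop()
    result = loop.create_future()

    async def handler(
        reader: asyncio.StreamReader, writer: asyncio.StreamWriter
    ) -> None:
        result.set_result(await close_after_grace(reader, writer, grace=0.1))

    server = await asyncio.start_server(handler, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    # A client that connects and then never closes on its own.
    _, client_writer = await asyncio.open_connection("127.0.0.1", port)
    peer_closed = await result
    client_writer.close()
    await client_writer.wait_closed()
    server.close()
    await server.wait_closed()
    return peer_closed


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the demo the client holds the connection open, so the server side gives up after the grace period and closes the socket itself.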
Yeah, I forked aiohttp last night. I'll work on a small reproduction. I did try removing the call. This results in the next read request in close() failing with a timeout error, and then the socket does close, but with a 1006 message, I think.
I also started looking at how to pass the message to close() so I could add a condition for the read statement. Didn't get very far, but it was kind of late.
https://datatracker.ietf.org/doc/html/rfc6455#section-7
Once an endpoint has both sent and received a Close control frame, that endpoint SHOULD Close the WebSocket Connection
So, this is what it is trying to achieve when close is initiated by us.
As such, when a server is instructed to Close the WebSocket Connection it SHOULD initiate a TCP Close immediately, and when a client is instructed to do the same, it SHOULD wait for a TCP Close from the server.
Except as indicated above or as specified by the application layer (e.g., a script using the WebSocket API), clients SHOULD NOT close the connection.
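As a rough reading of the RFC text quoted above, the rule for who starts the TCP close can be written as a small predicate. The function name and the string roles are made up for illustration; they are not part of aiohttp or any other library.

```python
def should_initiate_tcp_close(role: str, sent_close: bool, received_close: bool) -> bool:
    """Return True if this endpoint should start the TCP close right away.

    Hypothetical helper encoding the quoted RFC 6455 section 7 guidance:
    once both Close frames have been exchanged, the server closes the TCP
    connection immediately and the client waits for the server to do so.
    """
    handshake_done = sent_close and received_close
    if not handshake_done:
        return False
    return role == "server"


print(should_initiate_tcp_close("server", True, True))
print(should_initiate_tcp_close("client", True, True))
```

Under this reading, a server that has both sent and received a Close frame has nothing left to wait for.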
I think we just need to tweak that code so that the server does not wait to receive a close code (as it has already received one), but still goes on to close the transport. That is, rather than removing that check, we want to add a self._set_code_close_transport(...) call in there (though we might need to avoid calling it twice?).
So something like this instead of deleting? I looked through the code and it seems like _close_code is always set, but I'm not sure it's safe to assume that. The default is None.
    if self._closing:
        self._set_code_close_transport(self._close_code)
        return True
but i'm not sure if its safe to assume it is?
Not sure, I'd have to look more closely. If we get the test up first, then we can see how it works.
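To make the "avoid calling it twice" concern concrete, here is a toy model of the proposed tweak. Everything in it is a simplified stand-in: the classes are hypothetical, the method names only mirror the ones mentioned in this thread, and the _transport_closed guard is an assumption about how a double call could be made harmless, not aiohttp's actual implementation.

```python
from typing import Optional


class ToyTransport:
    """Stand-in for a TCP transport; counts how often close() is called."""

    def __init__(self) -> None:
        self.close_calls = 0

    def close(self) -> None:
        self.close_calls += 1


class ToyWebSocket:
    """Hypothetical, heavily simplified model; not aiohttp's real class."""

    def __init__(self) -> None:
        self.transport = ToyTransport()
        self._closing = False
        self._close_code: Optional[int] = None
        self._transport_closed = False  # assumed guard flag

    def _set_code_close_transport(self, code: Optional[int]) -> None:
        # Guard so a second call becomes a no-op.
        if self._transport_closed:
            return
        self._transport_closed = True
        self._close_code = code
        self.transport.close()

    def receive_close_frame(self, code: int) -> None:
        # Models receive() seeing a CLOSE message from the peer.
        self._closing = True
        self._close_code = code

    def close(self) -> bool:
        if self._closing:
            # Proposed tweak: close the transport instead of just returning.
            self._set_code_close_transport(self._close_code)
            return True
        self._closing = True
        self._set_code_close_transport(1000)
        return True


ws = ToyWebSocket()
ws.receive_close_frame(1000)  # peer sent CLOSE, application called receive()
ws.close()
ws.close()                    # a second call must not close the transport again
print(ws.transport.close_calls, ws._close_code)
```

In this toy, the transport is closed exactly once even though close() runs twice. Whether _close_code can still be None at that point in the real code is the open question above; the toy simply passes it through.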
Boiling this down has been harder than I expected. So far I'm not recreating the issue. More research and foul language will be required.
I'm pretty convinced there is a bug here where we don't close the transport if the client holds it open forever
@spikefishjohn Can you give https://github.com/aio-libs/aiohttp/pull/8200 a shot? I'm running it on my production HA systems without any unexpected side effects
I'll come up with a test for it if it fixes your issue
@bdraco yeah i'll give it a test. I've been trying to reproduce the issue with a smaller code base but haven't been able to so far which has been pretty frustrating.
I'll add this in today and see if it fixes the issue. If not i'll add a pdb trace of the accept to show what is happening if that helps.
Ok that seems to fix the issue. Here is the client I'm using to talk to gns3.
import asyncio
import base64
import json
import websockets
import logging

logger = logging.getLogger('websockets')
logger.setLevel(logging.DEBUG)
handler = logging.StreamHandler()
formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)
log = logging.getLogger(__name__)

# GNS3 WebSocket server violates RFC 6455 so we have to be active closer
# Lets give Websockets a chance to get data.
WS_CLOSE_TIMEOUT = 10
RECONNECT_TIMEOUT = 1.618
CONTROLLER_WS_API = '/v2/notifications/ws'
COMPUTE_WS = '/v2/notifications/ws'
SERVER = '127.0.0.1:3080'
USER = 'XXX'
PASS = 'XXX'
CREDS = f'{USER}:{PASS}'
ENCODED_CREDS = base64.b64encode(CREDS.encode()).decode()
CONTROLLER_URI = f'ws://{SERVER}{CONTROLLER_WS_API}'
COMPUTE_URI = f'ws://{SERVER}{COMPUTE_WS}'


async def main() -> None:
    async with asyncio.TaskGroup() as tasks:
        tasks.create_task(websocket_logger(CONTROLLER_URI))


async def websocket_logger(endpoint: str) -> None:
    headers = {
        'Authorization': f'Basic {ENCODED_CREDS}'
    }
    try:
        async with websockets.connect(endpoint, close_timeout=WS_CLOSE_TIMEOUT, extra_headers=headers) as websocket:
            print("Call close")
            await websocket.close()
            print("close complete")
    except ConnectionRefusedError:
        log.info(f'Connection to {endpoint!r} refused.')
        await asyncio.sleep(RECONNECT_TIMEOUT)


if __name__ == '__main__':
    asyncio.run(main())
This is what the client now reports.
john@compute01:~$ python3.11 ws-client.py
2024-03-02 11:40:39,775 - DEBUG - = connection is CONNECTING
2024-03-02 11:40:39,775 - DEBUG - > GET /v2/notifications/ws HTTP/1.1
2024-03-02 11:40:39,775 - DEBUG - > Host: 127.0.0.1:3080
2024-03-02 11:40:39,775 - DEBUG - > Upgrade: websocket
2024-03-02 11:40:39,775 - DEBUG - > Connection: Upgrade
2024-03-02 11:40:39,775 - DEBUG - > Sec-WebSocket-Key: XXX
2024-03-02 11:40:39,775 - DEBUG - > Sec-WebSocket-Version: 13
2024-03-02 11:40:39,775 - DEBUG - > Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
2024-03-02 11:40:39,776 - DEBUG - > Authorization: Basic XXX
2024-03-02 11:40:39,776 - DEBUG - > User-Agent: Python/3.11 websockets/12.0
2024-03-02 11:40:39,779 - DEBUG - < HTTP/1.1 101 Switching Protocols
2024-03-02 11:40:39,779 - DEBUG - < Upgrade: websocket
2024-03-02 11:40:39,779 - DEBUG - < Connection: upgrade
2024-03-02 11:40:39,779 - DEBUG - < Sec-WebSocket-Accept: XXX
2024-03-02 11:40:39,779 - DEBUG - < Sec-WebSocket-Extensions: permessage-deflate
2024-03-02 11:40:39,779 - DEBUG - < Date: Sat, 02 Mar 2024 16:40:39 GMT
2024-03-02 11:40:39,779 - DEBUG - < Server: Python/3.10 aiohttp/3.9.3
2024-03-02 11:40:39,779 - DEBUG - = connection is OPEN
Call close
2024-03-02 11:40:39,779 - DEBUG - = connection is CLOSING
2024-03-02 11:40:39,779 - DEBUG - > CLOSE 1000 (OK) [2 bytes]
2024-03-02 11:40:39,780 - DEBUG - < TEXT '{"action": "ping", "event": {"cpu_usage_percent...y_usage_percent": 3.4}}' [84 bytes]
2024-03-02 11:40:39,781 - DEBUG - < CLOSE 1000 (OK) [2 bytes]
2024-03-02 11:40:39,781 - DEBUG - = connection is CLOSED
close complete
john@compute01:~$
I'll pass this on to the original poster of the bug and have them test as well.
Thanks. Please keep us updated.
FYI the bug reporter indicated they won't be able to test for a week or so.
@spikefishjohn Was the reporter able to test the linked PR? Thanks
@bdraco I'm asking for an update.
Thanks!
@bdraco The original poster of the GNS3 issue has indicated they will not be able to test this and asked that the original GNS3 bug be closed. Your patch fixed the client I made above for GNS3. I think that is as much of a reply as this will get.
That being said if you think there is something else I can do to help by all means let me know.
Describe the bug
I'm working on a bug (https://github.com/GNS3/gns3-server/issues/2320) for GNS3 that I think PR7978 lines up with. The GNS3 issue is that when a WebSocket client sends a close message, the server never closes the TCP socket. I tried the patch from that PR, but it doesn't address the issue.
At a high level, GNS3 calls ws.receive(), at which point aiohttp receives the close message.
I believe the problem is that ws.receive() sets self._closing = True when it receives a close message. This then causes self.close() to return early, which prevents the if msg.type == WSMsgType.CLOSE: branch from being reached.
I'm almost thinking that early return should be removed, but I'm not sure what its intent is, so I'm unsure whether that is the proper fix.
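The flow described above can be illustrated with a toy model. The classes below are hypothetical stand-ins that only mimic the control flow as described in this report; they are not aiohttp's real code, and the attribute sent_close is invented for the example.

```python
class ToyTransport:
    """Stand-in for a TCP transport; only tracks whether close() was called."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class ToyWebSocket:
    """Hypothetical, heavily simplified model; not aiohttp's real class."""

    def __init__(self) -> None:
        self.transport = ToyTransport()
        self._closing = False
        self.sent_close = False  # invented flag: have we sent our CLOSE frame?

    def receive_close_frame(self) -> None:
        # Models receive() seeing a CLOSE message from the peer.
        self._closing = True

    def close(self) -> bool:
        self.sent_close = True  # reply with our own CLOSE frame
        if self._closing:
            # Early return: the transport is never touched on this path.
            return True
        self._closing = True
        # (The real code would wait here for the peer's CLOSE frame.)
        self.transport.close()
        return True


ws = ToyWebSocket()
ws.receive_close_frame()  # client sent CLOSE, application called receive()
ws.close()                # application then calls close()
print(ws.sent_close, ws.transport.closed)
```

In this toy, the close handshake completes at the WebSocket level, yet the transport stays open, which matches the symptom reported for GNS3.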
To Reproduce
I don't have a great way to reproduce this. I'm currently just using a fully installed GNS3 instance. I can work on making a reproduction now that I understand the issue.
Expected behavior
aiohttp.web_ws should close the socket when it receives a close message.
Related component
Server
EDIT: Update PR request in description.