Open goldstar611 opened 8 months ago
Hi @goldstar611, Unfortunately I did have the time to continue with this project, and it's been a while since I tried to run it. I did see some activity in this repo (clones/views) recently so if you want to create a PR to fix the issue I will be happy to accept it. I will also try to find some time soon to try and run it. Thanks!
I was able to get pyrdgw working by removing http_parser and replacing it will a simple python built-in (email.parser.BytesParser actually but that is just a quick fix).
I'm having some random disconnections, usually when opening the windows menu, Do you think this is related to the receive buffer length? (65536)
Hi @goldstar611, I tried to run it with Python 3.8.10 and the latest websockets and http_parser and it works, though I also see some random disconnections, I will try to figure out why. When I tried to run it with the latest Python 3.12.2, the installation of http_parser fails. There's an open issue for this failure: https://github.com/benoitc/http-parser/issues/99 I'll try again when they fix it.
Hi, thanks for writing back. Indeed the deprecated imp module prevents http-parser from compiling on newer versions of Python.
My thought is that since __gw_handler is not called much that a pure python replacement would be better suited here. This is how I ended up using email.parser.
In fact, if you look at the source code for http.server.BaseHTTPRequestHandler, you can see headers is type email.message.Message so the idea is not too strange.
Sounds like a valid choice so if you want to open a PR when you're ready it will be great :) Regarding the disconnections - I tried to increase the buffer length but I still get disconnections, it always after sending a relatively large packet (20-30K) from the server to the client, something gets stuck, it needs some more debugging.
Actually, the only reason the http_parser and the HTTP server are needed is because RDGW uses a custom RDG_OUT_DATA HTTP method instead of the standard GET method to negotiate the WebSocket connection. It's not ideal to have 2 endpoints with 2 TLS connections just for that, but I didn't find a better way back then (apart from changing the code of websockets which I didn't like). If you have a better suggestion I would be happy to hear :)
Below is what I have currently.
The modification was sourced from https://stackoverflow.com/questions/4685217/parse-raw-http-headers
The same person has an alternate suggestion to use BaseHTTPRequestHandler
which is probably the best way forward. BytesParser
was just simpler to implement ;)
diff --git a/pyrdgw/server/gw_server.py b/pyrdgw/server/gw_server.py
index 6f149f4..ef613d5 100644
--- a/pyrdgw/server/gw_server.py
+++ b/pyrdgw/server/gw_server.py
@@ -4,8 +4,7 @@ import pathlib
import ssl
import websockets
import logging
-
-from http_parser.pyparser import HttpParser
+from email.parser import BytesParser
from pyrdgw.log_messages import *
@@ -31,20 +30,12 @@ class GWServer:
async def __gw_handler(self, server_reader, server_writer):
- parser = HttpParser()
-
recv_buf = await server_reader.read(65536)
- len_parsed = parser.execute(recv_buf, len(recv_buf))
-
-
- if len_parsed != len(recv_buf):
- raise Exception(LogMessages.GW_FAILED_PROCESS_HTTP_REQUEST)
-
- headers = parser.get_headers()
+ request_line, headers_alone = recv_buf.split(b'\r\n', 1)
+ method = request_line.split(b" ")[0]
+ headers = BytesParser().parsebytes(headers_alone)
- if parser.get_method() == 'RDG_OUT_DATA' and \
- 'CONNECTION' in headers and headers['CONNECTION'] == 'Upgrade' and \
- 'UPGRADE' in headers and headers['UPGRADE'] == 'websocket':
+ if method == b'RDG_OUT_DATA' and headers.get('CONNECTION') == 'Upgrade' and headers.get('UPGRADE') == 'websocket':
websocket_hostname = self.config['websocket_server']['hostname']
websocket_port = self.config['websocket_server']['port']
@@ -58,7 +49,7 @@ class GWServer:
async_loop = asyncio.get_event_loop()
client_reader, client_writer = await asyncio.open_connection(
- websocket_hostname, websocket_port, ssl=ssl_ctx, loop=async_loop)
+ websocket_hostname, websocket_port, ssl=ssl_ctx)
start_index = len(b'RDG_OUT_DATA')
send_buf = b'GET' + recv_buf[start_index:]
@@ -92,7 +83,7 @@ class GWServer:
loop = asyncio.get_event_loop()
coroutine = asyncio.start_server(
- self.__gw_handler, hostname, port, ssl=ssl_context, loop=loop)
+ self.__gw_handler, hostname, port, ssl=ssl_context)
server = loop.run_until_complete(coroutine)
[Edit]: I changed
if method == b'RDG_OUT_DATA' and headers.get('CONNECTION') == b'Upgrade' and headers.get('UPGRADE') == b'websocket':
to
if method == b'RDG_OUT_DATA' and headers.get('CONNECTION') == 'Upgrade' and headers.get('UPGRADE') == 'websocket':
I'm think that using something like AIOHTTP would be a good solution to remove the websocket server. What I noticed is that when the client disconnects it always happens that __gw_forward_data()
throws an exception. So I think that the blind forwarding of 65536 bytes at a time might be the problem. Since the "HTTP" server is not websockets aware, it is just taking data and sending it to websockets. It doesn't know if it has a complete message or not.
Regarding dropped connections, I was able to bypass the server on port 443 and connect directly to the websocket server on port 8443 with the patch below but dropped connections still continue during large screen refreshes. So maybe the websockets library is not efficient enough for use as a reverse proxy.
[Edit]: I didnt see this line until after I posted: https://github.com/gabriel-sztejnworcel/pyrdgw/blob/d780896bdc15cab3a8de8ac3e033e1abd5ce7085/pyrdgw/protocol/state_machine.py#L227
diff --git a/pyrdgw/server/websocket_server.py b/pyrdgw/server/websocket_server.py
index e91b1ef..9675742 100644
--- a/pyrdgw/server/websocket_server.py
+++ b/pyrdgw/server/websocket_server.py
@@ -1,19 +1,21 @@
import asyncio
-import pathlib
import ssl
import websockets
import logging
+from websockets import InvalidMessage
+from websockets.legacy.http import d, read_headers, read_line
+
from pyrdgw.protocol.state_machine import *
class RDGWWebSocketServerProtocol(websockets.WebSocketServerProtocol):
def __init__(
- self,
- *args,
- **kwargs):
+ self,
+ *args,
+ **kwargs):
self.rdg_connection_id = ''
self.rdg_correlation_id = ''
@@ -42,6 +43,41 @@ class RDGWWebSocketServerProtocol(websockets.WebSocketServerProtocol):
return None
+ async def read_http_request(self):
+ stream = self.reader
+ try:
+ try:
+ request_line = await read_line(stream)
+ except EOFError as exc:
+ raise EOFError("connection closed while reading HTTP request line") from exc
+
+ try:
+ method, raw_path, version = request_line.split(b" ", 2)
+ except ValueError: # not enough values to unpack (expected 3, got 1-2)
+ raise ValueError(f"invalid HTTP request line: {d(request_line)}") from None
+
+ if method != b"RDG_OUT_DATA":
+ raise ValueError(f"unsupported HTTP method: {d(method)}")
+ if version != b"HTTP/1.1":
+ raise ValueError(f"unsupported HTTP version: {d(version)}")
+ path = raw_path.decode("ascii", "surrogateescape")
+
+ headers = await read_headers(stream)
+ except asyncio.CancelledError: # pragma: no cover
+ raise
+ except Exception as exc:
+ raise InvalidMessage("did not receive a valid HTTP request") from exc
+
+ if self.debug:
+ self.logger.debug("< GET %s HTTP/1.1", path)
+ for key, value in headers.raw_items():
+ self.logger.debug("< %s: %s", key, value)
+
+ self.path = path
+ self.request_headers = headers
+
+ return path, headers
+
class WebSocketServer:
I get the read_http_request trick, works great and removes the complexity of 2 servers. I'm wondering about the chance it will break sometime, since we're overriding an internal function, but anyway I think for now it's a lot better than having 2 servers, so if you want to create a PR it would be great. I'm also thinking about converting it to a library instead of an application, so we can pass a custom authentication/authorization handler based on the token in the RDP file.
I get the read_http_request trick,
Thanks, I took inspiration your change at https://github.com/gabriel-sztejnworcel/pyrdgw/blob/main/pyrdgw/server/gw_server.py#L64
works great and removes the complexity of 2 servers. I'm wondering about the chance it will break sometime, since we're overriding an internal function
We could instead, change RDG_OUT_DATA
to GET
in read_http_request
, then the function becomes a simple replace and super()
call. It would be cleaner too since I wouldn't need to import "private" functions such as d
and read_headers
, and read_line
. (These are not delcared in all)
I'm also thinking about converting it to a library instead of an application, so we can pass a custom authentication/authorization handler based on the token in the RDP file.
I had the same idea! but all I did was add # TODO: do something with PAA token
:)
Is there any interest in updating this codebase or accepting a PR? I tried to run this with the latest websockets and a new version of python and there's an error out of the box about unexpected parameter
loop
and past that the receive buffer seems to be empty.Let me know, Thanks