abhinavsingh / proxy.py

💫 Ngrok FRP Alternative • ⚡ Fast • 🪶 Lightweight • 0️⃣ Dependency • 🔌 Pluggable • 😈 TLS interception • 🔒 DNS-over-HTTPS • 🔥 Poor Man's VPN • ⏪ Reverse & ⏩ Forward • 👮🏿 "Proxy Server" framework • 🌐 "Web Server" framework • ➵ ➶ ➷ ➠ "PubSub" framework • 👷 "Work" acceptor & executor framework
https://abhinavsingh.com/proxy-py-a-lightweight-single-file-http-proxy-server-in-python/
BSD 3-Clause "New" or "Revised" License
3.01k stars 575 forks source link

[ConnectionPool] Add support for https session cache #748

Open JerryKwan opened 2 years ago

JerryKwan commented 2 years ago

Describe the bug ConnectionPool works well for http requests, but fails for https request.

To Reproduce Steps to reproduce the behavior:

  1. Run proxy.py as 'python -m proxy --num-workers 1 --log-level DEBUG --enable-conn-pool'
  2. Do 'curl -v -x localhost:8899 https://httpbin.org/get' to trigger error
  3. See error looks like this
# curl -v -x localhost:8899 https://httpbin.org/get
*   Trying ::1:8899...
* Connected to localhost (::1) port 8899 (#0)
* allocate connect buffer!
* Establish HTTP proxy tunnel to httpbin.org:443
> CONNECT httpbin.org:443 HTTP/1.1
> Host: httpbin.org:443
> User-Agent: curl/7.77.0
> Proxy-Connection: Keep-Alive
> 
< HTTP/1.1 200 Connection established
< 
* Proxy replied 200 to CONNECT request
* CONNECT phase completed!
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*  CAfile: none
*  CApath: none
* loaded libnssckbi.so
* NSS error -5938 (PR_END_OF_FILE_ERROR)
* Encountered end of file
* Closing connection 0
curl: (35) Encountered end of file

Expected behavior ConnectionPool works well for http and https requests

abhinavsingh commented 2 years ago

Let's turn this into an enhancement, as its a known issue and ConnectionPool is a WIP, as advertised by its flag description Default: False. (WIP) Enable upstream connection pooling. Other folks watching a bug label may get alarmed :D

abhinavsingh commented 2 years ago

@JerryKwan Assigned it back to you, it mades more sense as you are working on it.

abhinavsingh commented 2 years ago

@JerryKwan This update is unrelated to your PR. Something that I ran into and I wanted to document here for posterity:

ConnectionPool is now UpstreamConnectionPool. Today, I tried it with following:

Start proxy.py with connection pool

python -m proxy --log-level d --num-acceptors 1 --enable-conn-pool

Make 2 https requests using requests package

First one works fine. Second one runs into ssl.SSLError, which happens due to lack of support for HTTPS connection re-use. See below for server side debug logs

>>> r = requests.get("https://httpbin.org/get", proxies={"https": "http://localhost:8899"})
>>> r = requests.get("https://httpbin.org/get", proxies={"https": "http://localhost:8899"})
Traceback (most recent call last):
  File "/Users/abhinavsingh/Dev/proxy.py/venv310/lib/python3.10/site-packages/urllib3/connectionpool.py", line 696, in urlopen
    self._prepare_proxy(conn)
  File "/Users/abhinavsingh/Dev/proxy.py/venv310/lib/python3.10/site-packages/urllib3/connectionpool.py", line 964, in _prepare_proxy
    conn.connect()
  File "/Users/abhinavsingh/Dev/proxy.py/venv310/lib/python3.10/site-packages/urllib3/connection.py", line 416, in connect
    self.sock = ssl_wrap_socket(
  File "/Users/abhinavsingh/Dev/proxy.py/venv310/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/Users/abhinavsingh/Dev/proxy.py/venv310/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/Users/abhinavsingh/.pyenv/versions/3.10.0/lib/python3.10/ssl.py", line 512, in wrap_socket
    return self.sslsocket_class._create(
  File "/Users/abhinavsingh/.pyenv/versions/3.10.0/lib/python3.10/ssl.py", line 1070, in _create
    self.do_handshake()
  File "/Users/abhinavsingh/.pyenv/versions/3.10.0/lib/python3.10/ssl.py", line 1341, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: INVALID_ALERT] invalid alert (_ssl.c:997)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/abhinavsingh/Dev/proxy.py/venv310/lib/python3.10/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/Users/abhinavsingh/Dev/proxy.py/venv310/lib/python3.10/site-packages/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "/Users/abhinavsingh/Dev/proxy.py/venv310/lib/python3.10/site-packages/urllib3/util/retry.py", line 574, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='httpbin.org', port=443): Max retries exceeded with url: /get (Caused by SSLError(SSLError(1, '[SSL: INVALID_ALERT] invalid alert (_ssl.c:997)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/abhinavsingh/Dev/proxy.py/venv310/lib/python3.10/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/Users/abhinavsingh/Dev/proxy.py/venv310/lib/python3.10/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Users/abhinavsingh/Dev/proxy.py/venv310/lib/python3.10/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/abhinavsingh/Dev/proxy.py/venv310/lib/python3.10/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/Users/abhinavsingh/Dev/proxy.py/venv310/lib/python3.10/site-packages/requests/adapters.py", line 514, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='httpbin.org', port=443): Max retries exceeded with url: /get (Caused by SSLError(SSLError(1, '[SSL: INVALID_ALERT] invalid alert (_ssl.c:997)')))
>>> 

Server side logs for 1st request

2021-12-29 17:48:22,318 - pid:60879 [D] acceptor.accept:119 - Accepting new work#30
2021-12-29 17:48:22,319 - pid:60879 [D] tcp_server.__init__:53 - Work#30 accepted from 127.0.0.1:49386
2021-12-29 17:48:22,319 - pid:60879 [D] handler.initialize:106 - Handling connection 127.0.0.1:49386
2021-12-29 17:48:22,319 - pid:60879 [D] threadless._update_work_events:210 - fd#30 registered for mask#1 by work#30
2021-12-29 17:48:22,320 - pid:60879 [D] handler.handle_readables:279 - Client is read ready, receiving...
2021-12-29 17:48:22,320 - pid:60879 [D] connection.recv:64 - received 36 bytes from client
2021-12-29 17:48:22,320 - pid:60879 [D] server.connect_upstream:598 - Connecting to upstream httpbin.org:443
2021-12-29 17:48:22,615 - pid:60879 [D] pool.acquire:102 - Created new connection#4388713184 for upstream httpbin.org:443
2021-12-29 17:48:22,615 - pid:60879 [D] server.connect_upstream:633 - Connected to upstream httpbin.org:443
2021-12-29 17:48:22,615 - pid:60879 [D] threadless._update_work_events:180 - fd#30 modified for mask#3 by work#30
2021-12-29 17:48:22,615 - pid:60879 [D] threadless._update_work_events:210 - fd#34 registered for mask#1 by work#30
2021-12-29 17:48:22,616 - pid:60879 [D] handler.handle_writables:249 - Client is write ready, flushing...
2021-12-29 17:48:22,616 - pid:60879 [D] tcp_server.handle_writables:99 - Flushing buffer to client 127.0.0.1:49386
2021-12-29 17:48:22,616 - pid:60879 [D] connection.flush:96 - flushed 39 bytes to client
2021-12-29 17:48:22,616 - pid:60879 [D] threadless._update_work_events:180 - fd#30 modified for mask#1 by work#30
2021-12-29 17:48:22,625 - pid:60879 [D] handler.handle_readables:279 - Client is read ready, receiving...
2021-12-29 17:48:22,626 - pid:60879 [D] connection.recv:64 - received 517 bytes from client
2021-12-29 17:48:22,626 - pid:60879 [D] threadless._update_work_events:180 - fd#34 modified for mask#3 by work#30
2021-12-29 17:48:22,626 - pid:60879 [D] server.write_to_descriptors:220 - Server is write ready, flushing...
2021-12-29 17:48:22,626 - pid:60879 [D] connection.flush:96 - flushed 517 bytes to server
2021-12-29 17:48:22,626 - pid:60879 [D] threadless._update_work_events:180 - fd#34 modified for mask#1 by work#30
2021-12-29 17:48:22,997 - pid:60879 [D] server.read_from_descriptors:257 - Server is read ready, receiving...
2021-12-29 17:48:22,997 - pid:60879 [D] connection.recv:64 - received 5404 bytes from server
2021-12-29 17:48:22,997 - pid:60879 [D] threadless._update_work_events:180 - fd#30 modified for mask#3 by work#30
2021-12-29 17:48:22,998 - pid:60879 [D] handler.handle_writables:249 - Client is write ready, flushing...
2021-12-29 17:48:22,998 - pid:60879 [D] tcp_server.handle_writables:99 - Flushing buffer to client 127.0.0.1:49386
2021-12-29 17:48:22,998 - pid:60879 [D] connection.flush:96 - flushed 5404 bytes to client
2021-12-29 17:48:22,998 - pid:60879 [D] threadless._update_work_events:180 - fd#30 modified for mask#1 by work#30
2021-12-29 17:48:22,999 - pid:60879 [D] handler.handle_readables:279 - Client is read ready, receiving...
2021-12-29 17:48:23,000 - pid:60879 [D] connection.recv:64 - received 126 bytes from client
2021-12-29 17:48:23,000 - pid:60879 [D] threadless._update_work_events:180 - fd#34 modified for mask#3 by work#30
2021-12-29 17:48:23,000 - pid:60879 [D] server.write_to_descriptors:220 - Server is write ready, flushing...
2021-12-29 17:48:23,000 - pid:60879 [D] connection.flush:96 - flushed 126 bytes to server
2021-12-29 17:48:23,001 - pid:60879 [D] threadless._update_work_events:180 - fd#34 modified for mask#1 by work#30
2021-12-29 17:48:23,268 - pid:60879 [D] server.read_from_descriptors:257 - Server is read ready, receiving...
2021-12-29 17:48:23,269 - pid:60879 [D] connection.recv:64 - received 51 bytes from server
2021-12-29 17:48:23,269 - pid:60879 [D] threadless._update_work_events:180 - fd#30 modified for mask#3 by work#30
2021-12-29 17:48:23,269 - pid:60879 [D] handler.handle_writables:249 - Client is write ready, flushing...
2021-12-29 17:48:23,270 - pid:60879 [D] tcp_server.handle_writables:99 - Flushing buffer to client 127.0.0.1:49386
2021-12-29 17:48:23,270 - pid:60879 [D] connection.flush:96 - flushed 51 bytes to client
2021-12-29 17:48:23,270 - pid:60879 [D] threadless._update_work_events:180 - fd#30 modified for mask#1 by work#30
2021-12-29 17:48:23,270 - pid:60879 [D] handler.handle_readables:279 - Client is read ready, receiving...
2021-12-29 17:48:23,271 - pid:60879 [D] connection.recv:64 - received 174 bytes from client
2021-12-29 17:48:23,271 - pid:60879 [D] threadless._update_work_events:180 - fd#34 modified for mask#3 by work#30
2021-12-29 17:48:23,271 - pid:60879 [D] server.write_to_descriptors:220 - Server is write ready, flushing...
2021-12-29 17:48:23,272 - pid:60879 [D] connection.flush:96 - flushed 174 bytes to server
2021-12-29 17:48:23,272 - pid:60879 [D] threadless._update_work_events:180 - fd#34 modified for mask#1 by work#30
2021-12-29 17:48:23,610 - pid:60879 [D] server.read_from_descriptors:257 - Server is read ready, receiving...
2021-12-29 17:48:23,610 - pid:60879 [D] connection.recv:64 - received 567 bytes from server
2021-12-29 17:48:23,610 - pid:60879 [D] threadless._update_work_events:180 - fd#30 modified for mask#3 by work#30
2021-12-29 17:48:23,611 - pid:60879 [D] handler.handle_writables:249 - Client is write ready, flushing...
2021-12-29 17:48:23,611 - pid:60879 [D] tcp_server.handle_writables:99 - Flushing buffer to client 127.0.0.1:49386
2021-12-29 17:48:23,611 - pid:60879 [D] connection.flush:96 - flushed 567 bytes to client
2021-12-29 17:48:23,611 - pid:60879 [D] threadless._update_work_events:180 - fd#30 modified for mask#1 by work#30
2021-12-29 17:48:23,612 - pid:60879 [D] handler.handle_readables:279 - Client is read ready, receiving...
2021-12-29 17:48:23,612 - pid:60879 [D] tcp_server.handle_readables:114 - Connection closed by client 127.0.0.1:49386
2021-12-29 17:48:23,612 - pid:60879 [D] threadless._cleanup:317 - fd#30 unregistered by work#30
2021-12-29 17:48:23,612 - pid:60879 [D] threadless._cleanup:317 - fd#34 unregistered by work#30
2021-12-29 17:48:23,612 - pid:60879 [I] server.access_log:417 - 127.0.0.1:49386 - CONNECT httpbin.org:443 - 6022 bytes - 1293.04ms
2021-12-29 17:48:23,612 - pid:60879 [D] pool.release:124 - Retaining connection#4388713184 to upstream httpbin.org:443
2021-12-29 17:48:23,612 - pid:60879 [D] handler.shutdown:125 - Closing client connection 127.0.0.1:49386 has buffer False
2021-12-29 17:48:23,612 - pid:60879 [D] handler.shutdown:135 - Client connection shutdown successful
2021-12-29 17:48:23,613 - pid:60879 [D] handler.shutdown:149 - Client connection closed

Server Side logs for the 2nd request

2021-12-29 17:48:30,718 - pid:60879 [D] acceptor.accept:119 - Accepting new work#30
2021-12-29 17:48:30,733 - pid:60879 [D] tcp_server.__init__:53 - Work#30 accepted from 127.0.0.1:49394
2021-12-29 17:48:30,734 - pid:60879 [D] handler.initialize:106 - Handling connection 127.0.0.1:49394
2021-12-29 17:48:30,734 - pid:60879 [D] threadless._update_work_events:210 - fd#30 registered for mask#1 by work#30
2021-12-29 17:48:30,734 - pid:60879 [D] handler.handle_readables:279 - Client is read ready, receiving...
2021-12-29 17:48:30,734 - pid:60879 [D] connection.recv:64 - received 36 bytes from client
2021-12-29 17:48:30,735 - pid:60879 [D] server.connect_upstream:598 - Connecting to upstream httpbin.org:443
2021-12-29 17:48:30,735 - pid:60879 [D] pool.acquire:95 - Reusing connection#4388713184 for upstream httpbin.org:443
2021-12-29 17:48:30,735 - pid:60879 [I] server.connect_upstream:628 - Reusing connection to upstream httpbin.org:443
2021-12-29 17:48:30,735 - pid:60879 [D] threadless._update_work_events:180 - fd#30 modified for mask#3 by work#30
2021-12-29 17:48:30,735 - pid:60879 [D] threadless._update_work_events:194 - fd#34 borrowed with mask#1 by work#30
2021-12-29 17:48:30,736 - pid:60879 [D] handler.handle_writables:249 - Client is write ready, flushing...
2021-12-29 17:48:30,736 - pid:60879 [D] tcp_server.handle_writables:99 - Flushing buffer to client 127.0.0.1:49394
2021-12-29 17:48:30,736 - pid:60879 [D] connection.flush:96 - flushed 39 bytes to client
2021-12-29 17:48:30,736 - pid:60879 [D] threadless._update_work_events:180 - fd#30 modified for mask#1 by work#30
2021-12-29 17:48:30,745 - pid:60879 [D] handler.handle_readables:279 - Client is read ready, receiving...
2021-12-29 17:48:30,745 - pid:60879 [D] connection.recv:64 - received 517 bytes from client
2021-12-29 17:48:30,745 - pid:60879 [D] threadless._update_work_events:180 - fd#34 modified for mask#3 by work#30
2021-12-29 17:48:30,745 - pid:60879 [D] server.write_to_descriptors:220 - Server is write ready, flushing...
2021-12-29 17:48:30,745 - pid:60879 [D] connection.flush:96 - flushed 517 bytes to server
2021-12-29 17:48:30,746 - pid:60879 [D] threadless._update_work_events:180 - fd#34 modified for mask#1 by work#30
2021-12-29 17:48:31,083 - pid:60879 [D] server.read_from_descriptors:257 - Server is read ready, receiving...
2021-12-29 17:48:31,083 - pid:60879 [D] connection.recv:64 - received 31 bytes from server
2021-12-29 17:48:31,084 - pid:60879 [D] threadless._update_work_events:180 - fd#30 modified for mask#3 by work#30
2021-12-29 17:48:31,084 - pid:60879 [D] handler.handle_writables:249 - Client is write ready, flushing...
2021-12-29 17:48:31,084 - pid:60879 [D] tcp_server.handle_writables:99 - Flushing buffer to client 127.0.0.1:49394
2021-12-29 17:48:31,085 - pid:60879 [D] connection.flush:96 - flushed 31 bytes to client
2021-12-29 17:48:31,085 - pid:60879 [D] server.read_from_descriptors:257 - Server is read ready, receiving...
2021-12-29 17:48:31,085 - pid:60879 [D] server.read_from_descriptors:292 - Server closed connection, tearing down...
2021-12-29 17:48:31,085 - pid:60879 [D] pool.release:117 - Removing connection#4388713184 from pool from upstream httpbin.org:443
2021-12-29 17:48:31,085 - pid:60879 [D] pool._remove:165 - Removing conn#4388713184 from pool
2021-12-29 17:48:31,085 - pid:60879 [D] threadless._cleanup:317 - fd#30 unregistered by work#30
2021-12-29 17:48:31,086 - pid:60879 [D] threadless._cleanup:317 - fd#34 unregistered by work#30
2021-12-29 17:48:31,086 - pid:60879 [I] server.access_log:417 - 127.0.0.1:49394 - CONNECT None:None - 31 bytes - 352.14ms
2021-12-29 17:48:31,086 - pid:60879 [D] handler.shutdown:125 - Closing client connection 127.0.0.1:49394 has buffer False
2021-12-29 17:48:31,086 - pid:60879 [D] handler.shutdown:135 - Client connection shutdown successful
2021-12-29 17:48:31,086 - pid:60879 [D] handler.shutdown:149 - Client connection closed
abhinavsingh commented 2 years ago

Solution to the above problem is that, when requests module reaches handshake step:

self.do_handshake()

^^^ See requests exception log above to find this line

proxy.py must dispatch previously caches TLS session ticket. This is necessary because, reused HTTPS connection from the pool has already been SSL wrapped. But client will try to TLS wrap again (a.k.a TLS handshake), leading into protocol level exceptions.

abhinavsingh commented 2 years ago

So big question is, can proxy.py obtain the TLS ticket or handshake information without TLS interception?

@JerryKwan Can you try to parse initial few packets from client to server (and vice-versa) using your TlsProtocolParser class. Imagine, how Wireshark is able to help us visualize all the TLS protocol information without TLS interception. Similar, we might be able to parse the protocol bits just enough to extract out the SSL ticket (probably). Can you verify? Thank you!!!

abhinavsingh commented 2 years ago

To summarize, what we need is something like this:

Make a request to google.com, cache SSL session

openssl s_client -connect google.com:443 -reconnect -no_ticket -sess_out /tmp/google.session

Use SSL cache to make another request to google.com

openssl s_client -connect google.com:443 -reconnect -no_ticket -sess_in /tmp/google.session

View SSL Session parameters

cat /tmp/google.session

We need to achieve the same within proxy.py :)

noorul commented 1 year ago

Any plans to implement this for https?

JerryKwan commented 1 year ago

@abhinavsingh Any plans for https?

abhinavsingh commented 1 year ago

@noorul @JerryKwan Thank you for bring my attention to this. However, I am too occupied with my current assignment and hardly finding any bandwidth for proxy.py these days. I might not be able to return back to it for next month or so atleast.

To summarize, what we need is something like this:

Make a request to google.com, cache SSL session

openssl s_client -connect google.com:443 -reconnect -no_ticket -sess_out /tmp/google.session

Use SSL cache to make another request to google.com

openssl s_client -connect google.com:443 -reconnect -no_ticket -sess_in /tmp/google.session

View SSL Session parameters

cat /tmp/google.session

We need to achieve the same within proxy.py :)

Thanks to this summary, we have a clear path forward on how to go about it, but will require some efforts to make it production ready. If someone wants to take a stab at it, I will surely try to find some time for review/merge efforts.