Closed jmooreoliva closed 10 years ago
One reason I set my sights on gruvi is that I had a lot of trouble finding an IOCP library that was compatible with SSL. Python3's asyncio supports IOCP, but not with SSL.
https://mail.python.org/pipermail/python-ideas/2012-November/017686.html
Reading through the bitbucket patch linked in the above link, It appears you are taking a similar approach by wrapping the socket object and then writing/reading etc. Based on my research before finding gruvi, it appears that IOCP is incompatible with SSLContext wrap_socket. (though I do not know why, I speculate it has something to do with different eventing semantics).
Were you aware that wrap_socket doesn't work with IOCP? Just wondering if you took any steps to do things differently, or if you were unaware at the incompatibility. (Or, even better, I am wrong about this and they aren't incompatible here with libuv for some reason).
Another link describing more what needs to be done to use BIO with OpenSSL if you had any inclination: http://grokbase.com/t/python/python-dev/1065rtfpfg/ssl
Hi!
yes, very much aware about IOCP and wrap_socket(). The way Gruvi uses the _ssl module is that it gets data from the network through libuv (which uses IOCP on Windows), and then passes on that data via a private socketpair to the SSL protocol instance. That socketpair isn't part of the event loop, so no select() on it is needed, just non-blocking read-writes.
The socket pair solution was always planned to be a temporary solution. The "real" solution is to create a memory BIO. I had planned to do that at some point. Maybe now is a good time to do it. (Note that the socketpair solution is pretty efficient, at least on Linux where there is a real socketpair() function. I get 200+ MB/sec of throughput. I'm not sure that a memory BIO would improve that as I think it is CPU bound on the crypto. )
Regarding this specific issue..
My suspicion is that somehow the socketpair on Windows (which isn't even a real socket pair, it's just a connected TCP socket) behaves in some weird way. Maybe the data I send into it is not made available immediately.
Regards, Geert
My testing methodology is below: I added two instance variables to class SslPipe from ssl.py
self._issue8_location = None
self._issue8_offset = None
and I modified the loop as such:
def feed_ssldata(self, data):
"""Feed SSL record level data into the pipe.
The data must be a bytes instance. It is OK to send an empty bytes
instance. This can be used to get ssldata for a handshake initiated by
this endpoint.
Return a (ssldata, appdata) tuple. The ssldata element is a list of
buffers containing SSL data that needs to be sent to the remote SSL.
The appdata element is a list of buffers containing plaintext data that
needs to be forwarded to the application. The appdata list may contain
an empty buffer indicating an SSL "close_notify" alert. This alert must
be acknowledged by calling :meth:`shutdown`.
"""
if self._sockets is None:
raise RuntimeError('pipe was closed')
if self._state == self.S_UNWRAPPED:
# If unwrapped, pass plaintext data straight through.
return ([], [data] if data else [])
view = compat.memoryview(data)
offset = 0
ssldata = []; appdata = []
while True:
self._need_ssldata = False
self._issue8_location = 'write_to_socket'
self._issue8_offset = offset
offset += write_to_socket(self._sockets[0], view[offset:])
self._issue8_location = None
self._issue8_offset = offset
try:
if self._state == self.S_DO_HANDSHAKE:
# Call do_handshake() until it doesn't raise anymore.
self._issue8_location = 'do_handshake'
self._issue8_offset = offset
self._sslobj.do_handshake()
self._issue8_location = None
self._issue8_offset = offset
self._state = self.S_WRAPPED
if self._on_handshake_complete:
self._on_handshake_complete()
if self._state == self.S_WRAPPED:
# Main state: read data from SSL until close_notify
while True:
chunk = self._sslobj.read(self.bufsize)
appdata.append(chunk)
if not chunk: # close_notify
break
if self._state == self.S_SHUTDOWN:
# Call shutdown() until it doesn't raise anymore.
self._sslobj.shutdown()
self._sslobj = None
self._state = self.S_UNWRAPPED
if self._on_handshake_complete:
self._on_handshake_complete()
if self._state == self.S_UNWRAPPED:
# Drain possible plaintext data after close_notify.
chunks = read_from_socket(self._sockets[1], self.bufsize)
appdata.extend(chunks)
except ssl.SSLError as e:
if e.errno not in (ssl.SSL_ERROR_WANT_READ, ssl.SSL_ERROR_WANT_WRITE):
raise
traceback.print_exc()
self._need_ssldata = e.errno == ssl.SSL_ERROR_WANT_READ
print(self._need_ssldata)
# Check for record level data that needs to be sent back.
# Happens for the initial handshake and renegotiations.
chunks = read_from_socket(self._sockets[0], self.bufsize)
ssldata.extend(chunks)
# We are done if we wrote all data.
if offset == len(view):
break
return (ssldata, appdata)
in endpoints.py create_connection around line 211 I modified it as follows:
if ssl:
context = ssl if hasattr(ssl, '_wrap_socket') else create_ssl_context()
transport = SslTransport(handle, context, False, **ssl_args)
else:
transport = Transport(handle, mode)
event = transport.start(protocol)
if event is not None:
try:
event.wait(timeout=5.0)
except Timeout:
print('==OUT OF WAIT WITH LOCATION %s offset %d' % (transport._sslpipe._issue8_location, transport._sslpipe._issue8_offset))
return (transport, protocol)
It's not clean, but it does the job without interfering with the timing of the loop in question.
The output of running is below. Note that I get the data correctly after the timeout, though I assume this is just because I have somehow stopped a block.
C:\py>C:\python34\python gruvicli.py
Traceback (most recent call last):
File "C:\python34\lib\site-packages\gruvi-0.9.3.dev-py3.4-win32.egg\gruvi\ssl
py", line 291, in feed_ssldata
self._sslobj.do_handshake()
ssl.SSLWantReadError: The operation did not complete (read) (_ssl.c:598)
True
Traceback (most recent call last):
File "C:\python34\lib\site-packages\gruvi-0.9.3.dev-py3.4-win32.egg\gruvi\ssl
py", line 291, in feed_ssldata
self._sslobj.do_handshake()
ssl.SSLWantReadError: The operation did not complete (read) (_ssl.c:598)
True
==OUT OF WAIT WITH LOCATION do_handshake offset 583
Traceback (most recent call last):
File "C:\python34\lib\site-packages\gruvi-0.9.3.dev-py3.4-win32.egg\gruvi\ssl
py", line 291, in feed_ssldata
self._sslobj.do_handshake()
ssl.SSLWantReadError: The operation did not complete (read) (_ssl.c:598)
True
Traceback (most recent call last):
File "C:\python34\lib\site-packages\gruvi-0.9.3.dev-py3.4-win32.egg\gruvi\ssl
py", line 300, in feed_ssldata
chunk = self._sslobj.read(self.bufsize)
ssl.SSLWantReadError: The operation did not complete (read) (_ssl.c:1767)
True
Traceback (most recent call last):
File "C:\python34\lib\site-packages\gruvi-0.9.3.dev-py3.4-win32.egg\gruvi\ssl
py", line 300, in feed_ssldata
chunk = self._sslobj.read(self.bufsize)
ssl.SSLWantReadError: The operation did not complete (read) (_ssl.c:1767)
True
b'Hello, world OMG!'
Traceback (most recent call last):
File "C:\python34\lib\site-packages\gruvi-0.9.3.dev-py3.4-win32.egg\gruvi\ssl
py", line 306, in feed_ssldata
self._sslobj.shutdown()
ssl.SSLWantReadError: The operation did not complete (read) (_ssl.c:1873)
True
I bet you this is what is happening: write_to_socket() writes the entire SSL handshake to the socketpair. However, for some reason, the data written is not immediately available to the SSL protocol. Therefore, do_handshake() raises an exception. But because offset == len(view) in line 313, the loop exits.
Can you confirm this? You can also log len(view), and make sure it is the same as offset.
The above explains why the timeout works: the code is not blocked, but instead running regularly in the event loop, and so the timeout can be serviced. The next time data is received (this time application data), the handshake completes, and the application data is handled.
Your hypothesis appears to be correct.
I am only noting the following for completeness: I am by no means trying to imply timing out and pumping data is a solution! On one attempt (the one that generated the data below) the code hung even after the timeout. It probably just got into a similar situation again with the pseudo socket pair. On a subsequent attempt, the code worked after the timeout.
Here is the output.
C:\py>C:\python34\python gruvicli.py
Traceback (most recent call last):
File "C:\python34\lib\site-packages\gruvi-0.9.3.dev-py3.4-win32.egg\gruvi\ssl.
py", line 296, in feed_ssldata
self._sslobj.do_handshake()
ssl.SSLWantReadError: The operation did not complete (read) (_ssl.c:598)
True
Traceback (most recent call last):
File "C:\python34\lib\site-packages\gruvi-0.9.3.dev-py3.4-win32.egg\gruvi\ssl.
py", line 296, in feed_ssldata
self._sslobj.do_handshake()
ssl.SSLWantReadError: The operation did not complete (read) (_ssl.c:598)
True
==OUT OF WAIT WITH location do_handshake == offset 583 == view len 583 == view_break True
Traceback (most recent call last):
File "C:\python34\lib\site-packages\gruvi-0.9.3.dev-py3.4-win32.egg\gruvi\ssl.
py", line 296, in feed_ssldata
self._sslobj.do_handshake()
ssl.SSLWantReadError: The operation did not complete (read) (_ssl.c:598)
True
The modified code is below:
ssl.py, class SslPipe
def feed_ssldata(self, data):
"""Feed SSL record level data into the pipe.
The data must be a bytes instance. It is OK to send an empty bytes
instance. This can be used to get ssldata for a handshake initiated by
this endpoint.
Return a (ssldata, appdata) tuple. The ssldata element is a list of
buffers containing SSL data that needs to be sent to the remote SSL.
The appdata element is a list of buffers containing plaintext data that
needs to be forwarded to the application. The appdata list may contain
an empty buffer indicating an SSL "close_notify" alert. This alert must
be acknowledged by calling :meth:`shutdown`.
"""
if self._sockets is None:
raise RuntimeError('pipe was closed')
if self._state == self.S_UNWRAPPED:
# If unwrapped, pass plaintext data straight through.
return ([], [data] if data else [])
view = compat.memoryview(data)
offset = 0
ssldata = []; appdata = []
while True:
self._need_ssldata = False
self._issue8_location = 'write_to_socket'
self._issue8_offset = offset
self._issue8_view_len = len(view)
offset += write_to_socket(self._sockets[0], view[offset:])
self._issue8_location = None
self._issue8_offset = offset
self._issue8_view_len = len(view)
try:
if self._state == self.S_DO_HANDSHAKE:
# Call do_handshake() until it doesn't raise anymore.
self._issue8_location = 'do_handshake'
self._issue8_offset = offset
self._issue8_view_len = len(view)
self._sslobj.do_handshake()
self._issue8_location = None
self._issue8_offset = offset
self._issue8_view_len = len(view)
self._state = self.S_WRAPPED
if self._on_handshake_complete:
self._on_handshake_complete()
if self._state == self.S_WRAPPED:
# Main state: read data from SSL until close_notify
while True:
chunk = self._sslobj.read(self.bufsize)
appdata.append(chunk)
if not chunk: # close_notify
break
if self._state == self.S_SHUTDOWN:
# Call shutdown() until it doesn't raise anymore.
self._sslobj.shutdown()
self._sslobj = None
self._state = self.S_UNWRAPPED
if self._on_handshake_complete:
self._on_handshake_complete()
if self._state == self.S_UNWRAPPED:
# Drain possible plaintext data after close_notify.
chunks = read_from_socket(self._sockets[1], self.bufsize)
appdata.extend(chunks)
except ssl.SSLError as e:
if e.errno not in (ssl.SSL_ERROR_WANT_READ, ssl.SSL_ERROR_WANT_WRITE):
raise
traceback.print_exc()
self._need_ssldata = e.errno == ssl.SSL_ERROR_WANT_READ
print(self._need_ssldata)
# Check for record level data that needs to be sent back.
# Happens for the initial handshake and renegotiations.
chunks = read_from_socket(self._sockets[0], self.bufsize)
ssldata.extend(chunks)
# We are done if we wrote all data.
if offset == len(view):
self._issue8_view_break = True
break
return (ssldata, appdata)
endpoints.py create_connection around line 211
if ssl:
context = ssl if hasattr(ssl, '_wrap_socket') else create_ssl_context()
transport = SslTransport(handle, context, False, **ssl_args)
else:
transport = Transport(handle, mode)
event = transport.start(protocol)
if event is not None:
try:
event.wait(timeout=5.0)
except Timeout:
print(
'==OUT OF WAIT WITH location %s == offset %d == view len %d == view_break %s' % (
transport._sslpipe._issue8_location,
transport._sslpipe._issue8_offset,
transport._sslpipe._issue8_view_len,
transport._sslpipe._issue8_view_break
)
)
return (transport, protocol)
I thought I would touch on this issue now that I have a better understanding of what is going on.
The way Gruvi uses the _ssl module is that it gets data from the network through libuv (which uses IOCP on Windows), and then passes on that data via a private socketpair to the SSL protocol instance. That socketpair isn't part of the event loop, so no select() on it is needed, just non-blocking read-writes
The socket pair solution was always planned to be a temporary solution. The "real" solution is to create a memory BIO. I had planned to do that at some point. Maybe now is a good time to do it. (Note that the socketpair solution is pretty efficient, at least on Linux where there is a real socketpair() function. I get 200+ MB/sec of throughput. I'm not sure that a memory BIO would improve that as I think it is CPU bound on the crypto. )
I just thought I'd mention I'd love it if I could target my code only for unix systems! Unfortunately, that is not the case for me at this point in time :(
One of the environments I need to support is an XP or windows embedded machine running on an amazingly slow disk (I believe even slower than what I am using for this test, but I cannot easily test the performance on those machines). So for me, I am IO, not CPU, bound. So if you were to create a memory BIO solution I am quite confident I would significantly benefit from it.
From my very superficial understanding of the code, you block the entire main loop while you are transmitting data locally within your socket_pair in some cases with the while True loop. On a machine with a very slow IO subsystem on windows, even if you kept trying (which would be the quick fix here, and one that should probably be made in the meantime if it is easy), that's going to block the entire main loop for a potentially non-negligible amount of time.
And of course, even if you yielded somehow during the socketpair transmission (which you can't due to IOCP and wrap_socket) the extra amount of time required to write the data twice is a significant downside if you are IO bound.
Given my understanding above, I would say a memory BIO solution is superior and desired :)
I just thought I'd mention if I could target my code only for unix systems I'd be real happy! Unfortunately, that is not the case for me at this point in time :(
Yes, I know what you are talking about... :(
From my very superficial understanding of the code, you block the entire main loop while you are transmitting data locally within your socket_pair.
That is not correct. See line 171 in ssl.py. It uses non-blocking IO. The IO will complete before returning to the event loop, but since it's all local there's no waiting or polling involved.
On a machine with a very slow IO subsystem on windows, even if you kept trying (which would be the quick fix here, and one that should probably be made in the meantime if it is easy), that's going to block the entire main loop for a potentially non-negligible amount of time.
Right. Retry would be a busy loop and we cannot do that. Also there is no way to know whether a. we've written all the data into the socketpair that SSL needs, but the socketpair hasn't made it available yet to SSL, or b. that some data hasn't been received yet. At the SslTransport level we work with binary data and have no idea of the SSL record level.
Did you check if the VM has some security software installed? It might be trying to scan the network connection.
Anyway, this all leads to the same conclusion: the current approach doesn't work well on Windows XP at least.
I'll look into doing a memory BIO. But it's a change to the Python stdlib _ssl module and it may take a while to land. I will leave this ticket open in the mean time.
@jmooreoliva would you mind testing out the patch below? It could be a workaround until the memory BIO is there.
The patch is "almost correct". In 2 out of 3 situations I know whether or not data should be available on the socketpair. In this case, calling select() when you know that data is coming is OK and should not have any negative impact. On OSs where socketpair() behaves nicely, the select() should return right away, and the overhead is only one extra system call per write().
In the third situation, namely when I write SSL data to the socket, I don't know if I will get SSL data back. This is because I don't know if the SSL data was encrypted application level data or handshake data. And even if I knew, the handshake data may or may not have response handshake data. The patch uses a small timeout (0.1 seconds) only during the initial handshake. This means that the initial handshake make take a little bit more time, and that renegotiations during normal operation (S_WRAPPED) do not work (these are rare/non-existent anyway).
If the patch doesn't work, try adjusting the 0.1 second timeout. This will however make SSL handshakes take longer.
The patch is mostly a POC to see if this solves your problem, while I'm working on a real solution.
diff --git a/gruvi/ssl.py b/gruvi/ssl.py
index 22037ef..7197df8 100644
--- a/gruvi/ssl.py
+++ b/gruvi/ssl.py
@@ -10,6 +10,7 @@ from __future__ import absolute_import, print_function
import errno
import socket
+import select
import io
import pyuv
import ssl
@@ -33,13 +34,13 @@ else:
__all__ = ['SslTransport', 'create_ssl_context', 'HAVE_SSL_BACKPORTS']
-def write_to_socket(sock, data):
+def write_to_socket(sockets, data):
"""Write as much of *data* to the socket as possible, retrying short writes
due to EINTR only."""
offset = 0
while offset != len(data):
try:
- nbytes = sock.send(data[offset:])
+ nbytes = sockets[0].send(data[offset:])
except (io.BlockingIOError, socket.error) as e:
if e.errno == errno.EINTR:
continue
@@ -47,12 +48,16 @@ def write_to_socket(sock, data):
break
raise
offset += nbytes
+ if offset:
+ select.select([sockets[1]], [], [], 1)
return offset
-def read_from_socket(sock, bufsize):
+def read_from_socket(sock, bufsize, wait=0):
"""Read as much data as possible from *sock*, using *bufsize* sized
chunks. The result is returned as a list of buffers."""
chunks = []
+ if wait:
+ select.select([sock], [], [], wait)
while True:
try:
chunk = sock.recv(bufsize)
@@ -237,7 +242,8 @@ class SslPipe(object):
ssldata = []; appdata = []
while True:
self._need_ssldata = False
- offset += write_to_socket(self._sockets[0], view[offset:])
+ offset += write_to_socket(self._sockets, view[offset:])
+ prev_state = self._state
try:
if self._state == self.S_DO_HANDSHAKE:
# Call do_handshake() until it doesn't raise anymore.
@@ -269,7 +275,8 @@ class SslPipe(object):
self._need_ssldata = e.errno == ssl.SSL_ERROR_WANT_READ
# Check for record level data that needs to be sent back.
# Happens for the initial handshake and renegotiations.
- chunks = read_from_socket(self._sockets[0], self.bufsize)
+ wait = 0 if prev_state == self.S_WRAPPED else 0.1
+ chunks = read_from_socket(self._sockets[0], self.bufsize, wait)
ssldata.extend(chunks)
# We are done if we wrote all data.
if offset == len(view):
@@ -297,10 +304,12 @@ class SslPipe(object):
ssldata = []
view = compat.memoryview(data)
while True:
+ nbytes = 0
self._need_ssldata = False
try:
if offset < len(view):
- offset += self._sslobj.write(view[offset:])
+ nbytes = self._sslobj.write(view[offset:])
+ offset += nbytes
except ssl.SSLError as e:
# It is not allowed to call write() after unwrap() until the
# close_notify is acknowledged. We return the condition to the
@@ -311,7 +320,7 @@ class SslPipe(object):
raise
self._need_ssldata = e.errno == ssl.SSL_ERROR_WANT_READ
# See if there's any record level data back for us.
- chunks = read_from_socket(self._sockets[0], self.bufsize)
+ chunks = read_from_socket(self._sockets[0], self.bufsize, 1 if nbytes else 0)
ssldata.extend(chunks)
if offset >= len(view) or self._need_ssldata:
break
My email to python-dev:
https://mail.python.org/pipermail/python-dev/2014-July/135319.html
Just letting you know that I should get around to trying that patch today or tomorrow.
I'm very interested to watch the ssl memory bio progress, thanks for the link.
(ignore previous comment that was deleted: had some issues setting up my test environment again).
As a note: Don't rush a non-BIO fix out for me, I have hacked together a small fibers overlay on top of twisted that serves my needs for now. I will continue to watch this library out of my personal interest however :)
I "manually" applied the patch above keeping my diagnostic code in there.
That appears to have worked: I can now contact www.google.com on port 443 and download the page with a GET /. Before this patch I could not.
Patch submitted to Python upstream:
Fixed in f354b11ab77677a3a59ef7f2958a1c931ce3a529.
Upstream contains the required functionality in the latest Python 3.5 alpha. The above changeset includes a backport for Python 2.7, 3.3 and 3.4 as well. To make it work on Windows you need to compile the _sslcompat module. We should add a wiki page with complication instructions on Windows, including what OpenSSL version to use for what Python version.
System: Windows XP SP3 Python version: python 3.4.1 32 bit Gruvi: 0.9.3 git cloned today around 6pm MST.
I have two VM's with Windows XP SP3. One VM has a very slow disk (~10x slower than a normal desktop machine), the other does not. The one with the slow disk subsystem exhibits the following undesired behaviour: Any SSL connection, whether a gruvi program acting as a https client connecting to a gruvi server, or server (being connected to by a urllib3 (non gruvi) client OR chrome or internet explorer 8) will get stuck during SSL handshake. Note that the example I use in the rest of this issue is with the client making an outbound https connection (code is at the end of this issue).
I traced the problem down to the file ssl.py. If I insert a print statement ~ line 281 (in my backtraces it is line 282 since I import traceback at the top of the file) the problem goes away.
I added a traceback.print_exc to the exception surrounding this code.
If I do NOT have print('SLOW DOWN'), I will get two rapid SSL_ERROR_WANT_READ printed and then the connection hangs.
The following is the output of my client (gruvicli.py) test program without print('SLOW DOWN'). It does NOT work. I have to Ctrl+C the program for any more input to be generated.
The following is the output of my client (gruvicli.py) test program with print('SLOW DOWN'). It works here. Notice the message: b'Hello, world OMG!' appearing, this is the data it receives from the webserver it contacts.
Below please find the source code for my test client (gruvicli.py) and server (gruviws.py).
gruvicli.py
gruviws.py