Closed hansjorg closed 6 years ago
Thanks for reporting this.
It appears to be an issue coming in via urllib3
and what it's doing in the face of a server that's not actually doing HTTP. It's ending up passing back a status code of 200, misleading gabbi, which is none the wiser and trusts urllib3.
I tried the following against your nc
example:
>>> import urllib3
>>> http = urllib3.PoolManager()
>>> r = http.request('GET', 'http://localhost:8000/')
>>> r.status
200
If we turn on urllib3 debuggin for that, it seems to be an issue in httplib (which urllib3) is calling:
DEBUG:urllib3.connectionpool:Starting new HTTP connection (5): localhost
DEBUG:urllib3.connectionpool:http://localhost:8000 "GET / HTTP/1.1" 200 None
That's in the _make_request
method of HTTPConnectionPool
in urllib3/connectionpool.py
And that makes it look like it is a problem within httplib's HTTPConnection
class.
It's very late for me, so I need to not dig into this further tonight (although I'm very curious). If you have an opportunity to chase things down deeper into httplib, that woud be helpful.
I'm not sure how is best to proceed here. Gabbi makes the assumption that protocol layer is working and isn't really designed for testing that. But on the other hand, this situation has come up and is clearly very misleading. But on yet another hand, if the issue is deep in httplib, I'm not sure how to fix things, as errors in stdlib stay around for a long time.
How did you initially discover this? Presumably you weren't doing exactly as you describe above. Instead you ran into the problem and then came up with a minimal test case. Knowing the context may help suss out what to do.
In any case, thanks very much for the detailed report with an easy replication scenario.
/cc @FND simply because he'd likely find this both interesting and infuriating.
Indeed, very much so - rather incredible even. If I find the time I'll dig into this myself.
I wanted to test a proxy, so this is close to what I was actually doing.
Looking at the urllib3
code, this seems to be caused by support for HTTP 0.9 "simple responses" in httplib
. There is a strict
parameter to httplib.HTTPConnection
which when set will cause a BadStatusLine
to be raised for these cases.
The strict
parameter and support for "simple responses" was removed in version 3.4 of httplib
, so retesting this with a Python 3 virtualenv, I do get a BadStatusLine
exception for status lines like these.
Yeah strict
is the right clue. I should have remembered it from working on wsgi-intercept
.
We can set strict when creating the PoolManager in gabbi.httpclient. It accepts the same params as ConnectionPool:
https://urllib3.readthedocs.io/en/1.5/pools.html#api
diff --git a/gabbi/httpclient.py b/gabbi/httpclient.py
index 6310ef6..92fc342 100644
--- a/gabbi/httpclient.py
+++ b/gabbi/httpclient.py
@@ -182,5 +182,5 @@ def get_http(verbose=False, caption=''):
if verbose == 'headers':
body = False
return VerboseHttp(body=body, headers=headers, colorize=colorize,
- stream=stream, caption=caption)
- return Http()
+ stream=stream, caption=caption, strict=True)
+ return Http(strict=True)
Appears to get the desired results, and according to the urllib3 docs the strict param is ignored in environments where it isn't needed.
I'm on my way out the door, so if somebody else wants to make that a pull request, go for it, otherwise I'll do it tonight and make a new release.
Using a simple test like:
With server:
echo "HTTP/1.0 404 Not Found" | nc -l 8000
You'll get the expected error:
'404' not in ['200']
However, an invalid HTTP response like this will succeed:
echo "X/1.0 404 Not Found" | nc -l 8000
It seems like any response (even empty) other than an actual valid error status line will succeed when testing for status 200.
When testing for other status codes invalid responses aren't accepted, so it seems 200 is a special case.