cdent / gabbi

Declarative HTTP Testing for Python and anything else
http://gabbi.readthedocs.org/
Other
148 stars 34 forks source link

Test for status code 200 succeeds for invalid HTTP responses #239

Closed hansjorg closed 6 years ago

hansjorg commented 6 years ago

Using a simple test like:

tests:
   - name: retrieve root
     GET: /
     status: 200

With server:

echo "HTTP/1.0 404 Not Found" | nc -l 8000

You'll get the expected error:

'404' not in ['200']

However, an invalid HTTP response like this will succeed:

echo "X/1.0 404 Not Found" | nc -l 8000

It seems like any response (even empty) other than an actual valid error status line will succeed when testing for status 200.

When testing for other status codes invalid responses aren't accepted, so it seems 200 is a special case.

cdent commented 6 years ago

Thanks for reporting this.

It appears to be an issue coming in via urllib3 and what it's doing in the face of a server that's not actually doing HTTP. It's ending up passing back a status code of 200, misleading gabbi, which is none the wiser and trusts urllib3.

I tried the following against your nc example:

>>> import urllib3
>>> http = urllib3.PoolManager()
>>> r = http.request('GET', 'http://localhost:8000/')
>>> r.status
200

If we turn on urllib3 debuggin for that, it seems to be an issue in httplib (which urllib3) is calling:

DEBUG:urllib3.connectionpool:Starting new HTTP connection (5): localhost
DEBUG:urllib3.connectionpool:http://localhost:8000 "GET / HTTP/1.1" 200 None

That's in the _make_request method of HTTPConnectionPool in urllib3/connectionpool.py

And that makes it look like it is a problem within httplib's HTTPConnection class.

It's very late for me, so I need to not dig into this further tonight (although I'm very curious). If you have an opportunity to chase things down deeper into httplib, that woud be helpful.

I'm not sure how is best to proceed here. Gabbi makes the assumption that protocol layer is working and isn't really designed for testing that. But on the other hand, this situation has come up and is clearly very misleading. But on yet another hand, if the issue is deep in httplib, I'm not sure how to fix things, as errors in stdlib stay around for a long time.

How did you initially discover this? Presumably you weren't doing exactly as you describe above. Instead you ran into the problem and then came up with a minimal test case. Knowing the context may help suss out what to do.

In any case, thanks very much for the detailed report with an easy replication scenario.

/cc @FND simply because he'd likely find this both interesting and infuriating.

FND commented 6 years ago

Indeed, very much so - rather incredible even. If I find the time I'll dig into this myself.

hansjorg commented 6 years ago

I wanted to test a proxy, so this is close to what I was actually doing.

Looking at the urllib3 code, this seems to be caused by support for HTTP 0.9 "simple responses" in httplib. There is a strict parameter to httplib.HTTPConnection which when set will cause a BadStatusLine to be raised for these cases.

The strict parameter and support for "simple responses" was removed in version 3.4 of httplib, so retesting this with a Python 3 virtualenv, I do get a BadStatusLine exception for status lines like these.

cdent commented 6 years ago

Yeah strict is the right clue. I should have remembered it from working on wsgi-intercept.

We can set strict when creating the PoolManager in gabbi.httpclient. It accepts the same params as ConnectionPool:

https://urllib3.readthedocs.io/en/1.5/pools.html#api

diff --git a/gabbi/httpclient.py b/gabbi/httpclient.py
index 6310ef6..92fc342 100644
--- a/gabbi/httpclient.py
+++ b/gabbi/httpclient.py
@@ -182,5 +182,5 @@ def get_http(verbose=False, caption=''):
         if verbose == 'headers':
             body = False
         return VerboseHttp(body=body, headers=headers, colorize=colorize,
-                           stream=stream, caption=caption)
-    return Http()
+                           stream=stream, caption=caption, strict=True)
+    return Http(strict=True)

Appears to get the desired results, and according to the urllib3 docs the strict param is ignored in environments where it isn't needed.

I'm on my way out the door, so if somebody else wants to make that a pull request, go for it, otherwise I'll do it tonight and make a new release.