Client.request bypassing urllib2.Request header normalization

aronatkins commented 11 years ago

I found this because we happen to have a server which barfs on Content-Type but accepts Content-type. Go figure.

When a header dictionary is passed to urllib2.Request, each key/value of that dictionary is not blindly added to the request headers dictionary. Instead, request.add_header is called on each pair and that performs key.capitalize().

By simply calling req.headers.update after creating the request, sanction skips this header normalization.

Both transport_headers and transport_query suffer from this.

It probably makes sense for Client.request to mirror urllib2.Request as much as possible since it's trying to just be a thin wrapper. This is one inconsistency.

I am working around this issue by passing Content-type instead of Content-Type to Client.request.

I imagine that the default header normalization might cause problems as well (but, honestly, resource servers should be processing headers in a case insensitive way and it sucks that the one we have isn't). I can understand if you deem this not worth fixing. At the very least, I wanted to document the problem in case someone else runs into it.

aronatkins commented 10 years ago

A clarification.

The resource server does handle content-type in a case insensitive way. The problem is that urllib2 adds the header Content-type with a value of application/x-www-form-urlencoded if not present in the set of headers. In other words, there is code in urllib2 which expects headers to already be normalized a does lookup using those normalized header names.

See AbstractHTTPHandler. dorequest (do_open in older versions) in http://svn.python.org/projects/python/branches/release27-maint/Lib/urllib2.py

In other words, I was wrong about how my resource server was behaving. There are still unexpected effects from not using normalized request headers.

demianbrecht commented 10 years ago

Yeah, unfortunately there's definitely some magic that happens in various places in urllib. This behaviour is also consistent in 3.4. I guess I can kind of see the reasoning behind providing a sane default, but I don't like the assumption that if request.data is not None, it assumes POST and automagically sets Content-type to application/x-www-form-urlencoded. Whatever happened to "explicit is better than implicit"? ;)

Unfortunately, I don't think there's anything on this end that I'm willing to do in order to prevent this from happening. The only thing that I can think of changing is to provide alternative defaults before calling into urlopen. If you feel strongly enough about it, you could always open a bug at bugs.python.org and suggest removing that assumption. I figure that this will likely not be accepted though due to horribly breaking backwards compatibility (I'd venture to guess that there is a lot of code that relies on this).

I'm going to close this as I don't think there's anything that can be done from this end. However, it's definitely good to have logged should anyone else encounter this problem, so thanks for reporting it!

aronatkins commented 10 years ago

Since the request object is manipulating the user-supplied headers in its constructor, maybe we should consider Request.headers more like private data of the Request object.

What about calling Request.add_header for each key/value in the user-supplied headers? transport_headers would become something like:

def transport_headers(url, access_token, data=None, method=None, headers={}):
    try:
        req = Request(url, data=data, method=method)
    except TypeError:
        req = Request(url, data=data)
        req.get_method = lambda: method

    if headers is not None:
        for (name,value) in headers.items() :
            req.add_header(name,value)
    req.add_header('Authorization', 'Bearer {}'.format(access_token))
    return req

Alternatively, you could pass the set of headers into the request:

def transport_headers(url, access_token, data=None, method=None, headers={}):
    all_headers = {}
    all_headers.update(headers)
    all_headers['Authorization'] = 'Bearer {}'.format(access_token)

    try:
        req = Request(url, data=data, method=method, headers=all_headers)
    except TypeError:
        req = Request(url, data=data, headers=all_headers)
        req.get_method = lambda: method
    return req

The final option is to not accept headers at all and ask that callers manipulate the Request object themselves (by wrapping transport_* or something).

I wonder if one of these options is the "least surprising" to users of sanction.

demianbrecht commented 10 years ago

Here are my thoughts on the two issues raised here:

Using add_header or Request(headers={})

IMHO, the behaviour in Request.add_header itself is surprising. The Request.add_header method doesn't conform to expectations I would have after perusing RFC 2616, which would be that the header would actually be "Content-Type", not "Content-type". To compound the issue, there's nothing AFAIK outside of this draft RFC that actually defines the case sensitivity of header field names (here, they're defined as being case insensitive). If I remember correctly, I explicitly went around using add_header for this reason: Request.add_header actually does something unexpected that isn't actually defined anywhere in calling .capitalize(). IMO, this defies the "consenting adults" philosophy (again, IMO, what you put into the headers dict should be what gets sent over the wire) and really should be fixed in the stdlib. Having said that, it's a long-outstanding piece of code and not likely something that will be changed due to backwards compatibility.

Default Content-Type header (application/x-www-form-urlencoded)

This is the behaviour of the underlying Request object when it detects a non-GET request. It's at least a little surprising, but somewhat understandable. Nothing to be done here. (I don't think you were looking for anything to be done here anyway, right?)

aronatkins commented 10 years ago

I agree that headers should be insensitive. I agree that what is put into the header dict should be what is sent over the wire. I agree that add_header calling capitalize is unexpected. But it is what we have to work with. Even if the stdlib changes, we would still have bunches of old Python implementations that will be around for a long time, as you have said.

I am using the implementation of transport_headers as shown above as a workaround to this problem. I would prefer that sanction take this same approach, but respect your decision to try and bypass the normalization of add_header.

demianbrecht / sanction

Client.request bypassing urllib2.Request header normalization #10