gabrielfalcao / HTTPretty

Intercept HTTP requests at the Python socket level. Fakes the whole socket module
https://httpretty.readthedocs.org
MIT License

Semi-colon separators are tossed aside within query strings #134

Open rgbkrk opened 10 years ago

rgbkrk commented 10 years ago

Under the hood, httpretty is using urlparse.parse_qs to parse query strings.

This poses a problem when a semi-colon is used as a delimiter:

In [1]: import urlparse

In [2]: urlparse.parse_qs("tagged=python;ruby&site=stackoverflow")
Out[2]: {'site': ['stackoverflow'], 'tagged': ['python']}

This happens because urlparse hardcodes the semi-colon as a delimiter alongside '&' (a W3C recommendation, but not something web servers consistently follow).

If the semi-colon is percent-escaped, however, this works just fine.

In [11]: urlparse.parse_qs("tagged=python%3Bruby&site=stackoverflow")
Out[11]: {'site': ['stackoverflow'], 'tagged': ['python;ruby']}

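In practice, though, httpretty runs the query string through unquote_utf8 first, which turns %3B back into ';' before parse_qs sees it, so the escaped form never gets through.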
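To make the difference concrete, here is a minimal sketch of a parser that splits only on '&' and so keeps the semi-colon inside the value. It is written against Python 3's urllib.parse (the session above uses Python 2's urlparse), and parse_qs_ampersand_only is a hypothetical helper name, not something urlparse or httpretty provides.

```python
from urllib.parse import unquote_plus


def parse_qs_ampersand_only(query):
    """Parse a query string, treating only '&' as a pair separator."""
    result = {}
    for pair in query.split("&"):
        if not pair:
            continue  # skip empty segments such as a trailing '&'
        key, _, value = pair.partition("=")
        # Decode percent-escapes and '+' after splitting, so an escaped
        # '&' or ';' inside a value cannot act as a separator.
        result.setdefault(unquote_plus(key), []).append(unquote_plus(value))
    return result


parsed = parse_qs_ampersand_only("tagged=python;ruby&site=stackoverflow")
print(parsed["tagged"])
```

With this approach the raw and the escaped form of the query string both give 'python;ruby' as the single value of tagged, which is what the StackExchange API expects.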

This all stems from trying to mock and test a module that is using the StackExchange API, which uses semi-colons as separators.

Test case using httpretty:

import requests
import httpretty

httpretty.enable()

httpretty.register_uri(httpretty.GET, "https://api.stackexchange.com/2.1/search", body='{"items":[]}')

resp = requests.get("https://api.stackexchange.com/2.1/search",
                    params={"tagged":"python;ruby",
                            "site": "stackoverflow"})

httpretty_request = httpretty.last_request()
print(httpretty_request.querystring)

httpretty.disable()
httpretty.reset()

Relevant issue on Python's own issue tracker.

rgbkrk commented 10 years ago

As I answered in this Stack Overflow question, I temporarily monkey-patched httpretty.core.unquote_utf8 (technically httpretty.compat.unquote_utf8).

# To get around how parse_qs works (urlparse, under the hood of
# httpretty), we'll leave the semi-colon quoted.
#
# See https://github.com/gabrielfalcao/HTTPretty/issues/134
orig_unquote = httpretty.core.unquote_utf8
httpretty.core.unquote_utf8 = (lambda x: x)

# It should handle tags as a list
httpretty.register_uri(httpretty.GET,
                       "https://api.stackexchange.com/2.1/search",
                       body=param_check_callback({'tagged': 'python;dog'}))
search_questions(since=since, tags=["python", "dog"], site="pets")

...

# Back to normal for the rest
httpretty.core.unquote_utf8 = orig_unquote
# Test the test by making sure this is back to normal
assert httpretty.core.unquote_utf8("%3B") == ";"

This assumes nothing else in the query string needs to be unquoted. Another option is to leave only the semi-colons percent-encoded before the query string reaches parse_qs.
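That second option could look roughly like the following. It is only a sketch: unquote_keep_semicolons is a hypothetical name, the code uses Python 3's urllib.parse rather than the Python 2 urlparse module shown earlier, and it assumes the replacement receives the raw query string the same way unquote_utf8 does.

```python
import re
from urllib.parse import parse_qs, unquote

# Matches a percent-encoded semi-colon in either case (%3B or %3b).
_ENCODED_SEMICOLON = re.compile(r"%3[Bb]")


def unquote_keep_semicolons(text):
    """Percent-decode text but leave encoded semi-colons as '%3B'."""
    # Split on the encoded semi-colons, decode everything in between,
    # then put the encoded form back so parse_qs never sees a raw ';'.
    pieces = _ENCODED_SEMICOLON.split(text)
    return "%3B".join(unquote(piece) for piece in pieces)


raw = "tagged=python%3Bdog&site=pets&q=a%20b"
partly_decoded = unquote_keep_semicolons(raw)
print(partly_decoded)
print(parse_qs(partly_decoded)["tagged"])
```

Following the monkey patch above, a function like this would be assigned to httpretty.core.unquote_utf8 in place of the identity lambda, so other escapes such as %20 are still decoded while the semi-colon stays intact until parse_qs handles it.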