betamaxpy / betamax

A VCR imitation designed only for python-requests.
https://betamax.readthedocs.io/en/latest/

QueryMatcher comparison fails on high Unicode characters #43

Closed query closed 8 years ago

query commented 9 years ago

The following code will cause Betamax to raise a "request was made that could not be handled" error on Python 2.7:

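from betamax import Betamax
from requests import Session

url = 'http://www.example.com/'
params = {'test': u'\u2603'}  # non-ASCII query value (U+2603, snowman)

session = Session()
# Run the same request twice against the same cassette; the error is
# raised when the second pass is matched against the recorded request.
for _ in xrange(2):
    with Betamax(session, cassette_library_dir='.').use_cassette(
            'test', match_requests_on=['uri']):
        session.get(url, params=params)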

This seems to be because Betamax receives Unicode strings from JSON deserialization of the cassette and eventually passes them to parse_qs. On Python 2.7, parse_qs isn't Unicode-aware: given a Unicode string, it will happily return "broken" Unicode values of the form u'\xe2\x98\x83' (the UTF-8 bytes of u'\u2603', each turned into its own code point), so the later equality comparison fails.
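To make the mismatch concrete, here is a minimal sketch in Python 3. It is an emulation, not the original failure: Python 3's parse_qs decodes percent-escapes as UTF-8 by default, so passing encoding="latin-1" is an assumption used here to stand in for what Python 2.7 produces when it is handed a Unicode query string.

```python
from urllib.parse import parse_qs

query = "test=%E2%98%83"  # percent-encoded UTF-8 for U+2603 (snowman)

# Python 3 default: percent-escapes are decoded as UTF-8.
decoded = parse_qs(query)["test"][0]

# Assumption: latin-1 decoding emulates Python 2.7's handling of unicode
# input, where every escaped byte becomes a separate code point.
broken = parse_qs(query, encoding="latin-1")["test"][0]

print(len(decoded), len(broken))  # one character vs. three
print(decoded == broken)          # False

# The broken value still carries the original bytes, so it can be repaired.
repaired = broken.encode("latin-1").decode("utf-8")
print(repaired == decoded)        # True
```

The three-character value is the u'\xe2\x98\x83' form described above. It never compares equal to the intended single character, which is consistent with the failed match.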

This may be moot given #27, but it's worth noting.

sigmavirus24 commented 9 years ago

So if I run this, the cassette I get is:

{
    "http_interactions": [
        {
            "recorded_at": "2014-10-04T13:32:21",
            "request": {
                "body": {
                    "encoding": "utf-8",
                    "string": ""
                },
                "headers": {
                    "Accept": [
                        "*/*"
                    ],
                    "Accept-Encoding": [
                        "gzip, deflate"
                    ],
                    "Connection": [
                        "keep-alive"
                    ],
                    "User-Agent": [
                        "python-requests/2.4.1 CPython/2.7.5 Darwin/13.3.0"
                    ]
                },
                "method": "GET",
                "uri": "http://www.example.com/?test=%E2%98%83"
            },
            "response": {
                "body": {
                    "encoding": "ISO-8859-1",
                    "string": "<!doctype html>\n<html>\n<head>\n    <title>Example Domain</title>\n\n    <meta charset=\"utf-8\" />\n    <meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\" />\n    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />\n    <style type=\"text/css\">\n    body {\n        background-color: #f0f0f2;\n        margin: 0;\n        padding: 0;\n        font-family: \"Open Sans\", \"Helvetica Neue\", Helvetica, Arial, sans-serif;\n        \n    }\n    div {\n        width: 600px;\n        margin: 5em auto;\n        padding: 50px;\n        background-color: #fff;\n        border-radius: 1em;\n    }\n    a:link, a:visited {\n        color: #38488f;\n        text-decoration: none;\n    }\n    @media (max-width: 700px) {\n        body {\n            background-color: #fff;\n        }\n        div {\n            width: auto;\n            margin: 0 auto;\n            border-radius: 0;\n            padding: 1em;\n        }\n    }\n    </style>    \n</head>\n\n<body>\n<div>\n    <h1>Example Domain</h1>\n    <p>This domain is established to be used for illustrative examples in documents. You may use this\n    domain in examples without prior coordination or asking for permission.</p>\n    <p><a href=\"http://www.iana.org/domains/example\">More information...</a></p>\n</div>\n</body>\n</html>\n"
                },
                "headers": {
                    "accept-ranges": [
                        "bytes"
                    ],
                    "cache-control": [
                        "max-age=604800"
                    ],
                    "content-length": [
                        "1270"
                    ],
                    "content-type": [
                        "text/html"
                    ],
                    "date": [
                        "Sat, 04 Oct 2014 13:32:20 GMT"
                    ],
                    "etag": [
                        "\"359670651\""
                    ],
                    "expires": [
                        "Sat, 11 Oct 2014 13:32:20 GMT"
                    ],
                    "last-modified": [
                        "Fri, 09 Aug 2013 23:54:35 GMT"
                    ],
                    "server": [
                        "ECS (mdw/1275)"
                    ],
                    "x-cache": [
                        "HIT"
                    ],
                    "x-ec-custom-error": [
                        "1"
                    ]
                },
                "status": {
                    "code": 200,
                    "message": "OK"
                },
                "url": "http://www.example.com/?test=%E2%98%83"
            }
        }
    ],
    "recorded_with": "betamax/0.4.1"
}

And if I use plain requests, I get this:

>>> import requests
>>> url = 'http://www.example.com/'
>>> params = {'test': u'\u2603'}
>>> s = requests.session()
>>> r = s.get(url, params=params)
>>> r.url
u'http://www.example.com/?test=%E2%98%83'
>>> r.request.url
'http://www.example.com/?test=%E2%98%83'

This makes me fairly certain the issue does not stem from Betamax. I still need to think about how we can keep it from breaking here, though.