csernazs / pytest-httpserver

Http server for pytest to test http clients
MIT License
219 stars 30 forks source link

Inconsistent behaviour in query string matching with spaces #370

Open WillMorrisonEnspi opened 1 month ago

WillMorrisonEnspi commented 1 month ago

It seems like the type (string vs dict) of query_string affects the behaviour of the resulting matcher when the query string in the request has percent-encoding. Specifically, a string typed argument to query_string must match exactly the query string in the request. However, a dict typed argument to query_string must match the request's query string after unquoting.

Reproduced with pytest-httpserver version 1.1.0

test file:

import unittest
import pytest_httpserver
import requests

class TestRepro(unittest.TestCase):

    def test_query_string_with_spaces_string_fails(self):
        with pytest_httpserver.HTTPServer() as httpserver:
            httpserver.expect_request("/test", query_string="foo=bar baz").respond_with_data("OK")

            url = httpserver.url_for("/test") + "?foo=bar baz"
            requests.get(url)
            httpserver.check_assertions()

    def test_query_string_is_encoded_string_passes(self):
        with pytest_httpserver.HTTPServer() as httpserver:
            httpserver.expect_request("/test", query_string="foo=bar%20baz").respond_with_data("OK")

            url = httpserver.url_for("/test") + "?foo=bar baz"
            requests.get(url)
            httpserver.check_assertions()

    def test_query_string_with_spaces_dict_passes(self):
        with pytest_httpserver.HTTPServer() as httpserver:
            httpserver.expect_request("/test", query_string={"foo": "bar baz"}).respond_with_data("OK")

            url = httpserver.url_for("/test") + "?foo=bar baz"
            requests.get(url)
            httpserver.check_assertions()

    def test_query_string_is_encoded_dict_fails(self):
        with pytest_httpserver.HTTPServer() as httpserver:
            httpserver.expect_request("/test", query_string={"foo": "bar%20baz"}).respond_with_data("OK")

            url = httpserver.url_for("/test") + "?foo=bar baz"
            requests.get(url)
            httpserver.check_assertions()

Output:

$ python -m unittest repro.py
127.0.0.1 - - [02/Oct/2024 16:36:41] "GET /test?foo=bar%20baz HTTP/1.1" 500 -
F127.0.0.1 - - [02/Oct/2024 16:36:41] "GET /test?foo=bar%20baz HTTP/1.1" 200 -
.127.0.0.1 - - [02/Oct/2024 16:36:41] "GET /test?foo=bar%20baz HTTP/1.1" 200 -
.127.0.0.1 - - [02/Oct/2024 16:36:41] "GET /test?foo=bar%20baz HTTP/1.1" 500 -
F
======================================================================
FAIL: test_query_string_is_encoded_dict_fails (repro.TestRepro.test_query_string_is_encoded_dict_fails)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/wmorrison/code/aws_lambda/repro.py", line 37, in test_query_string_is_encoded_dict_fails
    httpserver.check_assertions()
  File "/home/wmorrison/.local/share/pyenv/versions/3.11.5/envs/3.11.5-enspired@aws_lambda/lib/python3.11/site-packages/pytest_httpserver/httpserver.py", line 818, in check_assertions
    raise AssertionError(assertion)
AssertionError: No handler found for request <Request 'http://localhost:38771/test?foo=bar%20baz' [GET]> with data b''.Ordered matchers:
    none

Oneshot matchers:
    none

Persistent matchers:
    <RequestMatcher uri='/test' method='__ALL' query_string={'foo': 'bar%20baz'} headers={} data=None json=<UNDEFINED>>

======================================================================
FAIL: test_query_string_with_spaces_string_fails (repro.TestRepro.test_query_string_with_spaces_string_fails)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/wmorrison/code/aws_lambda/repro.py", line 13, in test_query_string_with_spaces_string_fails
    httpserver.check_assertions()
  File "/home/wmorrison/.local/share/pyenv/versions/3.11.5/envs/3.11.5-enspired@aws_lambda/lib/python3.11/site-packages/pytest_httpserver/httpserver.py", line 818, in check_assertions
    raise AssertionError(assertion)
AssertionError: No handler found for request <Request 'http://localhost:35185/test?foo=bar%20baz' [GET]> with data b''.Ordered matchers:
    none

Oneshot matchers:
    none

Persistent matchers:
    <RequestMatcher uri='/test' method='__ALL' query_string='foo=bar baz' headers={} data=None json=<UNDEFINED>>

----------------------------------------------------------------------
Ran 4 tests in 0.014s

FAILED (failures=2)
csernazs commented 1 month ago

hi @WillMorrisonEnspi ,

I think this behavior is consistent. I may misunderstood your issue, but here's my view:

This behaviour is partially documented (and unfortunatelly missing from the built html doc, that will be fixed):

String matching: https://github.com/csernazs/pytest-httpserver/blob/master/pytest_httpserver/httpserver.py#L188

Mapping matching: https://github.com/csernazs/pytest-httpserver/blob/master/pytest_httpserver/httpserver.py#L188

Also, there's a howto about this: https://pytest-httpserver.readthedocs.io/en/latest/howto.html#matching-query-parameters

Here, you specified an already quoted value in the dict, so it got double-quoted in pytest-httpserver:

    httpserver.expect_request("/test", query_string={"foo": "bar%20baz"}).respond_with_data("OK")

Here, you specified an unquoted string, which then did't match with the quoted one sent by requests as server compares it with ==.

    httpserver.expect_request("/test", query_string="foo=bar baz").respond_with_data("OK")

pytest-httpserver provides you an API to define your own query matcher, so the first failed case can be solved by this:

class MyQueryStringMatcher(QueryMatcher):
    def __init__(self, expected_string: str):
        parsed = urllib.parse.parse_qsl(expected_string)  # Parse query string into key-value pairs
        self._encoded_query_string = urllib.parse.urlencode(parsed, quote_via=urllib.parse.quote)

    def get_comparing_values(self, request_query_string: bytes) -> tuple:
        return (self._encoded_query_string, request_query_string.decode("utf-8"))

def test_query_string_with_spaces_string_fails(httpserver: HTTPServer):
    httpserver.expect_request("/test", query_string=MyQueryStringMatcher("foo=bar baz")).respond_with_data("OK")

    url = httpserver.url_for("/test") + "?foo=bar baz"
    requests.get(url)
    httpserver.check_assertions()

The other case can be solved by this:


class QuotedDictMatcher(MappingQueryMatcher):
    def __init__(self, query_dict: dict[str, str]):
        unquoted_dict = {k: urllib.parse.unquote(v) for k, v in query_dict.items()}
        super().__init__(unquoted_dict)

def test_query_string_is_encoded_dict_fails(httpserver: HTTPServer):
    httpserver.expect_request("/test", query_string=QuotedDictMatcher({"foo": "bar%20baz"})).respond_with_data("OK")

    url = httpserver.url_for("/test") + "?foo=bar baz"
    requests.get(url)
    httpserver.check_assertions()

Changing querystring matcher in pytest-httpserver to handle your case would be an API breaking, and it would break the current cases.

Zsolt

WillMorrisonEnspi commented 1 month ago
  • when you specify string, pytest-httpserver expects a quoted string so that will be compared with == after encoding.
  • when you specify a dict or MultiDict, then this will be quoted by pytest-httpserver so then it will be compared.

This is exactly the part I find inconsistent. Strings must already be quoted, while dict values should not already be quoted.

This behaviour is partially documented

I did read through the docs, but this was not explained there. The String matcher documents what happens when you pass string vs bytes, and says that the bytes must match the incoming request exactly (that's correct and matches the behaviour from my reproduction test). However, the Mapping matcher docs don't mention anything about quoting. I would read this as value handling should behave identically to the String matcher", but it doesn't as there is an additional percent-decoding of the query string introduced by the parse_qsl call.

I was also thrown off by the fact that the assertion messages from the Mapping matcher don't show the actual level of percent decoding happening. Compare: AssertionError: No handler found for request <Request 'http://localhost:38771/test?foo=bar%20baz' [GET]> with data b'' and the registered matcher <RequestMatcher uri='/test' method='__ALL' query_string={'foo': 'bar%20baz'} headers={} data=None json=<UNDEFINED>>

Perhaps the Mapping matcher's __repr__ should be updated to show the percent-decoded values if the request is also being percent decoded before comparison.

Changing querystring matcher in pytest-httpserver to handle your case would be an API breaking, and it would break the current cases.

This is a bit tricky to handle, I agree. Options below.

  1. If the API was changed so that the Mapping matched didn't percent decode the request values, then existing calls could break. bad, breaking change
  2. If the API was changed so that the Mapping matcher accepted either percent-encoded or not encoded values matching the query string in the request, then the breakage would be more subtle. Existing tests wouldn't start failing, but would start accepting more. There would be no way to check for "exactly this level of percent-encoding, but independent of order"[^1]. bad, reduced functionality
  3. Adding a flag/keyword argument to the function to choose between percent-decoding the query string or not shouldn't break any use cases. OK, but makes the API messier
  4. Just documenting the behaviour around percent encoding and making that part of the API. Valid, but locks in the inconsistency

[^1]: This is already sort of the case. A Mapping matcher value with no percent encoding in it will match requests with either one or no levels of encoding. E.g. {"foo": "bar baz"} will match a request that is either ?foo=bar%20baz or one that is ?foo=bar baz.

csernazs commented 1 month ago
  • when you specify string, pytest-httpserver expects a quoted string so that will be compared with == after encoding.
  • when you specify a dict or MultiDict, then this will be quoted by pytest-httpserver so then it will be compared.

This is exactly the part I find inconsistent. Strings must already be quoted, while dict values should not already be quoted.

The main idea behind this is that pytest-httpserver cannot quote this:

?a=foo bar&b=qqq&c=123

I also think that interpreting the & is ambiguous as it is not telling whether the & and others should be quoted or not.

One might says it is:

?a=foo%20bar&b=qqq&c=123

Other might say:

?a=foo%20bar%26b%3Dqqq%26c%3D123

You can say that & and = are not quoted, though. But then how one can specify & as a value? If they specify it with quoted, then that value will be un-quoted.

This behaviour is partially documented

I did read through the docs, but this was not explained there. The String matcher documents what happens when you pass string vs bytes, and says that the bytes must match the incoming request exactly (that's correct and matches the behaviour from my reproduction test). However, the Mapping matcher docs don't mention anything about quoting. I would read this as value handling should behave identically to the String matcher", but it doesn't as there is an additional percent-decoding of the query string introduced by the parse_qsl call.

Yes, I agree document needs to be extended with quoting details.

I was also thrown off by the fact that the assertion messages from the Mapping matcher don't show the actual level of percent decoding happening. Compare: AssertionError: No handler found for request <Request 'http://localhost:38771/test?foo=bar%20baz' [GET]> with data b'' and the registered matcher <RequestMatcher uri='/test' method='__ALL' query_string={'foo': 'bar%20baz'} headers={} data=None json=<UNDEFINED>>

Perhaps the Mapping matcher's __repr__ should be updated to show the percent-decoded values if the request is also being percent decoded before comparison.

Thats a good idea.

Changing querystring matcher in pytest-httpserver to handle your case would be an API breaking, and it would break the current cases.

This is a bit tricky to handle, I agree. Options below.

1. If the API was changed so that the Mapping matched didn't percent decode the request values, then existing calls could break. **bad, breaking change**

2. If the API was changed so that the Mapping matcher accepted either percent-encoded or not encoded values matching the query string in the request, then the breakage would be more subtle. Existing tests wouldn't start failing, but would start accepting more. There would be no way to check for "exactly this level of percent-encoding, but independent of order"[1](#user-content-fn-1-1825cdf4f80189722bbb03e6ae16fc5a). **bad, reduced functionality**

If existing tests provide % in their values, then those will start to fail. If you have a test with {"foo": "%20bar"} then it will fail as %20bar will be handled differently.

3. Adding a flag/keyword argument to the function to choose between percent-decoding the query string or not shouldn't break any use cases. **OK, but makes the API messier**

I think this can be done.

The matcher interface is designed for this, to provide more flexibility. I agree writing QuotedDictMatcher({"foo": "bar%20baz"}) makes the API less human-friendly, but IMHO this would be the way. MappingQueryMatcher could also have an optional kwarg flag, specifying the rules for quoting.

Alternatively there would be a always_quote parameter, having 3 possible values:

  1. is the current behavior
  2. pytest-httpserver never quotes anything passed-in
  3. pytest-httpserver always quote the strings passed-in (but my first question about ambiguity still be valid)
4. Just documenting the behaviour around percent encoding and making that part of the API. **Valid, but locks in the inconsistency**

That would be the easiest for sure, but the question is how this can be implemented in a way that:

  1. keep API and behavior stable: we won't break user's tests
  2. makes the new feature to be human friendly
  3. the new API introduced helps you and your case (I don't want to add an API which does not).

Footnotes

1. This is already sort of the case. A Mapping matcher value with no percent encoding in it will match requests with either one or no levels of encoding. E.g. `{"foo": "bar baz"}` will match a request that is either `?foo=bar%20baz` or one that is `?foo=bar baz`. [↩](#user-content-fnref-1-1825cdf4f80189722bbb03e6ae16fc5a)

Welp, this also introduces ambiguity, if one wants to check if their client properly quotes the query parameters, then this won't check that.

Zsolt

WillMorrisonEnspi commented 1 month ago

Summarizing the conversation so far.

  1. I think we agree that the behaviour of the StringQueryMatcher should not change, for the reasons you mentioned around ambiguity about & and = characters that would otherwise arise.
  2. We agree that the documentation for MappingQueryMatcher should be updated to mention the percent-decoding that happens to the request's query string before matching, and that this means that the mapping's values should not be percent encoded.
  3. We agreed that the __repr__ of the RequestMatcher (or the AssertionError message) should be updated to make it clearer why an unhandled request was not matched.
  4. There is a possiblity of extending the API of MappingQueryMatcher to disambiguate which percent-encoding behaviour is used. The current best suggestion (IMO) is to introduce a new query matcher which requires that the level of percent encoding matches exactly between the mapping values and request query strings, but is still order-independent.

If 2. and 3. had already been implemented I would not have filed this bug (because I'd figure out that the behaviour is just the documented API), so I don't consider 4. to be required to resolve this issue. However, at some point I might want to write a test that checks if a client properly quotes query parameters independently of their order, so 4. is still a good feature to add.

csernazs commented 1 month ago

Yes, that's correct, I agree with all the points. Thanks for the summary!

I'll do 2..4 points sometime in the near future.

Zsolt