Open WillMorrisonEnspi opened 1 month ago
hi @WillMorrisonEnspi ,
I think this behavior is consistent. I may misunderstood your issue, but here's my view:
==
after encoding.This behaviour is partially documented (and unfortunatelly missing from the built html doc, that will be fixed):
String matching: https://github.com/csernazs/pytest-httpserver/blob/master/pytest_httpserver/httpserver.py#L188
Mapping matching: https://github.com/csernazs/pytest-httpserver/blob/master/pytest_httpserver/httpserver.py#L188
Also, there's a howto about this: https://pytest-httpserver.readthedocs.io/en/latest/howto.html#matching-query-parameters
Here, you specified an already quoted value in the dict, so it got double-quoted in pytest-httpserver:
httpserver.expect_request("/test", query_string={"foo": "bar%20baz"}).respond_with_data("OK")
Here, you specified an unquoted string, which then did't match with the quoted one sent by requests as server compares it with ==
.
httpserver.expect_request("/test", query_string="foo=bar baz").respond_with_data("OK")
pytest-httpserver provides you an API to define your own query matcher, so the first failed case can be solved by this:
class MyQueryStringMatcher(QueryMatcher):
def __init__(self, expected_string: str):
parsed = urllib.parse.parse_qsl(expected_string) # Parse query string into key-value pairs
self._encoded_query_string = urllib.parse.urlencode(parsed, quote_via=urllib.parse.quote)
def get_comparing_values(self, request_query_string: bytes) -> tuple:
return (self._encoded_query_string, request_query_string.decode("utf-8"))
def test_query_string_with_spaces_string_fails(httpserver: HTTPServer):
httpserver.expect_request("/test", query_string=MyQueryStringMatcher("foo=bar baz")).respond_with_data("OK")
url = httpserver.url_for("/test") + "?foo=bar baz"
requests.get(url)
httpserver.check_assertions()
The other case can be solved by this:
class QuotedDictMatcher(MappingQueryMatcher):
def __init__(self, query_dict: dict[str, str]):
unquoted_dict = {k: urllib.parse.unquote(v) for k, v in query_dict.items()}
super().__init__(unquoted_dict)
def test_query_string_is_encoded_dict_fails(httpserver: HTTPServer):
httpserver.expect_request("/test", query_string=QuotedDictMatcher({"foo": "bar%20baz"})).respond_with_data("OK")
url = httpserver.url_for("/test") + "?foo=bar baz"
requests.get(url)
httpserver.check_assertions()
Changing querystring matcher in pytest-httpserver to handle your case would be an API breaking, and it would break the current cases.
Zsolt
- when you specify string, pytest-httpserver expects a quoted string so that will be compared with
==
after encoding.- when you specify a dict or MultiDict, then this will be quoted by pytest-httpserver so then it will be compared.
This is exactly the part I find inconsistent. Strings must already be quoted, while dict values should not already be quoted.
This behaviour is partially documented
I did read through the docs, but this was not explained there. The String matcher documents what happens when you pass string vs bytes, and says that the bytes must match the incoming request exactly (that's correct and matches the behaviour from my reproduction test). However, the Mapping matcher docs don't mention anything about quoting. I would read this as
value handling should behave identically to the String matcher", but it doesn't as there is an additional percent-decoding of the query string introduced by the parse_qsl
call.
I was also thrown off by the fact that the assertion messages from the Mapping matcher don't show the actual level of percent decoding happening. Compare:
AssertionError: No handler found for request <Request 'http://localhost:38771/test?foo=bar%20baz' [GET]> with data b''
and the registered matcher
<RequestMatcher uri='/test' method='__ALL' query_string={'foo': 'bar%20baz'} headers={} data=None json=<UNDEFINED>>
Perhaps the Mapping matcher's __repr__
should be updated to show the percent-decoded values if the request is also being percent decoded before comparison.
Changing querystring matcher in pytest-httpserver to handle your case would be an API breaking, and it would break the current cases.
This is a bit tricky to handle, I agree. Options below.
[^1]: This is already sort of the case. A Mapping matcher value with no percent encoding in it will match requests with either one or no levels of encoding. E.g. {"foo": "bar baz"}
will match a request that is either ?foo=bar%20baz
or one that is ?foo=bar baz
.
- when you specify string, pytest-httpserver expects a quoted string so that will be compared with
==
after encoding.- when you specify a dict or MultiDict, then this will be quoted by pytest-httpserver so then it will be compared.
This is exactly the part I find inconsistent. Strings must already be quoted, while dict values should not already be quoted.
The main idea behind this is that pytest-httpserver cannot quote this:
?a=foo bar&b=qqq&c=123
I also think that interpreting the &
is ambiguous as it is not telling whether the &
and others should be quoted or not.
One might says it is:
?a=foo%20bar&b=qqq&c=123
Other might say:
?a=foo%20bar%26b%3Dqqq%26c%3D123
You can say that &
and =
are not quoted, though. But then how one can specify &
as a value? If they specify it with quoted, then that value will be un-quoted.
This behaviour is partially documented
I did read through the docs, but this was not explained there. The String matcher documents what happens when you pass string vs bytes, and says that the bytes must match the incoming request exactly (that's correct and matches the behaviour from my reproduction test). However, the Mapping matcher docs don't mention anything about quoting. I would read this as value handling should behave identically to the String matcher", but it doesn't as there is an additional percent-decoding of the query string introduced by the
parse_qsl
call.
Yes, I agree document needs to be extended with quoting details.
I was also thrown off by the fact that the assertion messages from the Mapping matcher don't show the actual level of percent decoding happening. Compare:
AssertionError: No handler found for request <Request 'http://localhost:38771/test?foo=bar%20baz' [GET]> with data b''
and the registered matcher<RequestMatcher uri='/test' method='__ALL' query_string={'foo': 'bar%20baz'} headers={} data=None json=<UNDEFINED>>
Perhaps the Mapping matcher's
__repr__
should be updated to show the percent-decoded values if the request is also being percent decoded before comparison.
Thats a good idea.
Changing querystring matcher in pytest-httpserver to handle your case would be an API breaking, and it would break the current cases.
This is a bit tricky to handle, I agree. Options below.
1. If the API was changed so that the Mapping matched didn't percent decode the request values, then existing calls could break. **bad, breaking change** 2. If the API was changed so that the Mapping matcher accepted either percent-encoded or not encoded values matching the query string in the request, then the breakage would be more subtle. Existing tests wouldn't start failing, but would start accepting more. There would be no way to check for "exactly this level of percent-encoding, but independent of order"[1](#user-content-fn-1-1825cdf4f80189722bbb03e6ae16fc5a). **bad, reduced functionality**
If existing tests provide %
in their values, then those will start to fail.
If you have a test with {"foo": "%20bar"}
then it will fail as %20bar
will be handled differently.
3. Adding a flag/keyword argument to the function to choose between percent-decoding the query string or not shouldn't break any use cases. **OK, but makes the API messier**
I think this can be done.
The matcher interface is designed for this, to provide more flexibility. I agree writing QuotedDictMatcher({"foo": "bar%20baz"})
makes the API less human-friendly, but IMHO this would be the way. MappingQueryMatcher
could also have an optional kwarg flag, specifying the rules for quoting.
Alternatively there would be a always_quote
parameter, having 3 possible values:
4. Just documenting the behaviour around percent encoding and making that part of the API. **Valid, but locks in the inconsistency**
That would be the easiest for sure, but the question is how this can be implemented in a way that:
Footnotes
1. This is already sort of the case. A Mapping matcher value with no percent encoding in it will match requests with either one or no levels of encoding. E.g. `{"foo": "bar baz"}` will match a request that is either `?foo=bar%20baz` or one that is `?foo=bar baz`. [↩](#user-content-fnref-1-1825cdf4f80189722bbb03e6ae16fc5a)
Welp, this also introduces ambiguity, if one wants to check if their client properly quotes the query parameters, then this won't check that.
Zsolt
Summarizing the conversation so far.
StringQueryMatcher
should not change, for the reasons you mentioned around ambiguity about &
and =
characters that would otherwise arise.MappingQueryMatcher
should be updated to mention the percent-decoding that happens to the request's query string before matching, and that this means that the mapping's values should not be percent encoded.__repr__
of the RequestMatcher
(or the AssertionError
message) should be updated to make it clearer why an unhandled request was not matched.MappingQueryMatcher
to disambiguate which percent-encoding behaviour is used. The current best suggestion (IMO) is to introduce a new query matcher which requires that the level of percent encoding matches exactly between the mapping values and request query strings, but is still order-independent.If 2. and 3. had already been implemented I would not have filed this bug (because I'd figure out that the behaviour is just the documented API), so I don't consider 4. to be required to resolve this issue. However, at some point I might want to write a test that checks if a client properly quotes query parameters independently of their order, so 4. is still a good feature to add.
Yes, that's correct, I agree with all the points. Thanks for the summary!
I'll do 2..4 points sometime in the near future.
Zsolt
It seems like the type (string vs dict) of
query_string
affects the behaviour of the resulting matcher when the query string in the request has percent-encoding. Specifically, a string typed argument toquery_string
must match exactly the query string in the request. However, a dict typed argument toquery_string
must match the request's query string after unquoting.Reproduced with pytest-httpserver version 1.1.0
test file:
Output: