Closed bennnjamin closed 1 year ago
Special characters in query parameters are encoded by League\Uri\QueryString::build()
before being set as the query component of the request UriInterface
.
These characters are also decoded by League\Uri\QueryString::parse()
which is used to create the internal array in HttpRequest
, so values returned from `getQueryParameter() and friends do not need to be decoded.
@trowski thanks for the explanation, it looks like a bug in that dependency when using encoding type PHP_QUERY_RFC3986
. For now, I am building the query string manually and just setting the URI to the correct query string. In particular, I have to pass phone numbers as query parameters and the bug occurs when encoding a +
.
$queryParameters = [
"phone_number" => "+18001231234",
"special_characetrs" => " /+t!q_"
];
$queryPairs = [
["phone_number", "+18001231234"],
["special_characetrs", " /+t!q_"]
];
$queryString = QueryString::build($queryPairs, '&', PHP_QUERY_RFC1738);
print("League\URI PHP_QUERY_RFC1738 $queryString\n");
$queryString = QueryString::build($queryPairs, '&', PHP_QUERY_RFC3986);
print("League\URI PHP_QUERY_RFC3986 $queryString\n");
$queryString = http_build_query($queryParameters, "", null, PHP_QUERY_RFC1738);
print("http_build_query PHP_QUERY_RFC1738 $queryString\n");
$queryString = http_build_query($queryParameters, "", null, PHP_QUERY_RFC3986);
print("http_build_query PHP_QUERY_RFC3986 $queryString\n");
Output:
League\URI PHP_QUERY_RFC1738 phone_number=%2B18001231234&special_characetrs=+%2F%2Bt%21q_
League\URI PHP_QUERY_RFC3986 phone_number=+18001231234&special_characetrs=%20/+t!q_
http_build_query PHP_QUERY_RFC1738 phone_number=%2B18001231234&special_characetrs=+%2F%2Bt%21q_
http_build_query PHP_QUERY_RFC3986 phone_number=%2B18001231234&special_characetrs=%20%2F%2Bt%21q_
After digging into this further it seems that some older servers aren't really compliant with RFC 3986. Would you be open to providing a way to override or set the encoding type to PHP_QUERY_RFC1738
or PHP_QUERY_RFC3986
? League\Uri\QueryString::build()
supports it, but it's not exposed publicly in HttpRequest
as a parameter.
See also https://github.com/thephpleague/uri-src/issues/109.
+
is also used for whitespace, no? Which spec do browsers follow? Do we simply reference the wrong rfc here?
@kelunik Yeah that was me that just opened that issue. Honestly I don't know enough about the spec but there are some major inconsistencies in real world implementations. I think that's why it's important to provide an option at the library level, since this could be used to connect to a variety of legacy HTTP servers with no support for recent RFCs. JavaScript's encoderURIComponent
encodes a + but encodeURI
does not, for example.
In my case, I don't have control over the API, and it expects the + to be encoded, so I have to encode it regardless of whether the RFC says that's correct.
For reference you have RFC1738, RFC3986, RFC3987 and also the living standard each with their own set of rules around encoding. So it really depends which RFC is followed by which client.
@nyamsprod Do you have an insight on a good direction here? Does it make the most sense to offer an option as to which encoding is used, or could we use RFC1738 which seems to encode everything?
From my understanding and research when developping the package, browsers (HTTP Clients) are leaning toward using a version that ressembles RFC1738 to align URI encoding with Forms Data encoding (see the living standard). Whereas servers historically are leaning towards following RFC3986. So like everything in programming the answer is it depends and there is no clear one solution for all
@trowski Seems like we still need to change https://github.com/amphp/http/blob/fe1d018a53cfbd77fa794b3cb15a215c1f2696f0/src/HttpRequest.php#L204 to use the old RFC encoding?
Yes, I'll update and tag a bugfix release.
@trowski @bennnjamin @kelunik after implementing URLSearchParams
I can confidently say that there are 3 encoding types for the query component:
Browsers, at least modern browsers that follow the WHATWG group now only uses the later for Forms and query string in order to align all parsing/serializing.
Starting with version 7.1.0 the latter should be accessible via the Modifier
class using the encodeQuery
method as shown below
use League\Uri\Modifier;
use League\Uri\KeyValuePair\Converter;
$formDataConverter = Converter::new('&')
->withEncodingMap(['%20' => '+', '%2A' => '*']);
$uri = Modifier::from($uri)
->encodeQuery(formDataConverter)
->getUriString();
or the QueryString
class
use League\Uri\QueryString;
use League\Uri\KeyValuePair\Converter;
$formDataConverter = Converter::new('&')
->withEncodingMap(['%20' => '+', '%2A' => '*']);
$query = QueryString::buildFromPairs($pairs, $formDataConverter);
//$query will be a string or `null` if $pairs is an empty array.
In the next minor version (7.3.0 I presume) you will be able to simply a bit the code to
use League\Uri\Modifier;
use League\Uri\KeyValuePair\Converter;
echo Modifier::from($uri)
->encodeQuery(Converter::fromFormData())
->getUriString();
and with the QueryString
class
use League\Uri\QueryString;
use League\Uri\KeyValuePair\Converter;
echo QueryString::buildFromPairs($pairs, Converter::fromFormData());
//returns a string or `null` if $pairs is an empty array.
7.3.0
will also give you access to the PHP implementation of URLSearchParams
but that's another topic even though it will give you proper representation of how browser and client should handle query string.
I'm primarily using http-client, but almost all query parameter handling is implemented in the parent
HttpRequest
class here. It doesn't look like there is any place that the special characters in query parameters are being encoded. Is this intentional? Or does it make sense to provide this option?Please advise the best way going forward.