amphp / http

HTTP primitives which can be shared by servers and clients.
https://amphp.org/http
MIT License
88 stars 10 forks source link

How to encode query parameters? #24

Closed bennnjamin closed 1 year ago

bennnjamin commented 1 year ago

I'm primarily using http-client, but almost all query parameter handling is implemented in the parent HttpRequest class here. It doesn't look like there is any place that the special characters in query parameters are being encoded. Is this intentional? Or does it make sense to provide this option?

Please advise the best way going forward.

trowski commented 1 year ago

Special characters in query parameters are encoded by League\Uri\QueryString::build() before being set as the query component of the request UriInterface.

These characters are also decoded by League\Uri\QueryString::parse() which is used to create the internal array in HttpRequest, so values returned from `getQueryParameter() and friends do not need to be decoded.

bennnjamin commented 1 year ago

@trowski thanks for the explanation, it looks like a bug in that dependency when using encoding type PHP_QUERY_RFC3986. For now, I am building the query string manually and just setting the URI to the correct query string. In particular, I have to pass phone numbers as query parameters and the bug occurs when encoding a +.

$queryParameters = [
    "phone_number" => "+18001231234",
    "special_characetrs" => " /+t!q_"
];
$queryPairs = [
    ["phone_number", "+18001231234"],
    ["special_characetrs", " /+t!q_"]
];
$queryString = QueryString::build($queryPairs, '&', PHP_QUERY_RFC1738);
print("League\URI PHP_QUERY_RFC1738 $queryString\n");
$queryString = QueryString::build($queryPairs, '&', PHP_QUERY_RFC3986);
print("League\URI PHP_QUERY_RFC3986 $queryString\n");
$queryString = http_build_query($queryParameters, "", null, PHP_QUERY_RFC1738);
print("http_build_query PHP_QUERY_RFC1738 $queryString\n");
$queryString = http_build_query($queryParameters, "", null, PHP_QUERY_RFC3986);
print("http_build_query PHP_QUERY_RFC3986 $queryString\n");

Output:

League\URI PHP_QUERY_RFC1738 phone_number=%2B18001231234&special_characetrs=+%2F%2Bt%21q_
League\URI PHP_QUERY_RFC3986 phone_number=+18001231234&special_characetrs=%20/+t!q_
http_build_query PHP_QUERY_RFC1738 phone_number=%2B18001231234&special_characetrs=+%2F%2Bt%21q_
http_build_query PHP_QUERY_RFC3986 phone_number=%2B18001231234&special_characetrs=%20%2F%2Bt%21q_
bennnjamin commented 1 year ago

After digging into this further it seems that some older servers aren't really compliant with RFC 3986. Would you be open to providing a way to override or set the encoding type to PHP_QUERY_RFC1738 or PHP_QUERY_RFC3986? League\Uri\QueryString::build() supports it, but it's not exposed publicly in HttpRequest as a parameter.

kelunik commented 1 year ago

See also https://github.com/thephpleague/uri-src/issues/109.

+ is also used for whitespace, no? Which spec do browsers follow? Do we simply reference the wrong rfc here?

bennnjamin commented 1 year ago

@kelunik Yeah that was me that just opened that issue. Honestly I don't know enough about the spec but there are some major inconsistencies in real world implementations. I think that's why it's important to provide an option at the library level, since this could be used to connect to a variety of legacy HTTP servers with no support for recent RFCs. JavaScript's encoderURIComponent encodes a + but encodeURI does not, for example.

In my case, I don't have control over the API, and it expects the + to be encoded, so I have to encode it regardless of whether the RFC says that's correct.

nyamsprod commented 1 year ago

For reference you have RFC1738, RFC3986, RFC3987 and also the living standard each with their own set of rules around encoding. So it really depends which RFC is followed by which client.

trowski commented 1 year ago

@nyamsprod Do you have an insight on a good direction here? Does it make the most sense to offer an option as to which encoding is used, or could we use RFC1738 which seems to encode everything?

nyamsprod commented 1 year ago

From my understanding and research when developping the package, browsers (HTTP Clients) are leaning toward using a version that ressembles RFC1738 to align URI encoding with Forms Data encoding (see the living standard). Whereas servers historically are leaning towards following RFC3986. So like everything in programming the answer is it depends and there is no clear one solution for all

kelunik commented 1 year ago

@trowski Seems like we still need to change https://github.com/amphp/http/blob/fe1d018a53cfbd77fa794b3cb15a215c1f2696f0/src/HttpRequest.php#L204 to use the old RFC encoding?

trowski commented 1 year ago

Yes, I'll update and tag a bugfix release.

nyamsprod commented 1 year ago

@trowski @bennnjamin @kelunik after implementing URLSearchParams I can confidently say that there are 3 encoding types for the query component:

Browsers, at least modern browsers that follow the WHATWG group now only uses the later for Forms and query string in order to align all parsing/serializing.

Starting with version 7.1.0 the latter should be accessible via the Modifier class using the encodeQuery method as shown below

use League\Uri\Modifier;
use League\Uri\KeyValuePair\Converter;

$formDataConverter = Converter::new('&')
    ->withEncodingMap(['%20' => '+', '%2A' => '*']);

$uri = Modifier::from($uri)
    ->encodeQuery(formDataConverter)
    ->getUriString();

or the QueryString class

use League\Uri\QueryString;
use League\Uri\KeyValuePair\Converter;

$formDataConverter = Converter::new('&')
    ->withEncodingMap(['%20' => '+', '%2A' => '*']);

$query = QueryString::buildFromPairs($pairs, $formDataConverter);
//$query will be a string or `null` if $pairs is an empty array.

In the next minor version (7.3.0 I presume) you will be able to simply a bit the code to

use League\Uri\Modifier;
use League\Uri\KeyValuePair\Converter;

echo Modifier::from($uri)
    ->encodeQuery(Converter::fromFormData())
    ->getUriString();

and with the QueryString class

use League\Uri\QueryString;
use League\Uri\KeyValuePair\Converter;

echo QueryString::buildFromPairs($pairs,  Converter::fromFormData());
//returns a string or `null` if $pairs is an empty array.

7.3.0 will also give you access to the PHP implementation of URLSearchParams but that's another topic even though it will give you proper representation of how browser and client should handle query string.