ethanresnick / json-api

Turn your node app into a JSON API server (http://jsonapi.org/)

json-api reads from Raw URLs instead of decoded #187

Open numerical25 opened 5 years ago

numerical25 commented 5 years ago

I saw some open tickets about URL encoding, but I'm not sure whether they pertain to this. I am unable to test with applications such as Postman because the application encodes the URLs.

So something like this

filter=(:or,(:and,(events.end_date,:gte,1549745450224),(events.start_date,:lte,1549745450224)))

Turns into this...

filter=%28:or,%28:and,%28events.end_date,:gte,1549745450224%29,%28events.start_date,:lte,1549745450224%29%29%29

json-api breaks because it's reading from the raw request.rawQueryString instead of request.queryParams. To illustrate the difference (variable names here are mine, just for this example):
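
```ts
// The two query strings only differ at the raw-string level; once the
// percent-encoded octets are decoded they are identical, so a parser that
// works on the raw string sees two different inputs.
const raw =
  "filter=(:or,(:and,(events.end_date,:gte,1549745450224),(events.start_date,:lte,1549745450224)))";
const encodedByPostman =
  "filter=%28:or,%28:and,%28events.end_date,:gte,1549745450224%29,%28events.start_date,:lte,1549745450224%29%29%29";

console.log(decodeURIComponent(encodedByPostman) === raw); // true
console.log(encodedByPostman === raw);                     // false
```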

Is there a setting to change this, or can this be fixed? It seems like json-api server should be able to handle both forms of query strings.

ethanresnick commented 5 years ago

Some context... the URL specification contains the concept of "reserved characters", which are specific characters that have a different meaning when they're encoded vs unencoded. The idea is that the unencoded form of the character will have some sort of structural significance in separating the various parts of the URL, whereas the encoded form allows the character to be used in a data value without it getting confused for its structural equivalent.

A simple example would be the ? character: when unencoded, ? indicates the start of the query string; when encoded (as %3F), it just represents a question mark. The encoding allows you to have (e.g.) a path segment that includes a ?, without this triggering the URL parser to think it's in the query string.
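Here's a quick Node sketch (nothing to do with this library's code, just to make the distinction concrete):

```ts
import { URL } from "url";

// Unencoded "?" starts the query string...
const a = new URL("https://example.com/foo?bar");
console.log(a.pathname, a.search); // "/foo" "?bar"

// ...while the encoded form "%3F" stays part of the path segment.
const b = new URL("https://example.com/foo%3Fbar");
console.log(b.pathname, b.search); // "/foo%3Fbar" ""
```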

According to the URL spec, characters like (, ), and , are all reserved characters, which is to say that Postman should not be automatically encoding them, as doing so changes the meaning of the URL. To quote from the spec directly:

> The purpose of reserved characters is to provide a set of delimiting characters that are distinguishable from other data within a URI. URIs that differ in the replacement of a reserved character with its corresponding percent-encoded octet are not equivalent. Percent-encoding a reserved character, or decoding a percent-encoded octet that corresponds to a reserved character, will change how the URI is interpreted by most applications.

So, from a theoretical POV, it's definitely Postman that's in the wrong here.... but I realize that doesn't really help you. Which leads to the more practical question: could json-api server handle both forms?

The short answer is yes -- for at least some cases -- but it would be a breaking change and would require some changes to the serializer to prevent security issues going forward. It could also create security issues if a server updates from being encoding-sensitive to not being encoding-sensitive, but not all of its clients update. I'd need to think this through....

Regardless, we probably wouldn't want to decode the whole string before parsing it, because that would create too much ambiguity. For example, json-api's filter syntax currently supports a fallback string encoding that looks like this: !Hello%20world%21!. Because strings can contain exclamation points (as in Hello world! above), we certainly need to be able to distinguish between the encoded exclamation point inside the string and the literal ! that marks the start/end of the string.
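To make that ambiguity concrete, here's a rough illustration (the delimiter handling is simplified; this is not the actual parser):

```ts
// If we decode the whole query string up front, the escaped "!" inside the
// string value becomes indistinguishable from the "!" delimiters around it.
const rawValue = "!Hello%20world%21!";

// Parsing the raw form: strip the outer "!" delimiters, then decode the contents.
const inner = rawValue.slice(1, -1);          // "Hello%20world%21"
console.log(decodeURIComponent(inner));       // "Hello world!"

// Decoding everything first loses the distinction:
console.log(decodeURIComponent(rawValue));    // "!Hello world!!" -- which "!" ends the string?
```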

What we could do is define just a few characters for which we treat their encoded and unencoded forms as equivalent. Hopefully, we can define enough of these equivalencies to make the syntax substantially more robust (so it automatically works with Postman), while also not defining more than we need to — because, for every character where we treat the encoded and unencoded version as equivalent, we either lose the ability to use that character as part of a name or lose the ability to use it as a delimiter (since we can no longer tell the two uses apart).

So, finalizing this list of equivalent characters would take some time and testing. At the least, it seems like it should include: (, ), [, ], and ', but we'd also want to simultaneously decide on =, *, :, @, $, and ; (as deciding later could cause yet more security issues), and I'm not sure about those.
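Just to sketch the shape of the idea (assuming the list settled on (, ), [, ], and '; this is not a committed implementation):

```ts
// Normalize only a small whitelist of percent-encodings to their unencoded
// forms before handing the query string to the filter parser; everything
// else is left exactly as received.
const EQUIVALENT: Record<string, string> = {
  "%28": "(",
  "%29": ")",
  "%5B": "[",
  "%5D": "]",
  "%27": "'",
};

function normalizeFilterString(raw: string): string {
  return raw.replace(/%[0-9A-Fa-f]{2}/g, (octet) => {
    const upper = octet.toUpperCase();
    return upper in EQUIVALENT ? EQUIVALENT[upper] : octet;
  });
}

// Postman's encoded form would then parse like the hand-written one:
console.log(
  normalizeFilterString("%28events.end_date,:gte,1549745450224%29")
); // "(events.end_date,:gte,1549745450224)"
```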

Bottom line: I doubt I'll have time to think this all through and issue a fix soon, so you should see if you can find a testing tool that handles URL encoding properly. In the past I've used Advanced REST Client, which is free, and I don't recall it having these problems.

numerical25 commented 5 years ago

I will take a look at Advanced REST Client. I've also created a ticket with Postman indicating that the application goes against the JSON:API specification. For a well-known tool, it should be aware of these types of specifications.

ethanresnick commented 5 years ago

> I've also created a ticket with Postman indicating that the application goes against the JSON:API specification.

Sounds good. For what it's worth, though, the JSON:API specification actually leaves it up to each implementation to define its own filtering syntax, so the syntax used here is specific to this library rather than part of JSON:API proper. Still, there are other specs that rely on unencoded URL characters, and Postman's behavior is pretty unambiguously wrong, so it really should be fixed.