Stiffstream / restinio

Cross-platform, efficient, customizable, and robust asynchronous HTTP(S)/WebSocket server C++ library with the right balance between performance and ease of use

restinio unable to recognize `*!` in url #76

Closed zhangzq closed 4 years ago

zhangzq commented 4 years ago

When requesting /foo?bar=*, restinio raises an exception:

restinio error while handling request: invalid non-escaped char with code 0X2A

The same happens for the URL /foo?bar=!.

Chrome does not percent-encode * and ! in URLs. One can check the result of encodeURIComponent("!*") in the Chrome console.

eao197 commented 4 years ago

Hi!

Asterisk is handled by using a special trait for restinio::parse_query as described in the doc.

The presence of an unescaped exclamation mark in query params is not handled in a special way. I'll investigate that case.

zhangzq commented 4 years ago

Thanks. I used restinio::parse_query<restinio::parse_query_traits::restinio_defaults>, it resolved *, but not !

I also found that ( and ) in a URL cause the same problem. I'm not sure whether there are any other invalid characters.

eao197 commented 4 years ago

The problem is that *!()&=;? are reserved chars that should be used as separators in a URI. If they are not used as separators, they should be percent-encoded.

RESTinio parses a query string into name=value pairs with the assumption that only &;= are legal in such a query-string representation. If there is a sequence like a=(&b=), how should it be treated?

I can expand the javascript_compatible trait to interpret ! as an ordinary character. But I doubt that the same can and should be done for () or some other character.

If you really have to parse query strings with non-percent-encoded !() and other chars, please consider defining and using your own trait. The idea can be taken from here.

Or you can provide a reference to some well-established specification for query-string representation in the form of name=value sequence.
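
To make the "own trait" idea concrete, here is a minimal self-contained sketch of a query-string parser whose set of characters allowed in non-escaped form is chosen by a trait type. It does not use RESTinio's API: the names strict_chars, lenient_chars, unescape_sketch and parse_query_sketch are hypothetical and exist only for this illustration. It assumes & and ; as pair separators and treats + as a space.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical trait: only RFC 3986 "unreserved" characters may appear unescaped.
struct strict_chars {
    static bool ordinary_char(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
               (c >= '0' && c <= '9') ||
               c == '-' || c == '.' || c == '_' || c == '~';
    }
};

// Hypothetical trait: additionally accepts ! * ' ( ) in non-escaped form.
struct lenient_chars {
    static bool ordinary_char(char c) {
        return strict_chars::ordinary_char(c) ||
               c == '!' || c == '*' || c == '\'' || c == '(' || c == ')';
    }
};

// Returns the numeric value of a hex digit, or -1 if c is not a hex digit.
inline int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes one name or value; throws on characters the trait does not accept.
template <typename Traits>
std::string unescape_sketch(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char c = in[i];
        if (c == '%') {
            if (i + 2 >= in.size())
                throw std::invalid_argument("truncated percent-encoding");
            const int hi = hex_value(in[i + 1]);
            const int lo = hex_value(in[i + 2]);
            if (hi < 0 || lo < 0)
                throw std::invalid_argument("bad percent-encoding");
            out += static_cast<char>(hi * 16 + lo);
            i += 2;
        } else if (c == '+') {
            out += ' ';
        } else if (Traits::ordinary_char(c)) {
            out += c;
        } else {
            throw std::invalid_argument("invalid non-escaped char");
        }
    }
    return out;
}

// Splits the query on '&' and ';', then each item on the first '='.
template <typename Traits>
std::vector<std::pair<std::string, std::string>>
parse_query_sketch(const std::string& query) {
    std::vector<std::pair<std::string, std::string>> result;
    std::size_t start = 0;
    while (start < query.size()) {
        std::size_t end = query.find_first_of("&;", start);
        if (end == std::string::npos) end = query.size();
        const std::string item = query.substr(start, end - start);
        if (!item.empty()) {
            const std::size_t eq = item.find('=');
            const std::string name = item.substr(0, eq);
            const std::string value =
                eq == std::string::npos ? std::string() : item.substr(eq + 1);
            result.emplace_back(unescape_sketch<Traits>(name),
                                unescape_sketch<Traits>(value));
        }
        start = end + 1;
    }
    return result;
}
```

With this sketch, parse_query_sketch<lenient_chars>("a=(&b=)") yields the pairs a=( and b=), while parse_query_sketch<strict_chars>("bar=*") throws std::invalid_argument. Where the line between the two traits should be drawn is exactly the open question of this issue.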

zhangzq commented 4 years ago

Thanks for your reply.

The problem exists only for *!(), not for =&?. The string a=(&b=) should be parsed as a=( and b=). There is no ambiguity here.

You can check encodeURIComponent("!*()&=?") in the Chrome console to find out which characters are percent-encoded and which are not. I'm not sure about the standard, but this is how it is implemented by Chrome, the most used browser in the world.

eao197 commented 4 years ago

RFC 3986 specifies what can be in the query part of a URL:

query         = *( pchar / "/" / "?" )
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
reserved      = gen-delims / sub-delims
gen-delims    = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="

So "!*()&=?" can be present in the query part in non-percent-encoded form.

But we are speaking not about the query part in general, but about the case where the query part should be treated as a series of name=value pairs (note that RFC 3986 doesn't require the query part to be in that form). I think there can be different rules.

Anyway, I'll investigate this case further and if the interpretation of strings like a=*&b=!&c=[&d=]&e=(&f=) is (a=*, b=!, c=[, d=], e=(, f=)), then I'll update RESTinio's javascript_compatible trait.
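
As a cross-check of the grammar quoted above, the character classes can be written down as small predicates. This is only a sketch of the ABNF rules for single ASCII characters; the function names are hypothetical, and pct-encoded sequences are deliberately left out because they span three characters.

```cpp
// Character classes from RFC 3986, for single ASCII characters only.
constexpr bool is_alpha(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

constexpr bool is_digit(char c) {
    return c >= '0' && c <= '9';
}

// unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
constexpr bool is_unreserved(char c) {
    return is_alpha(c) || is_digit(c) ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

// sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
constexpr bool is_sub_delim(char c) {
    return c == '!' || c == '$' || c == '&' || c == '\'' || c == '(' ||
           c == ')' || c == '*' || c == '+' || c == ',' || c == ';' || c == '=';
}

// pchar without the pct-encoded alternative:
// unreserved / sub-delims / ":" / "@"
constexpr bool is_pchar_literal(char c) {
    return is_unreserved(c) || is_sub_delim(c) || c == ':' || c == '@';
}

// query = *( pchar / "/" / "?" ), again ignoring pct-encoded sequences.
constexpr bool is_query_literal(char c) {
    return is_pchar_literal(c) || c == '/' || c == '?';
}
```

Under these rules is_query_literal returns true for each of ! * ( ) & = ?, but false for [ and ], which are gen-delims and do not appear in the query production.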

eao197 commented 4 years ago

This is just a note related to the issue. I keep it here to track the various references I've found while looking for information about the representation of a query string in a URI.

There is a specification for application/x-www-form-urlencoded data representation.

There is also a note in Wikipedia about possible usage of semicolon (;, 0x3B) as a separator between name=value pairs.

eao197 commented 4 years ago

The result of checking encodeURIComponent on MDN web docs.

Test script:

console.log(encodeURIComponent(":/?#[]@"))
console.log(encodeURIComponent("!$&'()*+,;="))
console.log(encodeURIComponent("-.~_*"))

The result is:

"%3A%2F%3F%23%5B%5D%40"
"!%24%26'()*%2B%2C%3B%3D"
"-.~_*"
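
The same behaviour can be reproduced outside the browser. Below is a rough C++ sketch of the escaping rule that these results suggest: everything except A-Z a-z 0-9 - _ . ! ~ * ' ( ) is percent-encoded byte by byte. The function names are hypothetical, and the sketch assumes the input is already a valid UTF-8 byte string.

```cpp
#include <string>

// True for the characters that encodeURIComponent leaves unescaped:
// A-Z a-z 0-9 - _ . ! ~ * ' ( )
inline bool kept_by_encode_uri_component(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '_' || c == '.' || c == '!' || c == '~' ||
           c == '*' || c == '\'' || c == '(' || c == ')';
}

// Percent-encodes every byte that is not in the "kept" set.
inline std::string encode_uri_component_sketch(const std::string& in) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (const char c : in) {
        if (kept_by_encode_uri_component(c)) {
            out += c;
        } else {
            const auto b = static_cast<unsigned char>(c);
            out += '%';
            out += hex[b >> 4];   // high nibble
            out += hex[b & 0x0F]; // low nibble
        }
    }
    return out;
}
```

Called with the three strings from the test script, encode_uri_component_sketch returns the same three results as shown above.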

zhangzq commented 4 years ago

I noticed that some related patches were committed. Can I use the newest version now?

eao197 commented 4 years ago

I think it's better to wait until I merge the new changes into the 0.6-dev branch. I hope I'll do it later today.

eao197 commented 4 years ago

@zhangzq you can try the version from the 0.6-dev-0.6.5 branch. It seems that the fixes will stay there for some time, until we address other issues. Only then will the 0.6-dev and master branches be updated.

eao197 commented 4 years ago

@zhangzq Fixes for this issue are released as a part of v.0.6.5.

zhangzq commented 4 years ago

I tried. Everything goes well. Thanks.