commonmark / cmark

CommonMark parsing and rendering library and program in C

Percent encoding of ~ #394

Closed: ghost closed this issue 3 years ago

ghost commented 3 years ago

Currently cmark percent-encodes ~ but it doesn't do so for . _ -

All four are unreserved characters. Shouldn't ~ be left unencoded as well?

jgm commented 3 years ago

I don't know. This comes from houdini_href_e.c, which was originally from GitHub. It seems that ~ was required to be encoded in the past, and maybe the code is just playing it safe: https://jkorpela.fi/tilde.html

ghost commented 3 years ago

RFC 3986, section 2.3, says:

For consistency, percent-encoded octets in the ranges of ALPHA (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E), underscore (%5F), or tilde (%7E) should not be created by URI producers and, when found in a URI, should be decoded to their corresponding unreserved characters by URI normalizers.
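
To make the quoted rule concrete, here is a minimal sketch of the normalizer side: it decodes a %XX triplet only when the octet is one of the unreserved characters (ALPHA, DIGIT, hyphen, period, underscore, tilde) and leaves every other triplet alone. The names is_unreserved, hex_value and normalize_unreserved are hypothetical and are not part of cmark.

```c
#include <assert.h>
#include <stddef.h>

/* RFC 3986 section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~" */
static int is_unreserved(unsigned char c) {
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
         (c >= '0' && c <= '9') ||
         c == '-' || c == '.' || c == '_' || c == '~';
}

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(unsigned char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Hypothetical helper, not a cmark function.
 * Copies src to dst, decoding %XX only when XX is an unreserved octet.
 * dst must have room for strlen(src) + 1 bytes (output never grows).
 * Returns the number of bytes written, excluding the terminating NUL. */
static size_t normalize_unreserved(const char *src, char *dst) {
  size_t n = 0;
  assert(src != NULL && dst != NULL);
  while (*src) {
    if (src[0] == '%') {
      int hi = hex_value((unsigned char)src[1]);
      /* Only look at src[2] if src[1] was a hex digit (so not the NUL). */
      int lo = (hi >= 0) ? hex_value((unsigned char)src[2]) : -1;
      if (hi >= 0 && lo >= 0 &&
          is_unreserved((unsigned char)(hi * 16 + lo))) {
        dst[n++] = (char)(hi * 16 + lo);
        src += 3;
        continue;
      }
    }
    dst[n++] = *src++; /* everything else is copied unchanged */
  }
  dst[n] = '\0';
  return n;
}
```

With this sketch, "/%7Euser/a%2Fb" becomes "/~user/a%2Fb": the tilde is decoded because it is unreserved, while %2F stays encoded because / is a reserved character.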

jgm commented 3 years ago

I'm happy to change this if you want to submit a PR. Probably just need to change one item in the array in houdini_href_e.c from 1 to 0.
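
For readers who have not seen the file, a table-driven escaper of the kind described can be sketched roughly as follows. This is not the code from houdini_href_e.c: the table contents, the helper names build_safe_table and escape_with_table, and the polarity (here 1 means "copy the byte as-is") are all assumptions made for illustration, and the real table may use the opposite convention and cover more characters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table builder, not cmark's table.
 * safe[c] != 0 means byte c is copied through unescaped.
 * The safe set here is deliberately small: alphanumerics, '/', and
 * the RFC 3986 punctuation '-', '.', '_'. */
static void build_safe_table(unsigned char safe[256], int tilde_is_safe) {
  const char *always_safe =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
      "abcdefghijklmnopqrstuvwxyz"
      "0123456789/-._";
  const char *p;
  size_t i;
  for (i = 0; i < 256; i++) safe[i] = 0;
  for (p = always_safe; *p; p++) safe[(unsigned char)*p] = 1;
  /* The single table entry this issue is about. */
  safe[(unsigned char)'~'] = tilde_is_safe ? 1 : 0;
}

/* Percent-encode every byte whose table entry is 0.
 * dst must have room for 3 * strlen(src) + 1 bytes.
 * Returns the number of bytes written, excluding the terminating NUL. */
static size_t escape_with_table(const unsigned char safe[256],
                                const char *src, char *dst) {
  static const char hex[] = "0123456789ABCDEF";
  size_t n = 0;
  assert(src != NULL && dst != NULL);
  for (; *src; src++) {
    unsigned char c = (unsigned char)*src;
    if (safe[c]) {
      dst[n++] = (char)c;
    } else {
      dst[n++] = '%';
      dst[n++] = hex[c >> 4];
      dst[n++] = hex[c & 0x0F];
    }
  }
  dst[n] = '\0';
  return n;
}
```

In this sketch, building the table with tilde_is_safe set to 0 turns "/~a.b_c-d" into "/%7Ea.b_c-d", which matches the behavior reported above; setting it to 1 leaves the input unchanged. Flipping that one entry is the whole fix, which is why a one-item change to the array is plausible.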