2is10 closed this issue 12 years ago
Thanks for those links, I appreciate it!
I've been giving this some thought. It seems like the two main/popular methods of encoding cookie values are URI encoding and Base64 encoding. I'm making an assumption that most server side frameworks already have libraries for handling these types of encoding, so either type would be relatively natural to work with. Currently I'm using URI encoding, because JavaScript has functions built in for it. Plus, URI encoding is safe in that it will escape all special characters defined in the RFC, and will not itself produce any of those characters.
Is it really worth using custom encoding/decoding functions just to escape fewer characters? Won't they just inevitably get passed into the same "decodeURI" function they would have been passed into on the server side anyways?
Also, if a literal "%" cookie value were to be URI encoded, it becomes escaped to "%25". Isn't this adequate?
Thanks for taking time to discuss this with me, I'd like to know more of your thoughts.
Agreed that URI encoding and Base64 are the natural encoding choices. What I'm suggesting is a fully-decode-compatible subset of URI encoding (also known as "percent encoding") suitable for ASCII strings (the common case). The resulting encoded string can be decoded using decodeURIComponent or any other function for decoding a percent-encoded string in any language, any library, any framework. You wouldn't be sacrificing compatibility. You'd just be saving bytes in cookies, and making them more readable for developers, besides.
Here's an example of a cookie from nytimes.com. You can see that it uses | unescaped:
rsi_segs=H07707_10910|H07707_10599|D08734_70045|D08734_70076|D08734_72019|H07707_11028|H07707_11029|H07707_11030|H07707_11031|H07707_11044|H07707_11048|H07707_11049|H07707_11087|D08734_72771|H07707_11100|H07707_11103|H07707_11104|H07707_11105|H07707_10638
Here's an example of a Google Analytics cookie, the likes of which you can find on most sites, using = and | unescaped:
__utmz=55650728.1322243831.3.3.utmcsr=news.ycombinator.com|utmccn=(referral)|utmcmd=referral|utmcct=/
My point in including these examples is to help convince you that it's common to use these kinds of delimiters and that it's valuable not to escape them.
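To put a rough number on the cost, here is a small sketch that runs the Google Analytics value above through encodeURIComponent and measures the growth. The variable names are made up for illustration; only the cookie value itself comes from the example.

```javascript
// Cookie value copied from the Google Analytics example above.
const gaValue =
  "55650728.1322243831.3.3.utmcsr=news.ycombinator.com|utmccn=(referral)|utmcmd=referral|utmcct=/";

// encodeURIComponent escapes "=", "|" and "/", all of which are
// legal in a cookie value per RFC 6265.
const fullyEncoded = encodeURIComponent(gaValue);

console.log(fullyEncoded);
// Each escaped character grows from 1 byte to 3 ("%XX").
console.log(fullyEncoded.length - gaValue.length); // 16
// The original value already decodes to itself, so no compatibility is lost.
console.log(decodeURIComponent(gaValue) === gaValue); // true
```

Eight characters get escaped here (four "=", three "|", one "/"), none of which needed it.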
By the way, here's a more compact alternative to my earlier snippet that uses the built-in escape function:
value.replace(/[ %",;\\]/g, escape)
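For anyone who wants to try that one-liner, here is a minimal round-trip sketch. The function name and the sample string are hypothetical. Since escape is a legacy function, the sketch passes encodeURIComponent as the replacer instead; for the six ASCII characters in this regex the two produce identical output.

```javascript
// Hypothetical wrapper; the regex is the one from the snippet above.
function encodeAsciiCookieValue(value) {
  // Only space, %, double quote, comma, semicolon and backslash are matched,
  // and for those encodeURIComponent yields the same %XX as escape().
  return value.replace(/[ %",;\\]/g, encodeURIComponent);
}

const original = 'a=1|b="x y"; 100%, c:\\d';
const encoded = encodeAsciiCookieValue(original);

console.log(encoded);
// a=1|b=%22x%20y%22%3B%20100%25%2C%20c:%5Cd
console.log(decodeURIComponent(encoded) === original); // true
```

Note that "=", "|" and ":" pass through untouched, while the literal "%" becomes "%25" so that decoding stays unambiguous.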
Of course, if you'd rather not restrict your library users to ASCII strings, here's a slightly more complex variant that accepts arbitrary Unicode strings and still avoids unnecessary escaping:
value.replace(/[^!#-+\--:<-[\]-~]/g, encodeURIComponent)
The safe character ranges in the regex above are copied directly from RFC 6265: x21 / x23-2B / x2D-3A / x3C-5B / x5D-7E. Here's a Unicode example that allows you to compare the two methods:
encodeURIComponent("|piñata=papier-mâché|")
// "%7Cpi%C3%B1ata%3Dpapier-m%C3%A2ch%C3%A9%7C"
"|piñata=papier-mâché|".replace(/[^!#-+\--:<-[\]-~]/g, encodeURIComponent)
// "|pi%C3%B1ata=papier-m%C3%A2ch%C3%A9|"
Note that decodeURIComponent correctly decodes both encoded values to the same (original) value:
decodeURIComponent("%7Cpi%C3%B1ata%3Dpapier-m%C3%A2ch%C3%A9%7C")
// "|piñata=papier-mâché|"
decodeURIComponent("|pi%C3%B1ata=papier-m%C3%A2ch%C3%A9|")
// "|piñata=papier-mâché|"
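Two edge cases are worth checking with the Unicode variant. First, "%" (x25) falls inside the #-+ range, so a literal percent sign is left unescaped, and a value like "100%" then fails in decodeURIComponent. Second, the regex matches one UTF-16 code unit at a time, so a character outside the Basic Multilingual Plane (an emoji, say) is handed to encodeURIComponent as two lone surrogates, which throws a URIError. The following is one possible adjustment, not a definitive fix; the function name is hypothetical.

```javascript
// Hypothetical helper. Same safe ranges as above, except that "%" (x25) is
// removed from them so literal percent signs get escaped. The trailing "+"
// hands each run of unsafe characters to encodeURIComponent in one call,
// which keeps surrogate pairs together.
function encodeCookieValue(value) {
  return value.replace(/[^!#$&-+\--:<-[\]-~]+/g, encodeURIComponent);
}

const value = "|piñata=50% off 😀|";
const encoded = encodeCookieValue(value);

console.log(encoded);
// |pi%C3%B1ata=50%25%20off%20%F0%9F%98%80|
console.log(decodeURIComponent(encoded) === value); // true
```

The delimiters "|" and "=" still pass through unescaped, so the size and readability benefits are unchanged.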
While escaping using a regular expression will be slower than just using encodeURIComponent directly, it will only be microseconds slower for strings that are beneath the cookie size limit, and cookie escaping is not the kind of operation you find in a hot loop. I hope you agree that running time is not a practical concern.
Good points regarding the payload size of the cookie, and readability of the encoded value. I'll update the repo later tonight when I get home from work.
Thanks again!
I think this was the right thing to do. Thanks!
For example, the vertical bar | is unnecessarily escaped in cookie values. Check the RFC... only five non-control ASCII characters need to be escaped in cookie values: space, comma, semicolon, double-quote, and backslash. This question is also clearly answered on Stack Overflow.
When using percent-encoding, it's also important to escape the percent symbol. Here's how to percent-encode just the characters in cookie values that need it: