Closed ndp-opendap closed 4 years ago
Perhaps the application of this rule?
if (mode == Mode.BLOCK || mode == Mode.HTML) {
// in <script> blocks, we need to prevent the browser from seeing
// "</anything>" and "<!--". To do so we escape "/" as "\/" and
// escape "-" as "\-".
I think so.
Thanks for the bug. This function should not be used to escape JSON. If you want to embed JSON on a page I would serialize it. https://github.com/yahoo/serialize-javascript
This function is to escape embedded JavaScript strings per the examples laid on in our docs.
Is this at all helpful?
Jim Manico @Manicode
On Jun 6, 2019, at 4:33 PM, Nathan Potter notifications@github.com wrote:
I think so.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.
Sorry for the spam above, but I agree with the original post by Nathan and with the analysis by @pthorson.
The encoding of raw -
as raw \-
results in an invalid JSON (but a valid ECMAScript) string literal.
string = quotation-mark *char quotation-mark
char = unescaped /
escape (
%x22 / ; " quotation mark U+0022
%x5C / ; \ reverse solidus U+005C
%x2F / ; / solidus U+002F
%x62 / ; b backspace U+0008
%x66 / ; f form feed U+000C
%x6E / ; n line feed U+000A
%x72 / ; r carriage return U+000D
%x74 / ; t tab U+0009
%x75 4HEXDIG ) ; uXXXX U+XXXX
escape = %x5C ; \
quotation-mark = %x22 ; "
unescaped = %x20-21 / %x23-2E / %x30-5B / %x5D-10FFFF
https://www.rfc-editor.org/errata_search.php?rfc=8259
CharacterEscapeSequence ::
SingleEscapeCharacter
NonEscapeCharacter
SingleEscapeCharacter :: one of
' " \ b f n r t v
NonEscapeCharacter ::
SourceCharacter but not one of EscapeCharacter or LineTerminator
https://www.ecma-international.org/ecma-262/6.0/#sec-literals-string-literals https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String#Escape_notation https://www.json.org/
There is little need to have the "mode" parameter in the Javascript string encoder. It brings unnecessary usage complexity as encoding for the worst case would still be parseable in the easiest use case as well.
In addition, the existing code missed extra attack surfaces,
This suggests to encode with raw \uHHHH
the following single characters, regardless of "mode" (and add another method signature without the parameter, deprecating the current method). This will add to the existing encodings of the double quote raw "
as raw \"
et c. but remove the ECMA compatible non-JSON encoding raw \-
. The encoding of the forward slash raw /
as raw \/
appears compatible with both ECMA and JSON. To sum up, these characters need replacing on output.
raw "
with raw \"
against escaping a javascript string surrounded by double quotes,raw '
with raw \u0027
against escaping a javascript string surrounded by single quotes,raw /
with raw \/
against constructing a closing script tag in a javascript string,raw <
against closing the script tag or opening a comment tag,raw >
against closing a possible CDATA wrapper around the javascript text embedded in XHTML (and HTML?),U+2028
and U+2029
allowed in JSON but not in Javascript against disrupting the javascript engine.raw &
against the mandatory HTML entity decoding in XHTML documents embedding scripts without a CDATA wrapper. (HTML documents do not apply the entity decoding to embedded scripts but the suggested string literal encoding does not change the string's value).I understand that these layers look secondary to JSON or Javascript string literal encoding per se, but leaving the task of protecting the string literal against these layers would require re-implementing the encoder for each use case. Attempting to apply additional encoding with simple .replace() chaining will suffer from destroying the context of the backslash-protected characters. Luckily, the suggested "preliminary optimization" by using the raw \uHHHH
encoding is both acceptable by the lower level interpreter (javascript engine) and defending against additional interpreters on top of it.
This seems like a good argument. Would you care to give us a PR? I'll also as Jeff to take a look at this.
PS: The earlier spam was accidental and I deleted it. I'm back to updating this project.
After speaking to Jeff on this, we have the following response and wish to close the issue and open a new feature request.
Summary: encodeForJavaScript is not meant to be safe for JSON, but we do plan to add a new JSON encode function, in the meantime consider https://github.com/yahoo/serialize-javascript type logic to safely embed JSON on a page.
Greetings,
When I do the following:
The ending value of
jsVal
is "CF\-1.5". Based on the javadoc for the Encode class here:https://owasp.github.io/owasp-java-encoder/encoder/apidocs/index.html?index-all.html
I believe that
Encode.forJavaScript()
is incorrectly/needlessly escaping the "-" character.The javadoc does not list "-" as a character to be escaped, and the RFC for JSON:
https://tools.ietf.org/html/rfc7159#section-7
Does not list the "-" character as one that is included in the list of "two-character sequence escape representations of some popular characters".
It seems that since any character can be escaped there is no issue in escaping the "-" character, but I do not think there is a two character shorthand "\-" supported and that the hex escaped version must be used.
Is this a known problem? Has no one else had their JSON parser throw a rod because of this unsupported escape sequence?
Thanks,
Nathan