airbnb / hypernova

A service for server-side rendering your JavaScript views
MIT License
5.82k stars 208 forks source link

Why is React hydration data within a script tag, commented out? #48

Open justin808 opened 7 years ago

justin808 commented 7 years ago

I'm working on an issue to optimize open source library for server rendering: github.com/shakacode/react_on_rails/pull/660.

We're considering breaking away from placing react hydration data in a meta tag and moving into maybe a script tag.

Why does Airbnb puts react hydration data inside of an <script type=“application/json” data-key=“foobar”><!--{json}--></script> per the screen shots below? Is this related to XSS or performance reasons?

view-source:https://www.airbnb.com/s/Panorama--BC--Canada?guests=10&adults=10&children=0&infants=0&checkin=12%2F18%2F2017&checkout=12%2F25%2F2017&ss_id=19nvfbwh&ss_preload=true&source=bb&page=1&s_tag=pMJm3GqR&allow_override%5B%5D=

2017-02-13_15-04-16

2017-02-13_15-34-39

CC: @goatslacker

ljharb commented 7 years ago

Some older browsers will execute any script block, even if it's type is "application/json".

Best practice is to always comment it out to ensure nothing can ever execute.

justin808 commented 7 years ago

Hi @ljharb! Big thanks for the explanation! In terms of avoiding XSS, it seem that you escape & and >, and not <. Any other characters?

ljharb commented 7 years ago

The only thing that needs escaping is to prevent the literal string </script>, because if browsers see it in any context, they instantly close the script tag.

justin808 commented 7 years ago

@ljharb there might be the chance of some white space in the </script> and it being accepted, so probably you would not just strip that out. We're planning on using the Rails json_escape.

In terms of performance, did you ever compare placing the JSON data inside of meta tag? the react-rails gem does that. It results in lots more escaping when that's done that way.

ProductHunt.com uses an alternative approach where they use an inline script to set compressed, uuencoded, likely JSON data, directly to a global variable (see image below). Possibly the data is compressed to avoid XSS. If the page data were gzipped, double compression would do little. Still, one might need to strip out the '>' before placing directly in the script tag. The web client would read the global value and then decompress and parse the JSON to an object.

Did you ever try the performance of that technique? One downside I see is that until CSP 2.0 is widely supported (the nonce feature), a site that wants to use CSP would not be able to use this technique.

2017-02-11_13-30-52

ljharb commented 7 years ago

CSP is very important; inline JS would be a very bad practice. Performance is irrelevant until other concerns are addressed. A meta tag would have to go in the <head> which would delay the page from loading.

Is there something slow about a JSON script block that warrants investigation?

justin808 commented 7 years ago

Is there something slow about a JSON script block that warrants investigation?

Some users of https://github.com/shakacode/react_on_rails have complained that the use of a div's meta data element to hold the JSON data is too slow.

This is the source for https://nootrobox.com. There is too much quoting here.

Thanks again for the advice!

2017-02-13_21-42-59

squadette commented 7 years ago

@ljharb, thank you for explanation. I have one comment:

The only thing that needs escaping is to prevent the literal string </script>, because if browsers see it in any context, they instantly close the script tag.

In our testing, preliminary closing happens for everything which conforms to HTML end tag syntax (https://www.w3.org/TR/html5/syntax.html#end-tags). That means that the string can be case-insensitive and it can have spaces between script and <.

That is, <ScRiPt /> will also prematurely close.

I guess that you are actually handling that by escaping >, right?

ljharb commented 7 years ago

script isn't self-closing, but sure, < / sCrIpT> would probably be a concern too. Indeed handling either < or > covers any possibility of a closing tag.

squadette commented 7 years ago

I was sooo not awake when I wrote this. Of course it's </ScRiPt >. Sorry!

justin808 commented 7 years ago

@squadette brought up the point:

This is an explanation on why HTML comments are non-trivial if we do care about "older browsers": http://www.howtocreate.co.uk/SGMLComments.html#doubledash (via http://softwareengineering.stackexchange.com/questions/198481/why-cant-an-xml-comment-contain-two-hyphens)

Basically it means that we have to somehow escape dash-dash (even though we do escape lt/gt). That's why I'm really concerned about this idea.

@ljharb, any thoughts on if you'd need to escape double dash in the JSON?

ljharb commented 7 years ago

@justin808 that seems like a potential concern; a PR with test cases would be appreciated.

However, my reading of that is that -- merely changes how > is interpreted; so as long as > is always escaped, the issue won't come up.

squadette commented 7 years ago

@ljharb no, it's not about escaping the >. Here is an example of JSON content in HTML comment:

<!--{"foo": "bar"}-->

Here is theoretically problematic JSON content in HTML comment:

<!--{"foo": "bar -- baz"} -->

The second (inner) -- will, according to spec, turn on termination on >, but it will be again disabled by the third, final --. So, > will not be interpreted as the end of comment, and browser will ignore everything until the next -- and > (possibly with other content between the two), which is not even guaranteed to be present.

Both Chrome and Safari handle this correctly, and this is one of the worse parts of HTML spec (deprecated and removed in the new spec). My concern is that if we really do that comments thing due to compatibility with older/broken browsers then we should think about what other problems this approach could lead to in the same older/broken browsers.

FTR: as for me, properly encoded JSON within the script tag, without comments, is perfectly ok.

ljharb commented 7 years ago

Interesting - that does sound like something we want to handle, then.

squadette commented 7 years ago

Just in case, relevant part of HTML4 spec: https://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.4

HTML 3.2 says that it is SGML Application: https://www.w3.org/TR/REC-html32#sgml

HTML 2.0: https://www.w3.org/MarkUp/html-spec/html-spec_3.html#SEC3.2.5

Sometimes they only talk about whitespace, and it's unclear what happens if there is anything else between the pairs of dash-dash.

squadette commented 7 years ago

Here is historical evidence that this is real issue in old versions of Firefox: http://weblog.200ok.com.au/2008/01/dashing-into-trouble-why-html-comments.html