Closed ioquatix closed 7 years ago
+1
:+1:
:+1:
+1
:+1:
Wow, has this really been a bug for over a year?
@dangnelson No, I don't think they even regard it as an issue. This is probably a "won't fix".
This isn't an issue with the Markup gem itself, but it certainly is a valid bug. Looking into a resolution now.
@kivikakk any update? Thanks!!
@ioquatix: we're discussing it currently! The biggest issue we need to overcome is security due to JS execution in SVG — allowing user-controlled scripts on a GitHub domain could be dangerous — so we're considering strategies to work in out. It might take a bit to get it fully ironed out, but I'm pushing this forward internally, so I'll let you know when there's any updates.
@kivikakk I thought that might be a problem. I'm glad you are thinking about it. The content is served from a raw.github.io right? What about using something like https://content-security-policy.com to protect your main domain?
@kivikakk Any update on this?
Unfortunately not yet; there's been a lot going on. We are still pushing forward with the issue (lots of moving parts), but slowly.
@kivikakk Perhaps this is a stupid comment/suggestion, and I understand what your are saying, but then how come SVG previews already work in this context? https://github.com/ioquatix/build-dependency/blob/master/partial.svg
@ioquatix when rendering a full blob, we use our internal "Render" system, which does sandboxed content rendering. Because it can take some time, there's also AJAX and a bunch of other things involved. (e.g. here's the actual path to the SVG renderer) This would look super ugly if used inline (all those AJAX spinners ...), and because it depends on an <iframe>
for sandboxing, getting the width/height to propagate would be super awkward.
So we're looking into a solution that doesn't require using Render while still providing adequate security. (Currently thinking of sanitising the SVG ourselves, or a solution involving randomised subdomain names for each request to prevent cross-tab JS access.)
Oh, thanks for the awesome explanation.
That makes sense.
Well, I'm glad you are working on it and I'm certainly looking forward to be able to use SVG in my documentation/README :)
Relates to #270 , because SVGs can be Base64-encoded.
@kivikakk Sorry, just my monthly check for updates - how are things progressing? I'm still looking forward to putting SVG into my README.md
- what about simply sanitising all scripts from the SVG?
@ioquatix No need to say sorry! Good timing, as I'm literally working on a means of fixing this right this very second (!). I'm taking a whitelist-the-SVG-elements approach — the hardest bit is hooking this all into our raw file serving infrastructure, as it's extensively tuned and therefore not the easiest thing to integrate an XML sanitiser into. I'll be sure to update here when I've got something public facing going!
I've been looking into SVG files w.r.t security and would love to hear your input on this. If you consider 3 cases of issues with SVGs,
If you're not including the <svg>
tags directly in the generated html you should be OK here. In markdown you can say ![bad svg](https://myevilsite.com/badsvg.svg)
and this would typically render in HTML as <img src="https://myevilsite.com/badsvg.svg>
. Since you've specified the SVG file as the src of IMG tag, no script execution takes place. https://www.w3.org/wiki/SVG_Security and https://developer.mozilla.org/en-US/docs/Web/SVG/SVG_as_an_Image have more info on this.
In markdown you can say ![bad svg](https://myevilsite.com/badsvg.svg)
and this would typically render in HTML as <img src="https://myevilsite.com/badsvg.svg>
. Now I can have myevilsite.com
serve billion-laughs payload in the SVG that looks like this,
I've observed that this makes the tab hang in Chrome because they won't fixed https://bugs.chromium.org/p/chromium/issues/detail?id=617891 But this does not break Firefox/Edge/IE because they seem to have entity expansion limits and stop after a while. Your webpage will remain responsive. You can even see a console message in Edge/IE that looks like,
As commented by pdr @ https://bugs.chromium.org/p/chromium/issues/detail?id=617891, you can construct a well formed SVG that still does client side DOS like below,
(3) makes the tab crash on all browsers (tested on Chrome/Firefox/IE/Edge)
So as far as SVG script execution and XSS are concerned, you're relatively safe as long as you're rendering the SVG within an IMG tag. (2) can be gotten rid of by sanitizing the SVG and ensuring we do not support expansion of entities, like someone suggested we can even go the white-listing route and sanitize <!ENTITY>
elements within an SVG.
But what do we do about (3)?
@pkuma-msft wrote:
But what do we do about (3)?
Initial thoughts...
Require the SVG to be embedded, rather than simply linked. Then limit the number of elements served, displaying a warning instead if the number of elements is above a safe threshold.
Consider this to be not GitHub's problem and rely on browser vendors to fix it.
Until browser vendors start fixing this, option 1 is probably best :disappointed:
I believe it should be browser's job. Browser knows best how much resource is available locally.
It is unfortunate that Chrome is pushing us to option 1.
For the majority of use cases, using <img>
tag seems like a good way forward. It's not the responsibility of GitHub to fix browsers, and in any case, it's probably simply not possible.
@kivikakk, re https://github.com/github/markup/issues/556#issuecomment-289322045:
The biggest issue we need to overcome is security due to JS execution in SVG — allowing user-controlled scripts on a GitHub domain could be dangerous — so we're considering strategies to work in out.
As I pointed out the last time GitHub said this, two and a half years ago, there's no reason to complicate the code path, since <script>
tags are not executed in SVG images as included via <img>
, as can be seen in the numerous SVG images that GitHub Markdown already displays when URLs are not specified relative to the repository, such as this one right here:
Please, just... read this comment from that issue. Please. I'm losing my mind having to re-explain this basic point every few years, with nothing coming of it every separate time.
I'm going to specifically copy-paste the bottom-line takeaway from that comment to this thread (you can read the full explanation over at isaacs/github#316):
The solution should be to fix raw.githubusercontent.com to handle SVGs the same way as it handles other images (serving them with the appropriate content-type), or (if that's unfeasible) to have the Markdown render path rewrite local SVG references to rehost and point to camo.githubusercontent.com (as it currently does for remote images).
The solution should be to fix raw.githubusercontent.com to handle SVGs the same way as it handles other images (serving them with the appropriate content-type)
This is the correct solution, and the one that should be done.
@kivikakk perhaps you can give an update taking into consideration the above comments regarding inline SVG <img>
tags?
@ioquatix @stuartpb Thanks for your submissions!
Even if browsers deny execution via <img>
, there's still the problem that a user can simply get linked to an SVG file directly, and thus cause script execution on a GitHub-controlled domain. We don't want any code execution from uploaded content, understandably, and so this isn't a solution.
@kivikakk Isn't that already an issue since I can open https://camo.githubusercontent.com/50f5ad81a6aacaa37cdbd187c32ae493607843bb/68747470733a2f2f75706c6f61642e77696b696d656469612e6f72672f77696b6970656469612f636f6d6d6f6e732f662f66642f47686f73747363726970745f54696765722e737667
in a new browser window and it's the SVG document?
You can, but that's on the camo
host and subdomain. We don't serve GitHub user content through camo
. We're trying to prevent compromise of raw.githubusercontent.com
since that may have private data open (i.e. you can view a private repo's contents there) and we don't want script execution on that domain accordingly.
Okay, then how about plan B, having the Markdown render path rewrite local repo image references, instead of pointing them to raw.githubusercontent.com
, to point to the image on camo.githubusercontent.com
(following the same code path as is currently used to rehost remote images)? I imagine this'd also yield some degree of performance buff anyway as camo
, as I understand it, is better configured to serve static blob assets than raw
.
@stuartpb For reasons already covered (e.g. (a) "We don't serve GitHub user content through camo
", (b) camo
would have to fetch it from raw
anyway in the first place, so this is strictly a performance regression), this is not a solution. I'm already working on a definitive solution to this which gels with our wider infrastructure in my downtime between other tasks, as referenced in earlier comments.
@kivikakk, I appreciate your work on this, as, I'm sure, do others in the thread. Thanks for taking this on :)
Your two most recent comments initially left me confused. The first of these, in particular, appeared to contradict some other remarks. I now think I have found an interpretation of your comments that resolves these my confusion. I'll unpack that here, and I would be grateful if you could confirm whether my present understanding is correct. If so, then I hope my post will help others who were initially as puzzled as I was!
You wrote:
allowing user-controlled scripts on a GitHub domain could be dangerous
An unstated premise here is that GitHub, Inc., trusts its own client-side scripts enough to want them to run in visitors' browsers when visiting *.github.com
domains (at least, in those browsers where JavaScript is supported and enabled).
But GitHub, Inc., understandably, does not trust its users to the same extent, and therefore wants user-generated client-side scripts not to run under those same circumstances.
So far, this is all fair enough. (At least, it is if we put aside the case of *.github.io
. I think the latter is outside the scope of what you meant by "a GitHub domain" in your remark above.)
Subsequently, you implied that it is not a problem for users to run user-generated scripts "on the camo host and subdomain", even if embedded into content within github.com
such as @stuartpb's comment. This appears to contradict your remark quoted above.
In the same comment, you also said:
We don't serve GitHub user content through camo.
This seems to contradict a GitHub Help article, which says:
To host [user-uploaded] images, GitHub uses the open-source project Camo. Camo generates an anonymous URL proxy for each image that starts with
https://camo.githubusercontent.com/
and hides your browser details and related information from other users.
Again in the same comment, you said:
We're trying to prevent compromise of
raw.githubusercontent.com
since that may have private data open (i.e. you can view a private repo's contents there) and we don't want script execution on that domain accordingly.
This is the first time in this thread that the exposure of private repositories has been mentioned as a concern. Is this the danger you were referring to in the first comment above? If so, then that was not obvious to me, nor perhaps to other readers of the thread.
The soundness of your objection to @stuartpb's suggestion was not immediately clear.
I guess that what you meant above by "GitHub user content" is not "content provided by GitHub users" but rather, "content provided by GitHub users and hosted (not merely proxied) by GitHub within *.github.com
".
I also guess that when you said "allowing user-controlled scripts on a GitHub domain could be dangerous", rather than being especially concerned about malicious vectors like those @pkuma-msft highlighted, you were primarily concerned with a different and quite specific vector. This vector is that Mallory might upload a client-side script to raw.githubusercontent.com
that would, if Alice runs it, make Alice's browser siphon data from Alice's private GitHub account to Mallory's server. Same-origin policies would not protect Alice here because Alice's private data would also be being served by raw.githubusercontent.com
.
If so, this resolves apparent contradictions 1 and 2, and puzzlement 1.
Finally, I note that camo seems to be just a proxy, not a caching proxy. This observation, in combination with my guesses above (if correct) would justify your objection @stuartpb's suggestion on the basis of performance, thereby resolving puzzlement 2.
Are my guesses above correct, or am I off the mark?
Also, although I think I understand your concern about the performance implications of @stuartpb's suggestion, I suspect that many GitHub users would still prefer that suggestion to be implemented, than to remain unable to embed SVGs from their GitHub repositories into their READMEs/etc without relying upon third-party solutions like rawgit.com. Might it be an acceptable stopgap until the long-term fix that you are working gets completed and implemented?
Thanks again for your efforts :)
:wave: - @kivikakk and I had been chatting about this internally so I thought I'd take a quick stab at a reply since @kivikakk is likely sleeping :smile:
At least, it is if we put aside the case of *.github.io. I think the latter is outside the scope of what you meant by "a GitHub domain" in your remark above.
Yes, there is no attempt to control the content of GitHub Pages sites. The use of *.github.io
is explicitly for this reason.
I guess that what you meant above by "GitHub user content" is not "content provided by GitHub users"
That is more or less correct. In this discussion the primary consideration is private repository data. Anytime we are looking at a feature that can be used to access private repository data we obviously must prioritize keeping that information private above all else.
Same-origin policies would not protect Alice here because Alice's private data would also be being served by raw.githubusercontent.com.
Correct. While there are some small slightly non-obvious hurdles one would have to overcome, it is definitely possible and is the fundamental reason why most content on raw.githubuerscontent.com
is served as plaintext. We only whitelist a small subset of safe file formats that render natively on this domain.
This observation, in combination with my guesses above (if correct) would justify your objection @stuartpb's suggestion on the basis of performance, thereby resolving puzzlement 2.
Beyond performance (though performance is reason enough not to pursue it), the Camo solution likely has additional issues/hurdles. Camo was never designed to access private/authenticated content. It is an image proxy that was originally designed to proxy non-HTTPS URLs such that users did not get mixed content warnings when embedding images. We proxy additional things today, but the actual proxy functionality is more or less the same. All image URLs proxied by Camo are static (and hence cacheable). We take the originally provided URL, encode the original URL into the Camo URL, request it, proxy it, and serve it. This wouldn't work with private repository content since private repository content access either requires an authenticated session (your session cookie from github.com) or it requires an authentication token (the token
request parameter) like raw.githubusercontent.com
. Camo wouldn't have access to either of these since camo.githubusercontent.com
would have no access to a user's session and Camo URLs (as implemented today) are static and wouldn't have access to a dynamic token
request parameter to authenticate access . Moreover, even if things were rearchitected to pass a token
, you would run into the same issue as raw.githubusercontent.com
with regard to same-origin content and malicious scripts. And, unlike the external image content served by Camo today, it would now include private repository contents. In the end, this brings us back to the same issue as serving such content from raw.githubusercontent.com
.
Thanks again for your efforts :)
Agree... :bow: to @kivikakk for digging in and working on a solution. I think the solution @kivikakk is working on is ✨ and the best long-term solution.
P.S. To all of those on this thread, thanks so much for the patience for awaiting a solution that satisfies your needs and also satisfies our security requirements. I'm sure it can feel frustrating since it seems like a feature like this should be super simple to implement. But, as you can tell from the discussion in this thread, there is a bit more nuance to this than it might first appear. There is quite a bit of consideration that has gone into our architecture around serving various types of user content, and it can be a a bit tricky to succinctly convey all of that historical context in an issue.
Hi all,
Would you like to give this a go now, and let me know if you hit any bugs? I've just noticed one with DTDs which I'm working on now.
@kivikakk are <use>
elements purposely not rendered? Here's what I see when attempting to replace the image in mwclient's README with the SVG version:
For some reason, it doesn't appear to be working on the README here:
https://github.com/ioquatix/build-dependency
It just shows missing file?
I also tried here:
https://github.com/ioquatix/utopia/edit/master/README.md
There is a utopia.svg
in the same place as the PNG. If I simply change the file extension, it should work, but it doesn't seem to be.
For some reason, it doesn't appear to be working on the README here:
It looks like the sanitization process is mucking up the DOCTYPE bit. The unsanitized version is as follows:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!-- Generated by graphviz version 2.40.1 (20161225.0304)
-->
<!-- Title: G Pages: 1 -->
<svg width="357pt" height="404pt"
...
The sanitized version is as follows:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="357pt" height="404pt" viewBox="0.00 0.00 357.00 404.00">
...
If you open the sanitized version directly it doesn't render in Chrome, with an error message related to line 2 (the DOCTYPE bit).
See also my original comment announcing this:
I've just noticed one with DTDs which I'm working on now.
This will be fixed shortly. :+1:
DOCTYPE fix is in. https://github.com/ioquatix/build-dependency looks better now.
Okay great it's working on both example repos above.
It didn't work with ?raw=true
@ioquatix I'm not sure what ?raw=true
is meant to do here. ?sanitize=true
on the raw endpoint will sanitize the SVG and render it with the appropriate Content-Type, and image references in Markdown are automatically updated to use that for SVGs.
@waldyrious this will be fixed once the cache rolls. :+1:
In the past, some of my embedded images used raw=true
but I'm not sure why any more.
@kivikakk Thanks for your effort sorting this out and your communication. I'm now happy this is working for my use case so I will close the issue. Awesome!!
So what's the conclusion? Can we now load SVG images on the markdown and readme file? Does it have some limitations?
@piranna Yes, SVG files in repositories can be included inline in Markdown files. They will be run through a sanitizer when viewed directly this way — tags like <script>
are removed.
@piranna Yes, SVG files in repositories can be included inline in Markdown files. They will be run through a sanitizer when viewed directly this way — tags like Githubissues.
Githubissues is a development platform for aggregating issues.
As discussed in https://github.com/isaacs/github/issues/316
Essentially embedding SVG images into a README.md file doesn't work but probably should. It's apparently broken due to the content type being incorrect for the SVG file being served.