Fix relative SVG rendering

ioquatix commented 8 years ago

As discussed in https://github.com/isaacs/github/issues/316

Essentially embedding SVG images into a README.md file doesn't work but probably should. It's apparently broken due to the content type being incorrect for the SVG file being served.

gustavomdsantos commented 8 years ago

+1

derekstavis commented 8 years ago

:+1:

FranklinYu commented 8 years ago

:+1:

klorenz commented 8 years ago

+1

ghost commented 8 years ago

:+1:

dangnelson commented 7 years ago

Wow, has this really been a bug for over a year?

FranklinYu commented 7 years ago

@dangnelson No, I don't think they even regard it as an issue. This is probably a "won't fix".

kivikakk commented 7 years ago

This isn't an issue with the Markup gem itself, but it certainly is a valid bug. Looking into a resolution now.

ioquatix commented 7 years ago

@kivikakk any update? Thanks!!

kivikakk commented 7 years ago

@ioquatix: we're discussing it currently! The biggest issue we need to overcome is security due to JS execution in SVG — allowing user-controlled scripts on a GitHub domain could be dangerous — so we're considering strategies to work in out. It might take a bit to get it fully ironed out, but I'm pushing this forward internally, so I'll let you know when there's any updates.

ioquatix commented 7 years ago

@kivikakk I thought that might be a problem. I'm glad you are thinking about it. The content is served from a raw.github.io right? What about using something like https://content-security-policy.com to protect your main domain?

ioquatix commented 7 years ago

@kivikakk Any update on this?

kivikakk commented 7 years ago

Unfortunately not yet; there's been a lot going on. We are still pushing forward with the issue (lots of moving parts), but slowly.

ioquatix commented 7 years ago

@kivikakk Perhaps this is a stupid comment/suggestion, and I understand what your are saying, but then how come SVG previews already work in this context? https://github.com/ioquatix/build-dependency/blob/master/partial.svg

kivikakk commented 7 years ago

@ioquatix when rendering a full blob, we use our internal "Render" system, which does sandboxed content rendering. Because it can take some time, there's also AJAX and a bunch of other things involved. (e.g. here's the actual path to the SVG renderer) This would look super ugly if used inline (all those AJAX spinners ...), and because it depends on an <iframe> for sandboxing, getting the width/height to propagate would be super awkward.

So we're looking into a solution that doesn't require using Render while still providing adequate security. (Currently thinking of sanitising the SVG ourselves, or a solution involving randomised subdomain names for each request to prevent cross-tab JS access.)

ioquatix commented 7 years ago

Oh, thanks for the awesome explanation.

That makes sense.

Well, I'm glad you are working on it and I'm certainly looking forward to be able to use SVG in my documentation/README :)

ghost commented 7 years ago

Relates to #270 , because SVGs can be Base64-encoded.

ioquatix commented 7 years ago

@kivikakk Sorry, just my monthly check for updates - how are things progressing? I'm still looking forward to putting SVG into my README.md - what about simply sanitising all scripts from the SVG?

kivikakk commented 7 years ago

@ioquatix No need to say sorry! Good timing, as I'm literally working on a means of fixing this right this very second (!). I'm taking a whitelist-the-SVG-elements approach — the hardest bit is hooking this all into our raw file serving infrastructure, as it's extensively tuned and therefore not the easiest thing to integrate an XML sanitiser into. I'll be sure to update here when I've got something public facing going!

pkuma-msft commented 7 years ago

I've been looking into SVG files w.r.t security and would love to hear your input on this. If you consider 3 cases of issues with SVGs,

(1) SVGs containing javascript code

If you're not including the <svg> tags directly in the generated html you should be OK here. In markdown you can say ![bad svg](https://myevilsite.com/badsvg.svg) and this would typically render in HTML as <img src="https://myevilsite.com/badsvg.svg>. Since you've specified the SVG file as the src of IMG tag, no script execution takes place. https://www.w3.org/wiki/SVG_Security and https://developer.mozilla.org/en-US/docs/Web/SVG/SVG_as_an_Image have more info on this.

(2) SVGs containing billion laughs payload - Client side DOS attack

In markdown you can say ![bad svg](https://myevilsite.com/badsvg.svg) and this would typically render in HTML as <img src="https://myevilsite.com/badsvg.svg>. Now I can have myevilsite.com serve billion-laughs payload in the SVG that looks like this,

I've observed that this makes the tab hang in Chrome because they won't fixed https://bugs.chromium.org/p/chromium/issues/detail?id=617891 But this does not break Firefox/Edge/IE because they seem to have entity expansion limits and stop after a while. Your webpage will remain responsive. You can even see a console message in Edge/IE that looks like,

(3) Well formed SVGs which make your browser run out of resources (a slight variation of 2)

As commented by pdr @ https://bugs.chromium.org/p/chromium/issues/detail?id=617891, you can construct a well formed SVG that still does client side DOS like below,

(3) makes the tab crash on all browsers (tested on Chrome/Firefox/IE/Edge)

So as far as SVG script execution and XSS are concerned, you're relatively safe as long as you're rendering the SVG within an IMG tag. (2) can be gotten rid of by sanitizing the SVG and ensuring we do not support expansion of entities, like someone suggested we can even go the white-listing route and sanitize <!ENTITY> elements within an SVG.

But what do we do about (3)?

ghost commented 7 years ago

@pkuma-msft wrote:

But what do we do about (3)?

Initial thoughts...

Option 1

Require the SVG to be embedded, rather than simply linked. Then limit the number of elements served, displaying a warning instead if the number of elements is above a safe threshold.

Option 2

Consider this to be not GitHub's problem and rely on browser vendors to fix it.

Conclusion

Until browser vendors start fixing this, option 1 is probably best :disappointed:

FranklinYu commented 7 years ago

I believe it should be browser's job. Browser knows best how much resource is available locally.

It is unfortunate that Chrome is pushing us to option 1.

ioquatix commented 7 years ago

For the majority of use cases, using <img> tag seems like a good way forward. It's not the responsibility of GitHub to fix browsers, and in any case, it's probably simply not possible.

stuartpb commented 7 years ago

@kivikakk, re https://github.com/github/markup/issues/556#issuecomment-289322045:

The biggest issue we need to overcome is security due to JS execution in SVG — allowing user-controlled scripts on a GitHub domain could be dangerous — so we're considering strategies to work in out.

As I pointed out the last time GitHub said this, two and a half years ago, there's no reason to complicate the code path, since <script> tags are not executed in SVG images as included via <img>, as can be seen in the numerous SVG images that GitHub Markdown already displays when URLs are not specified relative to the repository, such as this one right here:

this will be included as an img

Please, just... read this comment from that issue. Please. I'm losing my mind having to re-explain this basic point every few years, with nothing coming of it every separate time.

stuartpb commented 7 years ago

I'm going to specifically copy-paste the bottom-line takeaway from that comment to this thread (you can read the full explanation over at isaacs/github#316):

The solution should be to fix raw.githubusercontent.com to handle SVGs the same way as it handles other images (serving them with the appropriate content-type), or (if that's unfeasible) to have the Markdown render path rewrite local SVG references to rehost and point to camo.githubusercontent.com (as it currently does for remote images).

piranna commented 7 years ago

The solution should be to fix raw.githubusercontent.com to handle SVGs the same way as it handles other images (serving them with the appropriate content-type)

This is the correct solution, and the one that should be done.

ioquatix commented 7 years ago

@kivikakk perhaps you can give an update taking into consideration the above comments regarding inline SVG <img> tags?

kivikakk commented 7 years ago

@ioquatix @stuartpb Thanks for your submissions!

Even if browsers deny execution via <img>, there's still the problem that a user can simply get linked to an SVG file directly, and thus cause script execution on a GitHub-controlled domain. We don't want any code execution from uploaded content, understandably, and so this isn't a solution.

ioquatix commented 7 years ago

@kivikakk Isn't that already an issue since I can open https://camo.githubusercontent.com/50f5ad81a6aacaa37cdbd187c32ae493607843bb/68747470733a2f2f75706c6f61642e77696b696d656469612e6f72672f77696b6970656469612f636f6d6d6f6e732f662f66642f47686f73747363726970745f54696765722e737667 in a new browser window and it's the SVG document?

kivikakk commented 7 years ago

You can, but that's on the camo host and subdomain. We don't serve GitHub user content through camo. We're trying to prevent compromise of raw.githubusercontent.com since that may have private data open (i.e. you can view a private repo's contents there) and we don't want script execution on that domain accordingly.

stuartpb commented 7 years ago

Okay, then how about plan B, having the Markdown render path rewrite local repo image references, instead of pointing them to raw.githubusercontent.com, to point to the image on camo.githubusercontent.com (following the same code path as is currently used to rehost remote images)? I imagine this'd also yield some degree of performance buff anyway as camo, as I understand it, is better configured to serve static blob assets than raw.

kivikakk commented 7 years ago

@stuartpb For reasons already covered (e.g. (a) "We don't serve GitHub user content through camo", (b) camo would have to fetch it from raw anyway in the first place, so this is strictly a performance regression), this is not a solution. I'm already working on a definitive solution to this which gels with our wider infrastructure in my downtime between other tasks, as referenced in earlier comments.

ghost commented 7 years ago

@kivikakk, I appreciate your work on this, as, I'm sure, do others in the thread. Thanks for taking this on :)

Your two most recent comments initially left me confused. The first of these, in particular, appeared to contradict some other remarks. I now think I have found an interpretation of your comments that resolves these my confusion. I'll unpack that here, and I would be grateful if you could confirm whether my present understanding is correct. If so, then I hope my post will help others who were initially as puzzled as I was!

Source of confusion

Initial premise

You wrote:

allowing user-controlled scripts on a GitHub domain could be dangerous

An unstated premise here is that GitHub, Inc., trusts its own client-side scripts enough to want them to run in visitors' browsers when visiting *.github.com domains (at least, in those browsers where JavaScript is supported and enabled).

But GitHub, Inc., understandably, does not trust its users to the same extent, and therefore wants user-generated client-side scripts not to run under those same circumstances.

So far, this is all fair enough. (At least, it is if we put aside the case of *.github.io. I think the latter is outside the scope of what you meant by "a GitHub domain" in your remark above.)

Apparent contradiction 1

Subsequently, you implied that it is not a problem for users to run user-generated scripts "on the camo host and subdomain", even if embedded into content within github.com such as @stuartpb's comment. This appears to contradict your remark quoted above.

Apparent contradiction 2

In the same comment, you also said:

We don't serve GitHub user content through camo.

This seems to contradict a GitHub Help article, which says:

To host [user-uploaded] images, GitHub uses the open-source project Camo. Camo generates an anonymous URL proxy for each image that starts with https://camo.githubusercontent.com/ and hides your browser details and related information from other users.

Puzzlement 1

Again in the same comment, you said:

We're trying to prevent compromise of raw.githubusercontent.com since that may have private data open (i.e. you can view a private repo's contents there) and we don't want script execution on that domain accordingly.

This is the first time in this thread that the exposure of private repositories has been mentioned as a concern. Is this the danger you were referring to in the first comment above? If so, then that was not obvious to me, nor perhaps to other readers of the thread.

Puzzlement 2

The soundness of your objection to @stuartpb's suggestion was not immediately clear.

Resolving the confusion

I guess that what you meant above by "GitHub user content" is not "content provided by GitHub users" but rather, "content provided by GitHub users and hosted (not merely proxied) by GitHub within *.github.com".

I also guess that when you said "allowing user-controlled scripts on a GitHub domain could be dangerous", rather than being especially concerned about malicious vectors like those @pkuma-msft highlighted, you were primarily concerned with a different and quite specific vector. This vector is that Mallory might upload a client-side script to raw.githubusercontent.com that would, if Alice runs it, make Alice's browser siphon data from Alice's private GitHub account to Mallory's server. Same-origin policies would not protect Alice here because Alice's private data would also be being served by raw.githubusercontent.com.

If so, this resolves apparent contradictions 1 and 2, and puzzlement 1.

Finally, I note that camo seems to be just a proxy, not a caching proxy. This observation, in combination with my guesses above (if correct) would justify your objection @stuartpb's suggestion on the basis of performance, thereby resolving puzzlement 2.

Remarks

Are my guesses above correct, or am I off the mark?

Also, although I think I understand your concern about the performance implications of @stuartpb's suggestion, I suspect that many GitHub users would still prefer that suggestion to be implemented, than to remain unable to embed SVGs from their GitHub repositories into their READMEs/etc without relying upon third-party solutions like rawgit.com. Might it be an acceptable stopgap until the long-term fix that you are working gets completed and implemented?

Thanks again for your efforts :)

ptoomey3 commented 7 years ago

:wave: - @kivikakk and I had been chatting about this internally so I thought I'd take a quick stab at a reply since @kivikakk is likely sleeping :smile:

At least, it is if we put aside the case of *.github.io. I think the latter is outside the scope of what you meant by "a GitHub domain" in your remark above.

Yes, there is no attempt to control the content of GitHub Pages sites. The use of *.github.io is explicitly for this reason.

I guess that what you meant above by "GitHub user content" is not "content provided by GitHub users"

That is more or less correct. In this discussion the primary consideration is private repository data. Anytime we are looking at a feature that can be used to access private repository data we obviously must prioritize keeping that information private above all else.

Same-origin policies would not protect Alice here because Alice's private data would also be being served by raw.githubusercontent.com.

Correct. While there are some small slightly non-obvious hurdles one would have to overcome, it is definitely possible and is the fundamental reason why most content on raw.githubuerscontent.com is served as plaintext. We only whitelist a small subset of safe file formats that render natively on this domain.

This observation, in combination with my guesses above (if correct) would justify your objection @stuartpb's suggestion on the basis of performance, thereby resolving puzzlement 2.

Beyond performance (though performance is reason enough not to pursue it), the Camo solution likely has additional issues/hurdles. Camo was never designed to access private/authenticated content. It is an image proxy that was originally designed to proxy non-HTTPS URLs such that users did not get mixed content warnings when embedding images. We proxy additional things today, but the actual proxy functionality is more or less the same. All image URLs proxied by Camo are static (and hence cacheable). We take the originally provided URL, encode the original URL into the Camo URL, request it, proxy it, and serve it. This wouldn't work with private repository content since private repository content access either requires an authenticated session (your session cookie from github.com) or it requires an authentication token (the token request parameter) like raw.githubusercontent.com. Camo wouldn't have access to either of these since camo.githubusercontent.com would have no access to a user's session and Camo URLs (as implemented today) are static and wouldn't have access to a dynamic token request parameter to authenticate access . Moreover, even if things were rearchitected to pass a token, you would run into the same issue as raw.githubusercontent.com with regard to same-origin content and malicious scripts. And, unlike the external image content served by Camo today, it would now include private repository contents. In the end, this brings us back to the same issue as serving such content from raw.githubusercontent.com.

Thanks again for your efforts :)

Agree... :bow: to @kivikakk for digging in and working on a solution. I think the solution @kivikakk is working on is ✨ and the best long-term solution.

P.S. To all of those on this thread, thanks so much for the patience for awaiting a solution that satisfies your needs and also satisfies our security requirements. I'm sure it can feel frustrating since it seems like a feature like this should be super simple to implement. But, as you can tell from the discussion in this thread, there is a bit more nuance to this than it might first appear. There is quite a bit of consideration that has gone into our architecture around serving various types of user content, and it can be a a bit tricky to succinctly convey all of that historical context in an issue.

kivikakk commented 7 years ago

Hi all,

Would you like to give this a go now, and let me know if you hit any bugs? I've just noticed one with DTDs which I'm working on now.

waldyrious commented 7 years ago

@kivikakk are <use> elements purposely not rendered? Here's what I see when attempting to replace the image in mwclient's README with the SVG version:

screenshot from 2017-08-31 07-09-33

ioquatix commented 7 years ago

For some reason, it doesn't appear to be working on the README here:

https://github.com/ioquatix/build-dependency

It just shows missing file?

ioquatix commented 7 years ago

I also tried here:

https://github.com/ioquatix/utopia/edit/master/README.md

There is a utopia.svg in the same place as the PNG. If I simply change the file extension, it should work, but it doesn't seem to be.

ptoomey3 commented 7 years ago

For some reason, it doesn't appear to be working on the README here:

It looks like the sanitization process is mucking up the DOCTYPE bit. The unsanitized version is as follows:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
 "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!-- Generated by graphviz version 2.40.1 (20161225.0304)
 -->
<!-- Title: G Pages: 1 -->
<svg width="357pt" height="404pt"
...

The sanitized version is as follows:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
 "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd"&gt;

<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="357pt" height="404pt" viewBox="0.00 0.00 357.00 404.00">
...

If you open the sanitized version directly it doesn't render in Chrome, with an error message related to line 2 (the DOCTYPE bit).

kivikakk commented 7 years ago

See also my original comment announcing this:

I've just noticed one with DTDs which I'm working on now.

This will be fixed shortly. :+1:

kivikakk commented 7 years ago

DOCTYPE fix is in. https://github.com/ioquatix/build-dependency looks better now.

ioquatix commented 7 years ago

Okay great it's working on both example repos above.

It didn't work with ?raw=true

kivikakk commented 7 years ago

@ioquatix I'm not sure what ?raw=true is meant to do here. ?sanitize=true on the raw endpoint will sanitize the SVG and render it with the appropriate Content-Type, and image references in Markdown are automatically updated to use that for SVGs.

kivikakk commented 7 years ago

@waldyrious this will be fixed once the cache rolls. :+1:

ioquatix commented 7 years ago

In the past, some of my embedded images used raw=true but I'm not sure why any more.

ioquatix commented 7 years ago

@kivikakk Thanks for your effort sorting this out and your communication. I'm now happy this is working for my use case so I will close the issue. Awesome!!

piranna commented 7 years ago

So what's the conclusion? Can we now load SVG images on the markdown and readme file? Does it have some limitations?

kivikakk commented 7 years ago

@piranna Yes, SVG files in repositories can be included inline in Markdown files. They will be run through a sanitizer when viewed directly this way — tags like <script> are removed.

piranna commented 7 years ago

@piranna Yes, SVG files in repositories can be included inline in Markdown files. They will be run through a sanitizer when viewed directly this way — tags like Githubissues.
Githubissues is a development platform for aggregating issues.

github / markup