dauwhe / epub31-bff

Straw-man spec for browser-friendly format for EPUB31

How does encryption.xml etc work in BFF? #11

Open dauwhe opened 8 years ago

dauwhe commented 8 years ago

Mailing list thread:

https://groups.google.com/forum/#!searchin/epub-working-group/Encryption$20in$20BFF$20land/epub-working-group/tU1c0e5z54k/DpGyAEgmBwAJ

dauwhe commented 8 years ago

Garth Conboy wrote (in the original thread):

Hi Folks,

This topic may already have been discussed, if so, just point me at the thread.

But, I've been doing a little thinking about content encryption in BFF land. Does the function of "encryption.xml" continue to play a role? Regardless, how is it round-tripped?

Perhaps we don't care too much, as encrypted content might be expected to be behind some sort of paywall and thus in the clear "back there"? Obfuscated fonts likely replaced with WOFF versions?

But, let's say we have classic EPUB that has encrypted content and/or obfuscated fonts -- how does that play when BFF-ized, and how does it get back when classic-ized again?

If we're assuming we're in the JSON world (for the OPF) [rather than HTML] when we've been BFF-ized, does encryption.xml also get JSON-ed (and either included in the main JSON or referenced from it)? Is it left in XML and referenced from the JSON OPF?

Am I completely missing something? Am I digging a rathole that should not be dug? Or was there discussion of this early on, and it's already been settled?

Thanks much!

Best, Garth

dauwhe commented 8 years ago

Leonard Rosenthal wrote (in the original thread):

I’ll leave encryption alone for the moment but instead tackle the font issue.

Vendors/Foundries that allow their fonts to be embedded into an EPUB in obfuscated fashion may not allow them to be served dynamically in an “unpacked form”. The same is also true in reverse that vendors/foundries that offer “Web Fonts” do not usually allow those same fonts to be embedded into packages.

So while you don’t necessarily have a technical concern here – the USERS of the technology will need to be aware of the significant legal concerns.

Leonard

dauwhe commented 8 years ago

Hadrien Gardeur wrote (in the original thread):

Ah. The big question that I was waiting for.

I'm not going to tackle the font issue, for which the legal part is by far the biggest problem.

I really don't think that DRM and encrypted content can work the same way in BFF-land. Instead of a license/encryption centric view, this should be replaced by a standardized mechanism for authentication in BFF-land.

While we could technically use encryption.xml (just link to it and/or do a JSON variant), I just don't think that this makes any sense given how easily you can eventually just get an unencrypted version of any resource out of your browser. I know that several big publishers already have their own robustness rules for such means of accessing content, and they tend to differ a lot from the usual DRM requirements.

It would definitely be a good idea to eventually standardize authentication for such publications, and provide a way to reach behind a paywall from various standard RS, but this is a different project with its own set of issues.

dauwhe commented 8 years ago

Bill McCoy wrote (in the original thread):

I don't see that it's in scope to address legal issues regarding fonts or any other IP (images and other resources in a publication may be separately and potentially restrictively licensed only for use in that publication; font foundries have no monopoly on this use case). There is no difference between an ordinary web page or web app and a BFF exploded publication so the legal issues are identical. And arguably, there is no fundamental difference between a font stored in a ZIP archive EPUB file or a PDF file that's on an open URL and an individual font so stored. Fonts can easily be extracted from PDF documents by users via a variety of software means and even online services such as [1]. And we added support for WOFF as well as OpenType as a core media type [2] in part to address the requests of font vendors and of course also have font obfuscation (which makes EPUB embedded fonts slightly more secure than PDF embedded fonts which are not internally "mangled").

And, as Hadrien indicated, the real answer to content security in an online scenario is authentication not content encryption. Defining an interoperable BFF for EPUB doesn't imply that unpackaged publications will necessarily be published on open URLs, they will, in most scenarios for commercial content, be served up from servers to authenticated clients using https.

However, that doesn't mean to me that we're off the hook re: round-trippability of encryption.xml and friends (Garth's original question). We have font obfuscation, and while it's perhaps unlikely that it will be valid to have other encrypted content I think for clean separation of concerns we shouldn't rule that out in developing a BFF serialization. For example it may be that in the near future we have a very simple type of password-protected EPUB that is analogous to simple password-protected PDF. This could be perfectly valid to serve up from a website whether exploded or as a single EPUB archive.

And on general principles it doesn't make sense to relax the round-trippability requirement on a case by case basis. Among other things it will just take too much time to debate each and every potential exception, and it's going to be much much simpler if our result is a complete isomorphism between the packaged and unpackaged serializations.

--Bill

[1] http://www.pdfconvertonline.com/extract-pdf-fonts-online.html

[2] http://www.idpf.org/epub/301/spec/epub-publications.html#sec-core-media-types

dauwhe commented 8 years ago

Garth Conboy wrote (in the original thread):

I can largely agree with that.

But, we'll clearly need to handle this case (in some minimal fashion) for round-trip-ability. And maybe just carrying around the XML version of encryption.xml would be sufficient.

Best, Garth
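
For readers unfamiliar with what would be "carried around": a minimal Python sketch, assuming the usual shape of `META-INF/encryption.xml` (W3C XML Encryption elements inside the OCF `encryption` root). The sample document, the font path, and the function name `list_encrypted_resources` are made up for illustration; the sketch only lists which resources are marked as encrypted or obfuscated and by which algorithm.

```python
import xml.etree.ElementTree as ET

# W3C XML Encryption namespace used by the EncryptedData elements.
ENC_NS = "http://www.w3.org/2001/04/xmlenc#"

# Hypothetical sample: one obfuscated font (path and file name are invented).
SAMPLE = """<encryption xmlns="urn:oasis:names:tc:opendocument:xmlns:container"
            xmlns:enc="http://www.w3.org/2001/04/xmlenc#">
  <enc:EncryptedData>
    <enc:EncryptionMethod Algorithm="http://www.idpf.org/2008/embedding"/>
    <enc:CipherData>
      <enc:CipherReference URI="OEBPS/fonts/Example.otf"/>
    </enc:CipherData>
  </enc:EncryptedData>
</encryption>
"""


def list_encrypted_resources(xml_text):
    """Return (resource URI, algorithm URI) pairs found in encryption.xml."""
    root = ET.fromstring(xml_text)
    resources = []
    for data in root.iter(f"{{{ENC_NS}}}EncryptedData"):
        method = data.find(f"{{{ENC_NS}}}EncryptionMethod")
        ref = data.find(f"{{{ENC_NS}}}CipherData/{{{ENC_NS}}}CipherReference")
        if ref is None:
            continue  # no referenced resource, nothing to report
        algorithm = method.get("Algorithm") if method is not None else None
        resources.append((ref.get("URI"), algorithm))
    return resources


if __name__ == "__main__":
    for uri, algorithm in list_encrypted_resources(SAMPLE):
        print(uri, algorithm)
```

Nothing here depends on whether the publication is zipped or exploded, which is why keeping the XML file unchanged is the cheapest way to preserve this information.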

dauwhe commented 8 years ago

Leonard Rosenthal wrote (in the original thread):

I don't see that it's in scope to address legal issues regarding fonts or any other IP

I don’t completely agree. I think you need to give guidance (not necessarily in the standard document, but in some document).

But it’s strange that you write the above AND THEN go into your (incorrect) opinion on font licensing issues?!?! I’ve already stated the correct information in a separate email, so I won’t repeat the details here.

And on general principles it doesn't make sense to relax the round-trippability requirement on a case by case basis ... it's going to be much much simpler if our result is a complete isomorphism between the packaged and unpackaged serializations

Much simpler for the user/publisher, maybe. For those of us developing and then implementing the specifications – exactly the opposite. Developing, for example, an encryption model that is round-trippable between the two models is significantly more complex than developing one that only needs to work in a single state.

Leonard

dauwhe commented 8 years ago

Vladimir Levantovsky wrote (in the original thread):

I know this is not the first time we attempted to discuss “the font issue” (which I personally do not consider an issue) and we agreed to disagree on this … but I want to make sure that the alternative point of view is at least presented for consideration.

I don’t believe there is a font issue that can or even needs to be resolved by encryption at this time. What font vendors do or do not allow (and for the avoidance of any doubt, I am speaking on behalf of one) solely depends on the license agreement between a vendor and a publisher. Yes, a typical font license purchased online for personal use (i.e. a font that happens to reside in someone’s Fonts folder) doesn’t allow producing commercial content or using the font in question as a web font but it doesn’t mean that the font vendor will never allow such use – anyone who needs to use fonts for commercial publications (both online and offline) simply needs to purchase a license that provides certain rights the publishers need to accomplish what they need/want to accomplish. The conditions of that license (and the price paid for a particular font license) will vary widely based on many factors that are negotiable between two interested parties (the user and the vendor). I agree that the users of the technology will need to be aware of certain IP licensing issues but it is true for any IP, not just fonts.

I also agree that we don’t have a technical concern here, hence I don’t see a need for the technical specification to address this until at least such time when the technical concern is identified.

Thank you,

Vladimir

dauwhe commented 8 years ago

Bill McCoy wrote (in the original thread):

Leonard, on what basis do you argue that we need to give additional guidance re font usage?

Re: spec and implementation difficulty... If we took the dead simplest BFF solution, which ReadiumJS already supports, that is essentially just an unzipped packaged EPUB, I don't see why everything doesn't just work. At least, obfuscated fonts still work in ReadiumJS with exploded content so that's an existence proof of implementability. And there's nothing simpler spec-wise than to do nothing (not try to recast encryption.xml et al. in JSON). I'm not suggesting this would magically make proprietary DRM schemes work in a browser based implementation, but it is not in scope for EPUB 3.1 to define an interoperable DRM at all, whether an instantiation of a publication is BFF or packaged.

Bill.
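
To make the "obfuscated fonts still work with exploded content" point concrete, here is a minimal Python sketch of the IDPF font obfuscation scheme as I understand it from the EPUB 3 OCF specification: the key is the SHA-1 digest of the publication's unique identifier with whitespace removed, and only the first 1040 bytes of the font are XORed with that key. The function names and the sample identifier are assumptions for illustration, not taken from ReadiumJS.

```python
import hashlib

OBFUSCATED_PREFIX_LENGTH = 1040  # only the start of the font file is altered


def obfuscation_key(unique_identifier: str) -> bytes:
    """Derive the 20-byte key: SHA-1 of the identifier minus whitespace."""
    cleaned = "".join(ch for ch in unique_identifier if ch not in " \t\r\n")
    return hashlib.sha1(cleaned.encode("utf-8")).digest()


def toggle_obfuscation(font_bytes: bytes, key: bytes) -> bytes:
    """XOR the first 1040 bytes with the repeating key.

    XOR is its own inverse, so the same call obfuscates and de-obfuscates.
    """
    head = font_bytes[:OBFUSCATED_PREFIX_LENGTH]
    mangled = bytes(b ^ key[i % len(key)] for i, b in enumerate(head))
    return mangled + font_bytes[OBFUSCATED_PREFIX_LENGTH:]


if __name__ == "__main__":
    key = obfuscation_key("urn:uuid:hypothetical-identifier")  # invented id
    font = bytes(range(256)) * 5  # stand-in for real font data
    assert toggle_obfuscation(toggle_obfuscation(font, key), key) == font
```

The only inputs are the font bytes and the identifier from the package document, so the same routine applies whether the font was read from a ZIP entry or fetched from a URL.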

dauwhe commented 8 years ago

Hadrien Gardeur wrote (in the original thread):

Regarding encryption.xml itself, we can:

  1. do absolutely nothing about it
  2. link to it in BFF and keep it as-is
  3. link to it and do a JSON serialization of it

Since our encryption.xml is based on specifications defined by W3C, I don't think that option 3 is a possibility. Depending on our need to support encryption (for example for the sake of roundtripability) we could go with either 1 or 2.

A likely outcome of this discussion IMO is that we'll go with 2 just to make it easier to round-trip, but RS will simply ignore that link in BFF.

Authentication is a different discussion, but it's also a subject that we've tried to address for OPDS. There's a first draft for something we've called "Authentication for OPDS"[1] until now.

It would be very easy to support a "password based" protection by using that specification along with BFF, or potentially introduce additional authentication flows for it. Such authentication could easily provide the exact same UX as something like LCP, or simply redirect to the content provider's website to authenticate the user.

It is already used by Feedbooks/Aldiko and by NYPL for Library Simplified, in both cases for OPDS (borrowing/buying or access to an online bookshelf) and I'd be happy to consider moving that work to the IDPF if there's any interest in it.

Hadrien

[1] https://docs.google.com/document/d/1-_0HHt664bDjybtCauBJXUSDXiT-Clg1sZUVNxHyLjw/
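
Option 2 from the list above (link to encryption.xml and keep it as-is) could look roughly like the following Python sketch. The manifest layout, the `links` array, and the `encryption` rel value are all hypothetical, since the thread does not define the BFF JSON structure; the sketch only shows a link being added for round-tripping and a reading system skipping link relations it does not understand.

```python
import copy
import json


def with_encryption_link(manifest, href="META-INF/encryption.xml"):
    """Return a copy of a (hypothetical) BFF manifest that links to encryption.xml."""
    updated = copy.deepcopy(manifest)
    updated.setdefault("links", []).append(
        {"rel": "encryption", "href": href, "type": "application/xml"}
    )
    return updated


def links_for_reading_system(manifest, understood_rels):
    """Keep only the links a reading system knows how to handle."""
    return [
        link for link in manifest.get("links", [])
        if link.get("rel") in understood_rels
    ]


if __name__ == "__main__":
    # Invented manifest content, for illustration only.
    manifest = {
        "metadata": {"title": "Example"},
        "links": [{"rel": "self", "href": "manifest.json", "type": "application/json"}],
    }
    linked = with_encryption_link(manifest)
    print(json.dumps(linked, indent=2))
    # A reading system that only understands "self" never sees the encryption link.
    print(links_for_reading_system(linked, {"self"}))
```

A packaging tool would follow the link to copy the XML file back into the archive unchanged, while a browser-based reading system could ignore it.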

dauwhe commented 8 years ago

Leonard Rosenthal wrote (in the original thread):

Because without guidance (which can be kept updated, as I do agree with Vlad that the landscape is changing), people who aren’t aware of the issues can make mistakes. Better to put out something so that at least you can say you tried…

My point about the implementation wasn’t about the fonts – but about the encryption. And we agree that nothing would magically make that work in BFF. However, if the goal is to have a “complete isomorphism between the packaged and unpackaged serializations” (your words), then that also has to apply to solutions that use DRM/encryption. Yes?

Leonard

dauwhe commented 8 years ago

Leonard Rosenthal wrote (in the original thread):

I would think that any work on authentication would align with the new W3C work in this area. From their most recent update…

W3C Accelerates Efforts to Build a More Secure Web with Launch of Web Authentication Working Group

17 February 2016 https://www.w3.org/blog/news/archives/5295

W3C announced today the launch of the Web Authentication Working Group whose goal is to develop standards using strong cryptographic operations in place of password exchange. This approach offers a more secure and flexible alternative to password-based log-ins on the Web, often seen as being annoying to use and offering weak protection.

“When strong authentication is easy to deploy, we make the Web safer for daily use, personal and commercial,” said Sir Tim Berners-Lee, Web Inventor and W3C Director. “With the scope and frequency of attacks increasing, it is imperative for W3C to develop new standards and best practices for increased security on the Web.”

The W3C’s Web Authentication technical work is being accelerated thanks to a W3C member submission of FIDO 2.0 Web APIs from members of the FIDO Alliance. The submitted APIs are intended to ensure standards-based strong authentication across all Web browsers and related Web platform infrastructure.

The new Web Authentication Working Group’s first meeting will take place 4 March 2016 in San Francisco, conveniently timed for people who are also attending the RSA USA Conference. For more information about the Web Authentication Working Group, see the press release.

https://www.w3.org/2016/02/securewebauthwg.html.en

dauwhe commented 8 years ago

Bill McCoy wrote (in the original thread):

Hi Leonard, re: "Vendors/Foundries that allow their fonts to be embedded into an EPUB in obfuscated fashion may not allow them to be served dynamically in an 'unpacked form'". This just seems like a hypothetical, and one that could equally be the case for stock photos or other separately licensed assets. Our job is to define the format not the licensing terms between content creators and their supply chain. Again given the ease of extraction of fonts from today's PDF - 5 seconds from a google search to locating an online service that does the extraction - it's not like embedding in any way lowers the risk of font piracy. And WOFF was designed to address this situation for online Web content in general.

I do think that an explanatory note that indicates that a license to use a publication does not always convey rights to use content (including but not limited to fonts) that are elements of that publication would be fine. I just don't accept that there's any strong reason to make this note different for the packaged vs. unpackaged case (even a statement that unpackaged content SHOULD use WOFF rather than OpenType would seem overly strong since whether that's the case again depends on licensing terms on a case by case basis).

Re: encryption, to me what is fundamental is isomorphism of the data so as to enable lossless conversion between packaged/unpackaged states, which may be done by various agents for various purposes. Simply maintaining encryption.xml and friends in BFF will give us that. Functionality provided by implementations certainly may not be equivalent; again, the cogent example is likely non-support in browser-based implementations of today's prevalent proprietary DRM for EPUB (whether the content is packaged or unpackaged). But we can't do anything about that nor do I think it's really our job to do so in defining an unpackaged serialization.

--Bill
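
The "isomorphism of the data" argument can be sketched as a round trip in Python: pack a set of files into an EPUB container, unpack it again, and check that every byte (including `META-INF/encryption.xml`) survives. This is a minimal sketch under the assumption that an exploded publication is just the archive's files at the same relative paths; the helper names `pack` and `unpack` and the sample file contents are invented for illustration.

```python
import io
import zipfile

MIMETYPE = b"application/epub+zip"


def pack(files):
    """Build an EPUB archive from {path: bytes}; mimetype goes first, uncompressed."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for name in sorted(files):
            if name == "mimetype":
                continue  # already written as the first entry
            archive.writestr(name, files[name], compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def unpack(epub_bytes):
    """Explode an EPUB archive back into {path: bytes}."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        return {
            info.filename: archive.read(info)
            for info in archive.infolist()
            if not info.is_dir()
        }


if __name__ == "__main__":
    exploded = {
        "mimetype": MIMETYPE,
        "META-INF/container.xml": b"<container/>",    # placeholder content
        "META-INF/encryption.xml": b"<encryption/>",  # carried along untouched
        "OEBPS/fonts/Example.otf": bytes(range(256)) * 8,
    }
    assert unpack(pack(exploded)) == exploded
```

The check covers data only; as the discussion notes, whether a given reading system can actually decrypt the content is a separate question.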

dauwhe commented 8 years ago

Leonard Rosenthal wrote (in the original thread):

Not hypothetical at all. A number of foundries don’t allow “desktop fonts” to be served online. And many web font vendors don’t allow them to be embedded. Here are some URLs that were easily found by Google:

http://www.fontspring.com/support/general/difference-between-desktop-font-and-webfont
https://www.myfonts.com/licensing/desktop/
http://help.typekit.com/customer/portal/articles/1341590
http://www.monotype.com/fonts/licensing/
http://www.fonts.com/info/services/licensing-options

I agree that this is NOT a technical issue. I am NOT suggesting that we do anything in the specification/file format to address it. I am simply suggesting that since even someone such as yourself was unaware of the licensing situation – how much more so would others. We should consider doing a “white paper” on the topic – that’s all.

(NOTE: extraction of a font from a PDF and installing it on the local system is also a violation of most font foundry licenses…)

If the EPUB is not functionally equivalent when packaged vs. unpackaged – then as far as the publisher (and/or user) is concerned, it’s NOT lossless. Lossless MUST imply not only the data but the experience. But then I would ask this question of your publisher members and see where they fall on the subject.

Leonard

dauwhe commented 8 years ago

Bill McCoy wrote (in the original thread):

Hi Leonard, of course not all fonts are legally embeddable, and not all are allowable on web pages either. But your statement about EPUB still seems hypothetical: none of your links specifically state that a packaged EPUB is OK but that an unpackaged flavor of EPUB (that we haven't even defined yet!) is not (particularly if such unpackaged flavor was not served up over the open Web but only in an authenticated situation). Part of this stems from the fact that much of this font licensing stuff is anachronistic, dating from the days when "desktop fonts" were expected to be used for "desktop publishing" whose output was print. That is obviously no longer the case and for that matter the content creation platform isn't just "desktops" either. PDF fudged this in part by, in early days, relying on font fauxing rather than font embedding. Anyway I stand by my point which is that there is no fundamental difference between a trivially extractable font subset contained in a PDF file, in an EPUB file, or an exploded EPUB publication and creators of any of these flavors of electronic documents will have to verify that they have the appropriate rights to assets used therein (including the possibility that rights to embed such assets into a single-file representation may not be applicable to an exploded representation even if that is silly).

Re: encryption, I think you are conflating two different things; browser-based EPUB support and an unpackaged serialization of EPUB. Browser-based implementations of EPUB already exist and they do not support DRM, among other differences from native-app EPUB implementations (which of course still utilize browser engines since EPUB 3 is based on HTML5, but bolt on other additional capabilities via native code). To me this has nothing to do with whether or not the EPUB content is packaged in a single file, it has to do with limitations of having 100% of the EPUB processing being implemented in JS running in the browser. The unpackaged serialization of EPUB will be more "browser-friendly", hence the "BFF" moniker, but an unpackaged serialization will also help in other use cases that have nothing to do with a pure JS client-side implementation, such as distributed content and even optimizing the rendering of content from native apps. Eventually if & as browsers natively support EPUB the distinction between "browser-based" and "native app" EPUB rendering may become moot (for example a browser implementation could support DRM for EPUB, as browsers already do for video with Encrypted Media Extensions). In hindsight rather than "Browser-Friendly Format" perhaps a better term would have been "Web-Friendly Format" (but BFF is a much cuter acronym).

--Bill

dauwhe commented 8 years ago

Vladimir Levantovsky wrote (in the original thread):

Leonard,

It shouldn’t come as a surprise to anyone that licenses restrict the usage of licensed products. A DVD movie purchased for $20 gives you rights to enjoy it at home with your family and friends and watch it as many times as you like, you can even share it with your colleagues and let them watch it for free but you can’t make a copy and give it to someone and you can’t set up lawn chairs and a projection screen in your backyard and start selling tickets to show the movie – your license doesn’t allow making copies and selling tickets, the cinemas pay a lot more for a license to show the same movie “for profit”.

Fonts (or any other licensed IP) are no different, as I am sure you are well aware – the “desktop fonts” are, by definition, licensed for desktop use and can be embedded in your own documents (docs, PDF, PPT, etc.) provided that you’re not selling them for profit. I.e., you can share your fonts with anyone by embedding them in a document you created so that anyone can see it “as is” but you can’t make copies of the fonts and give them away to be installed on someone else’s desktop, and you cannot use the desktop fonts to create content with embedded fonts that you sell for profit. If you do, you just need a different license for the same font to use it for commercial purposes (even though how you use them, what you do and the embedding mechanism may be exactly the same). Similarly, you just need a different license to use the same fonts on the web, or sell them as resident fonts installed in an e-book reader, … the list of different licensed uses may go on. Quoting from the Monotype licensing page you linked to in your email, “Monotype offers a variety of extended use licenses to meet the needs of its customers who want to use the fonts beyond the terms granted in the standard workstation end user license agreement (EULA).” The fact that standard EULAs don’t allow every conceivable use shouldn’t be a surprise to anyone - isn’t it all just a part of “Licensing 101” course?

I am not sure why you believe this needs to be spelled out in a technical specification. If anything, a single sentence that says that “font vendors offer a variety of extended use licenses that go beyond the standard terms of the desktop EULA” would do the job but even that seems to be redundant, IMO.

Thank you,

Vlad