HTTPArchive / almanac.httparchive.org

HTTP Archive's annual "State of the Web" report made by the web community
https://almanac.httparchive.org
Apache License 2.0

Generate an ebook #37

Closed: rviscomi closed this issue 4 years ago

rviscomi commented 5 years ago

@HTTPArchive/developers curious to hear thoughts from others about this, might be crazy.

I'd like to see the entire contents of the Almanac on a single web page, formatted similar to a book. It would also have a print stylesheet to handle things like page breaks and page numbers, so one could print to PDF and it'd just work™️ as a fully formed e-book. It'd also be a PWA that could be added to home screen and read offline.
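A minimal sketch of what such a print stylesheet could contain, assuming each chapter is wrapped in an element with a hypothetical chapter class. The break-before, break-inside and break-after properties come from CSS Fragmentation. The @bottom-center margin box and counter(page) come from CSS Paged Media, which PDF engines such as WeasyPrint support but browsers support only unevenly.

```css
/* print.css: a sketch; .chapter is a hypothetical per-chapter wrapper */
@media print {
  @page {
    size: A4;
    margin: 20mm 18mm;

    /* Page number centred in the bottom margin (CSS Paged Media margin box) */
    @bottom-center {
      content: counter(page);
    }
  }

  /* Start every chapter on a new page */
  .chapter {
    break-before: page;
  }

  /* Avoid splitting a figure across two pages */
  figure {
    break-inside: avoid;
  }

  /* Avoid leaving a heading stranded at the bottom of a page */
  h2,
  h3 {
    break-after: avoid;
  }
}
```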

There are concerns like lazy loading, history state management, deep linking, etc but I think these are all solvable problems.

I'm excited about this idea because a report on the state of the web should ideally maximize the web's capabilities for a great UX.

WDYT?

Requirements (edit by @mikegeyser):

Structure:

Rendering:

Tooling:

Please feel free to add any more, and we can see if they're ~possible~ feasible. :)

anoblet commented 5 years ago

Do you see it as being horizontally navigable, or vertical? Swapping documents instead of scrolling would help with lazy-loading, state-management, unique routes etc. Scrolling could be emulated using animations.

anoblet commented 5 years ago

E-book + lazy-loading is going to take some skill.

matijagrcic commented 5 years ago

Would it be similar to https://tympanus.net/Development/FlipboardPageLayout/ (src at: https://tympanus.net/codrops/2012/05/07/experimental-page-layout-inspired-by-flipboard/)? It's an old example I bookmarked a long time ago while researching something, but the concepts should still apply.

A newer example leveraging the CSS Grid can be found at https://tympanus.net/Development/PageFlipLayout/ (src at: https://tympanus.net/codrops/2018/11/12/page-flip-layout/)

Lazy loading can be done using the IntersectionObserver API, triggered when a page or a piece of content comes into view or on any other user interaction we choose. We can also fetch a couple of pages eagerly (and decide whether all of their content is needed or only some of it).

We should also verify that everything is OK on the SEO side: https://developers.google.com/search/docs/guides/lazy-loading

matijagrcic commented 5 years ago

Presenting the Quick page view can be done similar to the https://tympanus.net/Development/GridLayoutScrollableContent/ (src at: https://tympanus.net/codrops/2018/09/19/grid-layout-scrollable-content-view/).

I kinda like this approach, mixed with the above.

rviscomi commented 5 years ago

This was inspired by working backwards from the goal of being able to Ctrl+P a page and get a print version of the entire Almanac (save as PDF or whatever). It's not a mission-critical use case, but something I thought would be a nice touch.

Thinking more about this, we should focus on the straightforward task of serving each chapter as a standalone page. Then as a bonus if there's time (or even after launch) we could think about combining all of the contents into one document.

My main hesitation with making this the primary experience is the technical complexity required to provide a great UX. And I'm not sure the amount of work is worth it compared to building a more traditional document structure.

I'll close this issue for now and we can revisit it later in the project depending on resources.

mikegeyser commented 4 years ago

So, considering that the content is all (more or less) static and we already generate HTML from the Jinja templates, we can generate a single 'print friendly' view as a single index.html page with minimal styling. While not trivial, it should be fairly straightforward to do.

Making the Almanac an offline-capable PWA will just mean having a service worker with some aggressive content caching. We can even have a nice UX on the page, prompting users to 'download' the content or 'make the content available offline', and have that trigger the fetch and cache.

I don't see it being easy to combine the two, however, into a single experience. What do you think?

AymenLoukil commented 4 years ago

75

rviscomi commented 4 years ago

The MVP for this issue can be to build a single page containing all of the chapters without PWA functionality. In theory, saving this page to a PDF would effectively be downloading the entire ebook. Per #520 it should default to static images for the figures.

I'd also love to explore the possibilities with:

mikegeyser commented 4 years ago

I'd love to work on that next, if that's alright? It sounds fun. :)

mikegeyser commented 4 years ago

Just an update: I'm still planning on working on this, will have a bit more breathing room over the next week. Thanks for understanding! :)

tunetheweb commented 4 years ago

No probs. Btw, you might want to reuse some of https://github.com/HTTPArchive/almanac.httparchive.org/pull/566

rviscomi commented 4 years ago

@mikegeyser have you made any progress on this?

mikegeyser commented 4 years ago

Hey!

I've written and deleted a bunch of code, so I might be spinning my wheels a little bit. Getting all of the content generated in a single file is the easy part, but it's huge (about 1 MB of raw markup) and doesn't look great. It increasingly feels like a book and an SPA may be mutually exclusive?
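On screen, one CSS-only way to keep a page this long responsive is content-visibility: auto, which lets the browser skip layout and paint for chapters that are off-screen. This is a swapped-in technique, not the IntersectionObserver approach discussed earlier in the thread: it does nothing to reduce the download size, and it assumes the same hypothetical .chapter wrapper. Scoping it to screen media keeps it out of the print path.

```css
/* Screen-only rendering optimisation; .chapter is a hypothetical wrapper */
@media screen {
  .chapter {
    /* Skip rendering work for chapters far outside the viewport */
    content-visibility: auto;
    /* Placeholder size reserved until a chapter is first rendered;
       with "auto", the last rendered size is remembered afterwards */
    contain-intrinsic-size: auto 5000px;
  }
}
```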

Do you have an example of what you were hoping for? Perhaps something like this? I've been trying to emulate the guidance from this article (by Rachel Andrew) but I'm not sure if I'm off course.

(Also, welcome back! I hope you had a great break.)

rviscomi commented 4 years ago

Hey Mike!

Yeah don't worry about SPA functionality. Everything from the TOC, all 20 chapters, and the methodology should all be rendered to the same (extremely long) page so there's no need for any extra client/server-side rendering of secondary pages. 1 MB sounds like a lot so maybe we can optimize the loading of figures. In any case, do you have a branch I could play with to see how it looks?

The link to Addy's ebook is similar to what I had in mind, but when I look at the print preview it doesn't seem to apply any print-specific formatting. The documentation in Rachel's article is way more complicated than I thought but much closer to the kind of formatting I had in mind, with page numbers and chapter titles in the margins, control of page breaks, etc.
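Chapter titles in the margins map onto named strings from css-gcpm. A sketch, assuming each chapter's title is the h1 inside a hypothetical .chapter wrapper; string-set and string() are supported by paged-media engines such as WeasyPrint and Prince but are not widely supported in browsers.

```css
/* Capture each chapter's title as the chapter begins (css-gcpm named strings) */
.chapter h1 {
  string-set: chapter-title content();
}

@page {
  /* Repeat the current chapter title in the top margin of every page */
  @top-center {
    content: string(chapter-title);
    font-size: 9pt;
  }
}
```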

It seems like you're on the right path and I'm excited to see the Almanac in one document.

mikegeyser commented 4 years ago

The biggest problem I have (for page numbers and the TOC) is that the CSS Generated Content for Paged Media Module (css-gcpm) isn't supported in browsers yet. We could look at generating straight to PDF using Prince (as mentioned in Rachel's article) but I haven't committed us to anything yet.
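For the TOC, the relevant feature is target-counter(), which prints the page number of the element a link points to. A sketch, assuming TOC entries are same-document links such as href="#introduction" inside a hypothetical nav.toc element; the number is only resolved by a paged-media engine like WeasyPrint or Prince, not in a browser's print preview.

```css
/* Append the target's page number to each TOC entry */
nav.toc a::after {
  /* attr(href) must be a same-document fragment such as "#introduction" */
  content: " (p. " target-counter(attr(href), page) ")";
}
```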

If I'm on the right track, let me tidy some stuff up and I'll push a branch for you to look at.

mikegeyser commented 4 years ago

I've pushed what I have so far to a branch called ebook. It's still very rough, but please feel free to let me know what you think.

mikegeyser commented 4 years ago

Okokok this is cool. :D

I've had some success with weasyprint, and while the result is huge (17 MB!), you can see it here.

What do you think?

tunetheweb commented 4 years ago

This is VERY cool!!! Table of Contents is even hyperlinked to pages!

Presume this is server-side generated, and would be part of npm run generate?

Some nits I've spotted (and know this is just an early version and you've probably got some of these on your to do list, but thought I'd list them anyway cause I'm annoying that way):

But don't take that as criticism of this - it's AWESOME!!

mikegeyser commented 4 years ago

Yes, you're right about it being part of the generation process. Those points are all valid, and I'll work through them. :D

mikegeyser commented 4 years ago

We could also generate an appendix for all of the urls, with a page reference?

tunetheweb commented 4 years ago

> We could also generate an appendix for all of the urls, with a page reference?

Possibly, would there be a link after each one? So something like this:

This meant that even those without the skills and resources to concentrate on web performance [243] would suddenly have performant websites...

With 243 being the appendix reference to the URL? Might be easier (for you and the reader) to just show the URL after the link to be honest...

And on that note, by happy coincidence I just got this in my inbox: https://www.sitepoint.com/css-printer-friendly-pages/ and it suggests the following to append the link URLs:

/* print.css */
a::after {
  content: " (" attr(href) ")";
}

Though we should limit this to just the chapter text (add an article element selector?) and probably exclude the author, reviewer and translator links, so we may want some exceptions. We also need to consider whether to show URLs for cross-reference links to other chapters.

This might also be a bit much, and annoying, when viewing as a PDF. Maybe we need a "PDF" and a "Print Friendly PDF" version? I presume there's no such thing as "only display this text in print mode" for PDFs?
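A variation on the snippet above that follows those suggestions: it only expands absolute links inside the chapter article, which also skips relative cross-chapter links, and switches the annotation off for credits. The .authors class is a hypothetical placeholder for whatever wraps the author, reviewer and translator links, and any cross-chapter links written as absolute URLs would still be expanded.

```css
/* print.css */
@media print {
  /* Only annotate absolute (external) links in the chapter body */
  article a[href^="http"]::after {
    content: " (" attr(href) ")";
    font-size: 0.85em;
    /* Allow long URLs to wrap instead of overflowing the page */
    word-break: break-all;
  }

  /* Hypothetical credits block: leave these links unannotated.
     Same specificity as the rule above, so source order makes this one win. */
  article .authors a::after {
    content: none;
  }
}
```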

rviscomi commented 4 years ago

This looks awesome @mikegeyser! 467 pages!!!

More to iterate on but this is a great start.

mikegeyser commented 4 years ago

I've updated the original issue to consolidate a list of what we feel needs to be done for us to consider this complete.

Would it be alright for me to open up a long-running PR for this, so that people can see the extent of the changes and chime in?

rviscomi commented 4 years ago

Go for it! Just set the PR to "draft".

tunetheweb commented 4 years ago

Added some requirements.

I think we need to decide on the purpose and scope of this. Is the intention to have a viewable PDF? Or a printable PDF? Or Both?

As I alluded to earlier, I think those have slightly different requirements (for example, whether to display links, whether to alternate headings between left and right pages, whether to use real page numbers or start at page 1 for the first chapter, etc.), and those requirements contradict each other. So we might drop some of them completely if we only target a PDF to be read on a computer. Or we might want two versions.

To be honest, at 467 pages I don't see people printing this themselves. Individual chapters, yes, but not the whole thing. If we ever wanted to publish this as a "real book" then that might come into play, but we would then have to deal with the publisher's requirements, so I would raise a separate issue for that if it ever happens.
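For a print-oriented version, alternating left/right page furniture is what the :left and :right page selectors are for. A sketch for double-sided printing, again assuming a hypothetical .chapter wrapper. Note that break-before: right starts each chapter on a right-hand page and inserts a blank page when needed, which is exactly the kind of thing a PDF read on screen would not want.

```css
@media print {
  /* Page numbers on the outer edge of each spread */
  @page :left {
    @bottom-left {
      content: counter(page);
    }
  }

  @page :right {
    @bottom-right {
      content: counter(page);
    }
  }

  /* Open every chapter on a right-hand (recto) page */
  .chapter {
    break-before: right;
  }
}
```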

So personally I would limit the scope/intention of this to have an online PDF for now and drop some of those print requirements. Though possibly we should raise a separate issue to re-look at the basic print.css I added in #566 to perhaps add URLs when printing chapters individually?

Of course those people who did decide to print it off completely would still get a pretty professional looking result, just not quite as optimised to include URLs, left/right alternating headings and footers...etc. that we might do if targeting that medium primarily.

Thoughts?

tunetheweb commented 4 years ago

@mikegeyser why did you choose weasyprint over puppeteer and did you look at puppeteer as an option?

We're trying to do something similar at work, and it looks like Chrome doesn't support repeating headers on pages, so I will have a look at weasyprint. But I'm wondering if that was the reason for your choice, as per your comment in https://github.com/HTTPArchive/almanac.httparchive.org/issues/37#issuecomment-571955659 ?

Nice that side projects have work benefits as well, as I immediately thought of this! 😊

mikegeyser commented 4 years ago

That's it @bazzadp. Chrome (and thus puppeteer) doesn't support css-gcpm, so the choices were weasyprint (free) or princexml (super expensive).

rviscomi commented 4 years ago

This is too cool to sit around unused. I'm looking into merging master with the ebook branch to get all the latest updates, but unexpectedly running into a few merge conflicts and other procedural incompatibilities. I'll try to work through those and will update this issue with any progress on that or the feature wishlist. @bazzadp @mikegeyser LMK if you want to help!

tunetheweb commented 4 years ago

> This is too cool to sit around unused.

I agree! I was going to get to this eventually but didn't want to step on @mikegeyser's toes, and found other things to amuse myself with.

> I'm looking into merging master with the ebook branch to get all the latest updates, but unexpectedly running into a few merge conflicts and other procedural incompatibilities. I'll try to work through those and will update this issue with any progress on that or the feature wishlist. @bazzadp @mikegeyser LMK if you want to help!

I found it easier to go the other way, as @mikegeyser only had a few commits. So I forked a new ebook2 branch off master and merged those commits into it.

I also moved this to the base templates and internationalised it (mostly: headers and footers are still to do) and added the Japanese version. The French and Spanish versions won't generate as they're not complete; I haven't looked into why yet, but it's probably not worth it until they are complete anyway.

Still a few things that need to be done to tidy up the PDFs but it's close.

tunetheweb commented 4 years ago

@mikegeyser let us know if you plan on coming back to finish this, otherwise will work on it myself at some point.