HTTPArchive / almanac.httparchive.org

HTTP Archive's annual "State of the Web" report made by the web community
https://almanac.httparchive.org
Apache License 2.0
610 stars 168 forks

Consider exporting SVG versions of charts #2564

Open rviscomi opened 2 years ago

rviscomi commented 2 years ago

@RReverser had a great idea to look into the Download As > SVG feature of charts in Google Sheets. SVGs don't have the same rasterization issues as the PNGs exported from Sheets.

We should investigate whether switching from PNGs to SVGs would simplify the workflow at all or if there are any UX benefits like smaller file sizes or improved scalability.

One possibility that would require a lot more work, but could significantly improve both the workflow and UX, would be to add our own interactivity to the SVGs with JS and CSS. That way we wouldn't need static fallbacks for embedded charts and we could use one figure for all use cases (desktop, mobile, print, reduced data, CORS).

I considered i18n as a potential benefit of SVGs but @RReverser pointed out that the text may be transformed into paths and not easily translated.

RReverser commented 2 years ago

Just one data point: for the "Number of Wasm responses" graph, the SVG version passed through SVGOMG comes to 5.43 KB gzipped, whereas the current PNG version is 41.88 KB after optimizations: a 7.7x size difference, not to mention better scaling.
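
For anyone wanting to repeat this comparison on other charts, here is a minimal sketch of the measurement. The helper names are hypothetical, and it assumes 1 KB = 1024 bytes, which the comment above does not specify.

```python
import gzip


def gzipped_size_kb(data: bytes) -> float:
    """Size of `data` after gzip compression, in KB (assuming 1 KB = 1024 bytes)."""
    return len(gzip.compress(data)) / 1024


def size_ratio(larger_kb: float, smaller_kb: float) -> float:
    """How many times bigger the larger file is than the smaller one."""
    return larger_kb / smaller_kb


# Figures quoted in the comment above: optimized PNG vs. gzipped, SVGOMG-optimized SVG.
print(round(size_ratio(41.88, 5.43), 1))  # 7.7
```

To measure a real export, read the file as bytes and pass it to `gzipped_size_kb`.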

tunetheweb commented 2 years ago

Well that's one reason I prefer tinypng.com:

image

The Calibre image optimisation is nice because it's automated in case people forget, but I do normally try to use tinypng.com (and usually do a bulk update at the end), because there are significant savings to be had.

But yeah SVG is even smaller still!

RReverser commented 2 years ago

FWIW we could also just look at AVIF, which provides size savings very close to SVG even on default settings

image

but SVG is worth it just for scaling if nothing else IMO.

tunetheweb commented 2 years ago

AVIF still hasn't got enough browser support, meaning we'd need PNG support anyway, so it doesn't save us enough to be worth it IMHO. It would also further increase the size of this repo (which is already nearly a Gig!)

I do like SVG for fidelity, but the problem is automating that. We've (just this year!) got a nice process to pull out the PNG from the published sheets URL, but it looks like that (example) doesn't give you the menu option to export as SVG, so we can't use that 😞

Doing it manually for every sheet, then having to redo it, instead of using the PNG script we have, for every edit, is just a big pain.

And getting something like Puppeteer to do that in the script, when we have multiple charts in each tab, is difficult: how can it know which charts to select, and how? That's why the current script does it based on the publish URL, where there's only one chart.

The other option is to just use (very optimised) PNGs? As shown above they may be "good enough" and I did question the value of the heavy, heavy sheet embeds previously in #826

RReverser commented 2 years ago

And getting something like Puppeteer to do that in the script, when we have multiple charts in each tab, is difficult as how can it know which charts to select and now.

I've briefly looked at the URLs that get downloaded when clicking through "Download as SVG".

Here's the general spreadsheet URL:

https://docs.google.com/spreadsheets/d/1IMa2SbdQgshb4pGWF1KOh9s4zMtLbRymWZGYjdaatXY/

Here's the SVG export URL:

https://docs.google.com/spreadsheets/d/1IMa2SbdQgshb4pGWF1KOh9s4zMtLbRymWZGYjdaatXY/embed/oimg?id=1IMa2SbdQgshb4pGWF1KOh9s4zMtLbRymWZGYjdaatXY&oid=341129382&disposition=ATTACHMENT&bo=false&filetype=svg&zx=2r8vgzixkzbu

And here's the URL we use right now in the iframe:

...
  chart_url="https://docs.google.com/spreadsheets/d/e/2PACX-1vT6yhkn3lw148YQQHLoqA71NIsZLSSoBtgFmd_hRyhcmyPl2OpLyuOjUBk64I5DLE_grN8esL8oA3zt/pubchart?oid=341129382&format=interactive",
...

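By cross-referencing those with each other, it's clear that in the SVG export URL the id parameter (like the ID in the path) is the spreadsheet ID from the general spreadsheet URL, and the oid parameter is the same chart ID as the oid in the iframe URL.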

All in all, it seems fairly easy to automate generation of such SVG URL out of all the embed URLs we already have.
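
A minimal sketch of that automation, based only on the three URLs above. The function name is hypothetical. It assumes the spreadsheet ID has to be supplied separately (the published iframe URL uses a different `2PACX-…` identifier) and that the `zx` value can be left out; neither is confirmed by any Sheets documentation.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def svg_export_url(spreadsheet_id: str, chart_url: str, download: bool = False) -> str:
    """Build an SVG export URL from a spreadsheet ID and an existing pubchart embed URL."""
    # The chart is identified by the same `oid` in both the embed and export URLs.
    oid = parse_qs(urlparse(chart_url).query)["oid"][0]
    params = {"id": spreadsheet_id, "oid": oid}
    if download:
        # Present in the URL captured from the "Download as SVG" menu item.
        params["disposition"] = "ATTACHMENT"
    params.update({"bo": "false", "filetype": "svg"})
    return (
        f"https://docs.google.com/spreadsheets/d/{spreadsheet_id}"
        f"/embed/oimg?{urlencode(params)}"
    )


print(svg_export_url(
    "1IMa2SbdQgshb4pGWF1KOh9s4zMtLbRymWZGYjdaatXY",
    "https://docs.google.com/spreadsheets/d/e/2PACX-1vT6yhkn3lw148YQQHLoqA71NIsZLSSoBtgFmd_hRyhcmyPl2OpLyuOjUBk64I5DLE_grN8esL8oA3zt/pubchart?oid=341129382&format=interactive",
))
```

Running this over every `chart_url` in a chapter would give one export URL per figure, without needing to pick charts out of a tab by hand.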

RReverser commented 2 years ago

Looks like we can even remove the disposition=ATTACHMENT param and avoid the download logic. The image is then served with the correct Content-Type, so we could hotlink to those SVGs directly if we wanted to (although they're not optimized, so maybe best to keep this limited to development only).

https://docs.google.com/spreadsheets/d/1IMa2SbdQgshb4pGWF1KOh9s4zMtLbRymWZGYjdaatXY/embed/oimg?id=1IMa2SbdQgshb4pGWF1KOh9s4zMtLbRymWZGYjdaatXY&oid=341129382&bo=false&filetype=svg

tunetheweb commented 2 years ago

All in all, it seems fairly easy to automate generation of such SVG URL out of all the embed URLs we already have.

Now you're getting me excited!!!

Cause at the very least that would save us the compression step.

Now if we could just replicate the hover effects (basically the only bit of interactivity the embeds give us) then we could serve JUST the SVG.

Time to dig up that SVG book on my bookshelf I've been meaning to find time to dig through...

RReverser commented 2 years ago

Hehe, good luck :) I'd send a PR myself at least for the initial step of using SVG instead of PNG, but there's so little time and so much to do 😅

RReverser commented 2 years ago

Cause at the very least that would save us the compression step.

Quick note: I think we'd still want compression, but via SVGOMG / SVGO instead of Calibre. The SVG charts exported from Sheets directly are not optimized, unfortunately:

image (5)

max-ostapenko commented 1 year ago

PNG vs SVG export:

Am I missing something? @tunetheweb @RReverser


Otherwise, here is one way we could migrate to generated SVG charts:

  1. Use standalone Google Charts for rendering.
  2. Extend figure_markup elements to provide chart configs, for example:
    {{ figure_markup(
    image="cmp-services.svg",
    caption="Most common Consent Management Platform (CMP) services.",
    description="Bar chart showing the most common CMP services. The CMP service CookieYes was found on 2.0% desktop and 2.1% mobile sites respectively, the Osano service on 1.4% and 1.4%, OneTrust on 1.2% and 0.9%, Cookiebot on 1.0% and 0.8%, Cookie Notice on 0.6% and 0.6%, iubenda on 0.5% and 0.5%, Complianz on 0.5% and 0.5%, Moove GDPR Consent on 0.4% and 0.4%, Quantcast Choice on 0.4% and 0.4%, and finally Borlabs Cookie was present on 0.2% and 0.3% sites.",
    sql_file="number_of_websites_using_each_cmp.sql",
    id="fig-1",
    chart_config={
        "chartType": "BarChart",
        "dataSourceUrl": "https://docs.google.com/spreadsheets/d/1iJqj3g0VEjpmjzvtX6VLeRehE7LDQGcw6lOadxGxkjk/gviz/tq?gid=2107042211&range=G2:I12",
        "options": {
          "title": "Most common CMP services",
          "subtitle": "Web Almanac 2022: Privacy",
          "height": 371,
          "vAxis": {
            "title": "CMP service"
          },
          "hAxis": {
            "title": "Percentage of pages"
          }
        }
      }
    )
    }}

    It could look something like this: https://glitch.com/edit/#!/mahogany-transparent-dracopelta
    I've moved most of the chart attributes into a configuration template so that analysts can focus on the data and the chart/axis titles, and everything else is optional.
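
As a rough illustration of the "configuration template" idea, here is a sketch of merging shared defaults with the per-figure values an analyst provides. The default values and the `merge_config` helper are hypothetical; the actual template in the linked demo may be structured differently.

```python
from copy import deepcopy

# Hypothetical shared defaults; the real template's contents are not given in this thread.
DEFAULT_CHART_CONFIG = {
    "chartType": "BarChart",
    "options": {"height": 371, "subtitle": "Web Almanac 2022: Privacy"},
}


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return a copy of `defaults` with `overrides` applied recursively."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested sections (e.g. "options") are merged key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# The analyst supplies only the data source and the titles.
figure_config = merge_config(DEFAULT_CHART_CONFIG, {
    "dataSourceUrl": "https://docs.google.com/spreadsheets/d/1iJqj3g0VEjpmjzvtX6VLeRehE7LDQGcw6lOadxGxkjk/gviz/tq?gid=2107042211&range=G2:I12",
    "options": {
        "title": "Most common CMP services",
        "vAxis": {"title": "CMP service"},
        "hAxis": {"title": "Percentage of pages"},
    },
})
```

With this shape, a figure that needs a different height or chart type just adds that key to its overrides, and the defaults stay untouched.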

I think preparing charts in markdown instead of Google Sheets may help make a smoother transition between analysis and content.