GoogleChrome / lighthouse

Automated auditing, performance metrics, and best practices for the web.
https://developer.chrome.com/docs/lighthouse/overview/
Apache License 2.0

Ideas for trimming the LHR #7160

Closed brendankenny closed 5 years ago

brendankenny commented 5 years ago

The LHR is getting big, often 1MB or more. Some unnecessary things we could trim:

Feel free to add to the list.

connorjclark commented 5 years ago

Timings are ~9KB

[screenshot of the timings entries]

We could fix the numbers so we don't have 100000 digits past the decimal :) "measure" -> "m" (actually, is that property even necessary?)
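
For illustration, a `JSON.stringify` replacer could do the rounding at serialization time. The one-decimal precision here is an arbitrary choice for the sketch, not an existing Lighthouse setting:

```js
// Sketch: round every numeric value while serializing, so timings like
// 12.304999999999745 come out as 12.3. One decimal place is arbitrary,
// purely for illustration -- not something Lighthouse does today.
function roundNumbersReplacer(key, value) {
  if (typeof value === 'number' && Number.isFinite(value)) {
    return Math.round(value * 10) / 10;
  }
  return value;
}

const slimJson = JSON.stringify(lhr, roundNumbersReplacer);
```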

connorjclark commented 5 years ago

Could we embed the i18n data in the report renderer, or would that get complicated real quick?

patrickhulce commented 5 years ago
  • a11y per-node explanations are on every details item, often repeated verbatim on multiple nodes, and we don't use them in the report

This was an explicit bug report when we removed it the first time around, and we had to add it back because for some nodes the information is specific to that node (color contrast, for example): https://github.com/GoogleChrome/lighthouse/issues/5402. We should be careful about removing it.

Could we embed the i18n data in the report renderer, or would that get complicated real quick?

There's enough blubber elsewhere in the LHR to remove that I'd want to do this last. No need to hamper the beautiful simplicity and flexibility of i18n yet IMO :D

Also worth clarifying here what we're worried about. Is it API transfer sizes? Is it storage size on disk? In a database? JSON parse times? Biggest gzipped wins will probably be different from biggest uncompressed wins, and some strategies might reduce LHR size but not really help certain use cases.

Example: I can think of many ways in which we could greatly shrink total API bytes by splitting the LHR into its dynamic and static components just for transport.
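
As a purely hypothetical sketch of that kind of split: audit titles and descriptions are mostly fixed for a given Lighthouse version, so they could be stripped before transport and rehydrated on the client from a copy keyed by `lighthouseVersion`. Nothing below is an existing API:

```js
// Hypothetical transport-only split: drop strings that are (mostly) static
// per Lighthouse version and rehydrate them client-side from a bundled copy
// keyed by lhr.lighthouseVersion.
function stripStaticStrings(lhr) {
  const slim = JSON.parse(JSON.stringify(lhr)); // cheap deep clone for the sketch
  for (const audit of Object.values(slim.audits)) {
    delete audit.title;
    delete audit.description;
  }
  return slim;
}
```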

brendankenny commented 5 years ago

This was an explicit bug report when we removed it the first time around and had to add it back because for some nodes the information is specific to that node (color contrast for example).

arg, I forgot about that. I wonder if there's some deduping we could do then...some of the strings are quite long and occur multiple times.
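
Purely as a sketch of what that deduping could look like (the `explanationIdx` / `explanationTable` names are made up, and the renderer would have to learn to resolve them):

```js
// Made-up dedupe pass: hoist repeated per-node explanations into a string
// table on the details object and store indices on the items instead.
function dedupeExplanations(details) {
  const table = [];
  const indexByString = new Map();

  for (const item of details.items || []) {
    if (typeof item.explanation !== 'string') continue;
    if (!indexByString.has(item.explanation)) {
      indexByString.set(item.explanation, table.length);
      table.push(item.explanation);
    }
    item.explanationIdx = indexByString.get(item.explanation);
    delete item.explanation;
  }

  details.explanationTable = table;
  return details;
}
```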

Also worth clarifying here what we're worried about. Is it API transfer sizes? Is it storage size on disk? In a database? JSON parse times?

I think all the above. gzip is certainly worth keeping in mind, but we also have a certain responsibility to people saving these somewhere :)

Speaking of which, if you do --output json, we JSON.stringify(lhr, null, 2) by default, so a lot of that size is whitespace. We might consider not doing that (people can always beautify it themselves if a human needs to see it), or doing some middle-ground JSON pretty print like we do with saved traces (one line per trace event instead of one line per trace event property + braces).
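
To make that middle ground concrete, here's a rough sketch (one audit per line, everything else compact); the helper is illustrative only, not how --output json actually works:

```js
// Illustrative middle-ground serializer: each audit gets its own line (in the
// spirit of one-trace-event-per-line), the rest of the LHR stays compact.
function stringifyWithAuditLines(lhr) {
  const auditLines = Object.entries(lhr.audits).map(
    ([id, audit]) => `  ${JSON.stringify(id)}: ${JSON.stringify(audit)}`
  );
  const auditsJson = `{\n${auditLines.join(',\n')}\n}`;
  // Serialize with a placeholder for audits, then splice the per-line block in.
  const withPlaceholder = JSON.stringify({...lhr, audits: '__AUDITS__'});
  return withPlaceholder.replace('"__AUDITS__"', () => auditsJson);
}
```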

paulirish commented 5 years ago

Little hack for a sunburst viz of an LHR's size....

  1. Go to https://vasturiano.github.io/sunburst-chart/example/large-data/
  2. Copy an LHR into your clipboard.
  3. lhr = <paste>
  4. run this in console or snippets:
isPlainObject = function (obj) {
  return Object.prototype.toString.call(obj) === '[object Object]';
};

// Builds the {name, value | children} tree that sunburst-chart expects.
// A leaf's size is the length of its pretty-printed JSON.
function calcObjSize(obj) {
  // Recurse if array or object.
  if (Array.isArray(obj) || isPlainObject(obj)) {
    return Object.entries(obj).map(([key, value]) => {
      const node = {name: key};
      const nodeValue = calcObjSize(value);
      node[typeof nodeValue === 'number' ? 'value' : 'children'] = nodeValue;
      return node;
    });
  } else {
    return JSON.stringify(obj, null, 2).length;
  }
}

data = {
  children: calcObjSize(lhr),
  name: 'lhr',
};

document.querySelector('#chart').innerHTML = '';

// `Sunburst` and `color` are globals provided by the example page.
Sunburst()
  .data(data)
  .color(d => color(d.name))
  .minSliceAngle(.4)
  .showLabels(false)
  .tooltipContent((d, node) => `Size: <i>${node.value}</i>`)
  (document.getElementById('chart'));

example:

[screenshot of the resulting sunburst]

patrickhulce commented 5 years ago

Nice!! Where does partSizes come from though? I'm getting

VM50:20 Uncaught ReferenceError: partSizes is not defined
    at <anonymous>:20:13

Oh it was renamed calcObjSize 👍

patrickhulce commented 5 years ago

I'm seeing images + the diagnostic hidden audits taking up ~75+% of the size. Maybe we should focus on a diagnostic audit solution and image deduping?
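
Just to sketch what image deduping could mean (nothing here exists in the LHR format today): walk the report, pull repeated data URLs into a table, and leave short references behind:

```js
// Hypothetical: replace repeated base64 data URLs with references into a
// shared table. The renderer doesn't understand $imageRef, so this only
// illustrates the size win, it's not a drop-in change.
function dedupeDataUrls(obj, table = new Map()) {
  for (const [key, value] of Object.entries(obj)) {
    if (typeof value === 'string' && value.startsWith('data:image/')) {
      if (!table.has(value)) table.set(value, `img-${table.size}`);
      obj[key] = {$imageRef: table.get(value)};
    } else if (value && typeof value === 'object') {
      dedupeDataUrls(value, table);
    }
  }
  return table; // data URL -> reference id; an inverted copy would be stored once in the report
}
```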

brendankenny commented 5 years ago

diagnostic hidden audits

From a CNN one I was looking at, network-requests.js was huge, but almost entirely because of their gigantic URLs (and there being 200 of them).

I was thinking we could stop including query strings for that audit... or include just enough of the query string that each one is still unique. In many cases they're ad URLs, so they likely aren't available for/worth tracking down after the fact anyway.
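
A sketch of one variant of that (drop the query string entirely and only disambiguate when the same base URL repeats); the `lh-dupe` parameter is made up:

```js
// Illustrative only: strip query strings from network-requests URLs, adding
// a made-up counter parameter only when the same base URL occurs more than
// once, so each entry stays unique.
function shortenRequestUrls(urls) {
  const counts = new Map();
  return urls.map(urlString => {
    const url = new URL(urlString);
    const base = url.origin + url.pathname;
    const count = (counts.get(base) || 0) + 1;
    counts.set(base, count);
    return count === 1 ? base : `${base}?lh-dupe=${count}`;
  });
}
```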

paulirish commented 5 years ago

what's so bad about big LHRs anyway?

patrickhulce commented 5 years ago

Is it API transfer sizes? Is it storage size on disk? In a database? JSON parse times?

I think all the above

To Paul's point, I'm not sure I buy the argument that all of those things are important :)

brendankenny commented 5 years ago

what's so bad about big LHRs anyway?

Let's close in favor of more specific, future issues, which I'll bet will start happening quickly as lightbrary, lighthouse-ci, and/or more web.dev history spin up and somebody has to start looking at disk quota :P