Investigate ways we can keep track of performance metrics in PRs

romaricpascal commented 1 year ago

What

Look for ways to automatically measure and display performance metrics when submitting PRs.

Why

As we're ahead of a big change in the codebase, it'll be good to identify how this impacts our users. We're hoping modernising our JavaScript and polyfilling strategy will help shave KB off their final bundle, but to be sure we need to measure it.

We're also looking to potentially automate polyfilling and transpiling of the latest ES syntax to match the browsers we support. We'll want to keep tabs on these metrics to make sure code written once we've modernised our approach to JavaScript doesn't inadvertently weigh down our users with a sneaky polyfill or a feature that's heavy to transpile.

Assumptions

Not really assumptions, but a handful of links:

Timebox

We should review progress after this period of time has elapsed, even if the spike has not been 'completed'

1 day [to work out which metrics we think we care about]

Who is working on this?

Spike lead: Brett

Spike buddy: Romaric

Questions to answer

[x] Which metrics can we measure and how does each help our users and us?
[x] Which tools can help us keep tabs on these metrics and how?

Done when

You may find it helpful to refer to our expected outcomes of spikes.

[x] Questions have been answered or we have a clearer idea of how to get to our goal
[x] Findings have been reviewed and agreed with at least one other person
[x] Findings have been shared, e.g: via a write-up on the ticket, at a show & tell or team meeting

domoscargin commented 1 year ago

Some more tools listed here:

https://github.com/pajaydev/awesome-web-performance-budget#tools-to-measure-performance-budget

domoscargin commented 1 year ago

Danger with some plugins might also be worth checking out:

https://danger.systems/js/

domoscargin commented 1 year ago

As a minimum, we should keep an eye on file/package sizes.

domoscargin commented 1 year ago

We've populated a doc with some of the metrics we might want to test.

I've started a spike of getting [size-limit](https://github.com/ai/size-limit) up and running: https://github.com/alphagov/govuk-frontend/pull/3076

An immediate concern is that its associated GitHub Action is not certified, so it can't be used within alphagov.

However, just running an npm task does work.

The useful-sounding --why argument only works with the webpack plugin.

domoscargin commented 1 year ago

Which metrics can we measure and how does each help our users and us?

We've considered 3 levels of analysis:

Analysing files

Things like:

The size of our distribution CSS and JS files
The size and number of files in our package and dist folders
The number of assets, by type
The size of our release zip
The size of our assets, by type

Analysing the code

Things like:

Size of first-party code
Details of all dependencies
Size of dependencies, polyfills
Number of modules
Duplicate modules
Duplicate code

Analysing web performance

Things like:

Time to interactive
Largest Contentful Paint
Cumulative Layout Shift
Lighthouse and other automated web perf scores
Number of first and third party requests
JS long-running jobs
Blocking JS
JS Errors
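
The file-level metrics listed above (sizes of distribution files, and size and number of assets by type) can be gathered with Node's standard library alone. The following is a minimal sketch, not what the spike implemented: the function names are made up for illustration, and the directory to measure is whatever folder holds the built files.

```javascript
const fs = require('node:fs')
const path = require('node:path')
const zlib = require('node:zlib')

// Recursively list every file under `dir`
function listFiles (dir) {
  return fs.readdirSync(dir, { withFileTypes: true }).flatMap((entry) => {
    const fullPath = path.join(dir, entry.name)
    return entry.isDirectory() ? listFiles(fullPath) : [fullPath]
  })
}

// Return raw and gzipped byte counts for each file, keyed by path relative to `dir`
function measureFiles (dir) {
  const stats = {}
  for (const file of listFiles(dir)) {
    const contents = fs.readFileSync(file)
    stats[path.relative(dir, file)] = {
      bytes: contents.length,
      gzipBytes: zlib.gzipSync(contents).length
    }
  }
  return stats
}

// Group file count and total raw size by file extension ("assets, by type")
function summariseByType (stats) {
  const summary = {}
  for (const [file, { bytes }] of Object.entries(stats)) {
    const type = path.extname(file) || '(none)'
    if (!summary[type]) summary[type] = { count: 0, bytes: 0 }
    summary[type].count += 1
    summary[type].bytes += bytes
  }
  return summary
}
```

Calling `summariseByType(measureFiles(someDir))` on a build output folder would give one entry per extension, which is enough for a first pass at the "by type" figures.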

Which tools can help us keep tabs on these metrics and how?

For this spike, we focused on analysing files and analysing code. We do have access to Speedcurve for web performance stats, so could possibly look into that later, though it's a bit unclear what we would be measuring - certainly we can measure the Design System website's performance, but that's not really directly related to the upcoming changes to govuk-frontend's JavaScript. Potentially we could do some measurement on the review app to check some basic stuff.

Potential tools

As a general note, many of the tools we found rely on Webpack to put their stats and displays together. While this is certainly a way that some folks will ingest govuk-frontend, it feels better for us to go closer to the metal and try to get something that looks as much like our own compiled code as possible.

Relative CI

Pros: Allows for easy trend tracking, fancy graphs and not much work on our part
Cons: Paid for (there's a free open-source tier which might be viable for us), uses Webpack (this should change in v5, which is planned to have Rollup support)

Size limit

Pros: Small and simple, provides an easy way to keep track of file size (and a ready-made GitHub Action for comments), also allows further analysis with certain plugins
Cons: Uses Webpack, GitHub Action is unverified, so would need approval

Statoscope

Pros: Probably more customisable than size limit (this is what size limit uses under the hood)
Cons: Would require us to roll our own Webpack bundling to test (as it doesn't self-build)

Rollup plugin visualiser

Pros: We use Rollup currently, so theoretically this is fairly accurate to what we compile; offers good visual data and opportunity to drill down
Cons: We're using an oooold version of Rollup which isn't compatible, so we'd have to run a standalone version; if we do move to Webpack at any point, we'd have to rejig

Spikes

size limit

We've looked at adding a basic size limit configuration. Size limit has several options for how it builds and what data it gathers. At its core, on pull requests it measures the size difference between files you specify. Using the webpack plugin means it can also provide more in-depth module data using Statoscope, which we could finesse into a useful GitHub comment.

We considered size limit as a simple stopgap - making sure our package sizes don't climb too much. But we feel like we'd need to replace it eventually, so if we can find another option which can give us an MVP quickly and allow for more detail later, that'd be better.

rollup-plugin-visualizer

We've also looked at rollup-plugin-visualizer. This feels like a better way to go, since Rollup is what we're using to compile these files. It provides good data which we could finesse into helpful GitHub comments.

One problem is that it relies on Rollup 3, and we're pinned to 0.59.4 in order to support IE8. We can work around this by running a standalone version of rollup for gathering stats.

What remains to be done

Defining "good"

As a team, we'd need to consider what changes in the metrics are acceptable, and probably agree a process for dealing with PRs that break performance checks but that we still want to merge (for example, a big new component that breaks the file size constraints).

Implementing stats gathering with rollup-plugin-visualizer

The actual work of getting this up and running and spitting out data, and failing builds if file sizes grow by more than an agreed percentage.
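
A check for a relative file size increase could look something like the sketch below. It assumes the stats have already been reduced to plain `{ file: bytes }` objects for the reference branch and the PR branch. That shape, the function names and the threshold are all hypothetical, and treating brand new files as violations is just one possible choice.

```javascript
// Percentage change from `before` to `after` (positive means growth)
function percentChange (before, after) {
  if (before === 0) return after === 0 ? 0 : Infinity
  return ((after - before) * 100) / before
}

// Return the files whose size grew by more than `maxIncreasePercent`.
// Files missing from `baseSizes` count as growing from 0 bytes, so they are
// always reported (an assumption; a team may prefer to allow new files).
function findViolations (baseSizes, prSizes, maxIncreasePercent) {
  const violations = []
  for (const [file, after] of Object.entries(prSizes)) {
    const before = baseSizes[file] === undefined ? 0 : baseSizes[file]
    const change = percentChange(before, after)
    if (change > maxIncreasePercent) {
      violations.push({ file, before, after, change })
    }
  }
  return violations
}
```

A build script could then set `process.exitCode = 1` whenever `findViolations` returns a non-empty array, which is enough to fail the check in CI.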

GitHub Actions

We need a GitHub Action to post comments on PRs. Something like:

compute the stats of the PR branch and cache against commit hash
try to access stats of the reference branch from the cache
if they’re not there, checkout reference branch, compute the stats of the reference branch and cache against commit hash
diff the stats to highlight changes in files getting bundled and file sizes
comment with the stats of the PR branch and any changes from the diff
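
The last two steps (diffing the stats and building the comment) might be sketched as follows. Again this assumes each set of stats is a plain `{ file: bytes }` object, which is an illustrative shape rather than the output format of any particular tool. Caching and actually posting the comment are left out.

```javascript
// Compare two { file: bytes } maps and describe what changed
function diffStats (baseStats, prStats) {
  const files = new Set([...Object.keys(baseStats), ...Object.keys(prStats)])
  const changes = []
  for (const file of [...files].sort()) {
    const before = baseStats[file]
    const after = prStats[file]
    if (before === undefined) {
      changes.push({ file, status: 'added', delta: after })
    } else if (after === undefined) {
      changes.push({ file, status: 'removed', delta: -before })
    } else if (after !== before) {
      changes.push({ file, status: 'changed', delta: after - before })
    }
  }
  return changes
}

// Render the diff as a Markdown table to use as the PR comment body
function renderComment (changes) {
  if (changes.length === 0) return 'No changes to bundled files or file sizes.'
  const sign = (n) => (n > 0 ? `+${n}` : `${n}`)
  const rows = changes.map(
    ({ file, status, delta }) => `| ${file} | ${status} | ${sign(delta)} B |`
  )
  return ['| File | Status | Size change |', '| --- | --- | --- |', ...rows].join('\n')
}
```

Sorting the file names keeps the comment stable between runs, so re-running the workflow on the same commits produces the same text.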

Trends

What none of these tools offer out of the box, compared with something like Relative CI, is a way to look at trends over time. We don't think this is a particular issue - we're mostly interested in some kind of graph to make sure that, yes, file size or number of dependencies, etc. are tracking downward. We think it'd be relatively simple to generate this data and store it somewhere like Google Sheets for the purposes of lightly monitoring these trends. This is also something that we could iteratively add once we've got an MVP working.
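
One lightweight way to generate that data is to append a row per file to a CSV on each run, which a spreadsheet can then import and chart. This is only a sketch under assumptions: the column layout, function names and `{ file: bytes }` input shape are hypothetical, and file names containing commas are not escaped.

```javascript
const fs = require('node:fs')

const HEADER = 'date,commit,file,bytes'

// Turn one stats snapshot into CSV rows, one per file
function toCsvRows (date, commit, stats) {
  return Object.entries(stats).map(([file, bytes]) => [date, commit, file, bytes].join(','))
}

// Append the snapshot to `csvPath`, writing the header first if the file is new
function appendSnapshot (csvPath, date, commit, stats) {
  const lines = toCsvRows(date, commit, stats)
  if (!fs.existsSync(csvPath)) lines.unshift(HEADER)
  fs.appendFileSync(csvPath, lines.join('\n') + '\n')
}
```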

domoscargin commented 1 year ago

Closing now in favour of #3188
