alphagov / govuk-frontend

GOV.UK Frontend contains the code you need to start building a user interface for government platforms and services.
https://frontend.design-system.service.gov.uk/
MIT License
1.18k stars 325 forks source link

Improve diffing between previous release and new `dist` #3546

Closed romaricpascal closed 1 year ago

romaricpascal commented 1 year ago

What

Limit the chances of Git considering that the files in the previous release and the ones in the release PR are different files entirely, as it leads to the diff showing the old file being removed and the new one being created.

We're currently using Git's -M/--find-renames option for which we can adjust the similarity index. This may just be pushing the issue to a later point, so it could be worth considering other routes (explicitely naming the files to compare, for example, as our releases as pretty structured).

Why

The last release's diff had to be done locally rather than through the action as Git decided the old and new files were entirely different.

Who needs to work on this

Developers

Who needs to review this

Developers

Done when

romaricpascal commented 1 year ago

Started investigating this today and dumping some thoughts ahead of the weekend. It looks like there are a few things that bulk our diff:

  1. The similarity index that's being too high to catch that the govuk-frontend-x.y.z and govuk-frontend-a.b.c are the same files. This leads to the changes being treated as addition/deletion of the whole files rather than renames, with the whole content output in the diff (as well as less clarity on what changed). Even dropping it to M01 (1%), we wouldn't have caught the changes in the JS between v4.4.1 and v4.5.0.[^1]. Dropping to M001 (0.1%) would have done the trick, but it's just pushing the problem further.
  2. The source maps that have entered the picture. They include the whole source code in their sourceContent fields and the diff treats them as addition/deletions, which adds quite a bit (twice the entirety of our source code 😬 ). That's likely where the biggest bulk comes from.

Seems like our work around for 1 has been to drop the similarity index the previous time we had the issue. I've got a few options in mind to get a direct comparison, at least for the JS and CSS files: a. using a more precise glob (eg. dist/govuk-frontend-?.?.?.min.js that would only match the JS file) and a very low similarity index (as we're sure the files are the same) for the JS and CSS comparison, then running 3 sets of diffs: one for the JS, one for the CSS and one for the rest of dist (if there's a way to glob all but the JS and CSS files). It may pay off to comment them separately to limit the risks of them being too big 😊 b. another renaming the file at the time of comparison to their name in the previous release, which would clarify to Git that they are indeed the same. That'd require grabbing the version of the previous release though...

For 2., I'm wondering if it's useful at all to diff the sourcemaps. If we think we do, we'll likely want to prettify the sourcemaps first so they're not on a single line and get a better comparison.

~A final maybe silly idea: could we push a commit for comparison so that it's Github showing the diff rather than us commenting. Implementing b, we could push a commit to a release-vX.Y.Z branch that'd let us visualise the changes in a nice way using Github's comparison features.~ The files being minified will make it troublesome for Github to show a workable diff, I believe.

Maybe more thoughts will come up as I pick things up after the weekend 😊

[^1]: Using git diff -M01 v4.4.1..v4.5.0 -- "dist/govuk-frontend-?.?.?.min.js" > dist-js-4.5.0.diff

romaricpascal commented 1 year ago

As a rough plan based on my ramblings from Friday how about:

  1. Removing the sourcemaps from the diff, which will greatly reduce the size of the diff given they contain the entirety of our sources and are single line.
  2. [Possibly optionally] Splitting the CSS, JS, and the rest of the files in 3 separate diffs, commented individually to increase the chances of being under the limit and avoiding a large change in one of the 3 areas from affecting the others.

Note that #3229 will ensure we get access to diffs too large to be posted as comments by uploading them as build artifacts.

romaricpascal commented 1 year ago

A suggestion by @36degrees for an alternative for not dealing with similarity index would be to compare the files using their explicit names, relying on the VERSION.txt file to pick the version number:

git diff $GITHUB_BASE_REF:dist/govuk-frontend-$(git show $GITHUB_BASE_REF:dist/VERSION.txt).min.css dist/govuk-frontend-$(cat dist/VERSION.txt).min.css