NomicFoundation / hardhat

Hardhat is a development environment to compile, deploy, test, and debug your Ethereum software.
https://hardhat.org
Other
7.15k stars 1.37k forks source link

Implement CI improvements outlined in #3925 #5301

Closed galargh closed 1 month ago

galargh commented 3 months ago

Resolves #3925

Hi! I've noticed that you identified a couple of issues with your CI setup that you wanted addressed in https://github.com/NomicFoundation/hardhat/issues/3925, and I'd love to help :)

As per the issue description, I included all of the improvements in a single PR - I aimed to contain each change in a separate commit to help you review them, but I'm also happy to split them up into separate PRs if it makes it easier for you.

Breakdown

For all workflows, include the workflow file itself as part of the files that trigger it

https://github.com/NomicFoundation/hardhat/commit/e27c940bd684a09847d955f0942ca0d3264e4b50 I ensured that all the workflows that use paths filters include the workflow file itself in the filter value.

Don't use $default-branch. This variable is meant to be used only in templates. Using it in workflows is a mistake.

https://github.com/NomicFoundation/hardhat/commit/af96f4d27eae426cda8a4853dc9f6c86efa8cf89 I replaced each occurrence of $default-branch with main. I've also tried to search for other similar instances of template syntax usage but didn't find any.

Parallelize the "lint" workflow: linting the code and linting the website should be done in parallel.

https://github.com/NomicFoundation/hardhat/commit/c56528af08e7ab2d5328bc76e43b573d0ccaa901 I parallelized the Lint workflow using the matrix strategy. After the changes, the workflow consists of 2 jobs, Lint and Lint (docs). I noticed that linting root requires building custom eslint plugins while it's not necessary for the docs. I also decided to check dependency versions only when linting root.

When I looked at the workflow, I noticed that linting was duplicated in three other workflows: Check docs site, Comment on lint failure and Pre-release tests.

https://github.com/NomicFoundation/hardhat/commit/ac06514b1f094d25880a86a4358eafad50bc2ec2 I removed linting from the Check docs site workflow. After the changes, it only builds the docs. Let me know if that's OK.

https://github.com/NomicFoundation/hardhat/commit/ac06514b1f094d25880a86a4358eafad50bc2ec2 I also modified Comment on lint failure workflow. As I understand, it was triggered on pull_request_target to enable commenting on a pull request. The same can be achieved with a workflow triggered by other workflow_runs. I modified the workflow to be triggered whenever Lint completes. The updated workflow continues only if the Lint workflow was triggered by a pull request and if it failed during one of the linting steps. This eliminates the duplication of work while maintaining the desired functionality.

Improve running time, if possible. Review how caches are used, consider moving some things to only run on main or in scheduled workflows, etc.

🟡 https://github.com/NomicFoundation/hardhat/commit/f750bb766f11e9926b181dcc13938380a3e832fc I only performed a basic review of the cache setup. The one thing I noticed was that we might be overriding the pnpm cache sometimes - e.g. when we build only a single package, the resulting cache might not have the dependencies other packages require. To counteract that, I extracted the env setup (pnpm install, node install, cache handling) to a helper action. Then, I applied the action wherever the pnpm cache was used. I configured it to work in restore-only mode by default, and I added a new Populate cache workflow, which is responsible for populating the cache. All in all, I wouldn't expect this to speed up the runtimes significantly, but it does create an environment for more intentional cache use.

🟡 https://github.com/NomicFoundation/hardhat/commit/89272363e6a3850123e7eab16348d1e74c2b3144 I did a light review and cleanup of workflow triggers, but there's definitely more room for improvement here.

As for other ideas for runtime improvements, setting up the cache restore to work with partial key matches might be an interesting idea. It should be simpler to implement now that there's only a single place that deals with the pnpm cache.

The usage of self-hosted runners might be interesting, too. Especially as a means to increase the availability of the runners (e.g. package builds might require ~70 runners for changes lower down the stack while the concurrency limit in GitHub on a free plan is 20).

Expose built packages (.tar.gz files that can be installed) as artifacts.

🔴 I didn't address this in this PR. Would you mind clarifying the intention here? Is it to package up the dist dirs after package builds and expose them as artifacts? If so, I'd be happy to add it to the PR or create a follow-up one.

Consider using templates to make the workflows more maintainable.

🟢 https://github.com/NomicFoundation/hardhat/commit/8de529a343ee1fa80acac7a0d75ff02b857ca67a I'd like to propose an alternative to templating. I turned all the package build (except core) workflows into a single one that uses a build matrix. The matrix is dynamically constructed at the beginning of the run to determine which packages need to be built and tested. I retrieve this information from the pnpm lockfile because it seemed simple enough. In this iteration, I decided to only look at the direct package dependencies, which seems to be sufficient as well.

The advantage of this approach is that there is only one workflow file to maintain, the dependencies between packages do not have to be hardcoded (e.gI discovered that hardhat-truffle5 depends on hardhat-web3, but modifying the latter wouldn't trigger the former's build before), and new packages will be pulled in automatically (e.g. I discovered that hardhat-toolbox-viem wasn't being built before). The tradeoff is that the package builds are now delayed by the package discovery job (< 10s). Also, there is only one workflow to inspect in the Actions tab, which means fewer switches when inspecting a workflow run, but also, the workflow run history is collective.


https://github.com/NomicFoundation/hardhat/commit/322f13c23310efa960b3f19739d51ebb5890d8e8 Finally, I updated the GitHub Actions versions to ensure the latest ones are used. This will help clear any node version deprecation warnings you might have been seeing.


That's it for now; let me know what you think :)

Testing
changeset-bot[bot] commented 3 months ago

⚠️ No Changeset found

Latest commit: 684653b8689583531e34cba2801f69994ab79211

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

vercel[bot] commented 3 months ago

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
hardhat ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jul 29, 2024 1:13pm
alcuadrado commented 3 months ago

Hey! Thanks for submitting this pr! I'll take a look at this either today or over the weekend

galargh commented 3 months ago

No rush, take your time :) And if there's anything that needs some more clarification, let me know

alcuadrado commented 3 months ago

This is really interesting. I'm still processing and reading through some parts of it, as this makes me read more docs than a usual PR, but there are some really good things here.

The commit that I'm not sure it works as intended is the one modifying Comment on lint failure. This runs on pull_request_target to improve on a typical situation where an external contributor sends a PR but it doesn't pass the linter, and they don't notice it until we "approve and run".

I'm still reading about the cache strategy, that's pretty cool.

The matrix thing, I actually had done something really similar in our main development branch! The implementation is different, but I think it's the same intention: https://github.com/NomicFoundation/hardhat/blob/v-next/.github/workflows/v-next-ci.yml

alcuadrado commented 3 months ago

I'll continue reviewing this tomorrow

galargh commented 3 months ago

This is really interesting. I'm still processing and reading through some parts of it, as this makes me read more docs than a usual PR, but there are some really good things here.

I can definitely relate, GitHub Docs do have a tendency to pull one in 😅

The commit that I'm not sure it works as intended is the one modifying Comment on lint failure. This runs on pull_request_target to improve on a typical situation where an external contributor sends a PR but it doesn't pass the linter, and they don't notice it until we "approve and run".

I see! Thank you for providing this additional bit of context. It definitely makes much more sense to me now, and I understand the original intent better.

My objective was to secure the repository against security breaches caused by script injection, and in this case the easiest way to achieve that was by avoiding pull_request_target usage. However, that's not the only way.

🆕 https://github.com/NomicFoundation/hardhat/pull/5301/commits/1911acf1a6f27e17241283f7390050d881832d10 I have now restored the pull_request_target trigger in the linting workflow, but I added a couple of changes to help minimise the possible attack vector.

  1. I split the lint and comment jobs so that we can grant minimal permissions (just contents read) to the job that checks out unvetted code.
  2. I ensured the checkouts do not persist any git credentials on disk.
  3. I ensured that sensitive files (such as actions definitions, for example) are sourced from the default branch rather than the PR merge commit.

In this setup, it would still be possible to execute arbitrary code on a runner, but it shouldn't have access to sensitive data. The GitHub token can only read the repo which is public already anyway, and to even obtain it one would have to scrape the memory. As for secrets, GitHub claims that unreferenced ones are wiped from memory altogether.

I tested the new setup on a PR in my fork. The workflow run triggered on pull_request_target successfully posted a comment to the PR, while the one triggered on pull_request skipped that step as expected.

The matrix thing, I actually had done something really similar in our main development branch! The implementation is different, but I think it's the same intention: https://github.com/NomicFoundation/hardhat/blob/v-next/.github/workflows/v-next-ci.yml

That's awesome! Yes, that looks exactly like what I was going for, too. One important thing I tried to preserve on this branch was the "laziness" of the package builds since there are quite a few combinations to consider, but the principle is the same 😄


Thank you so much for spending the time reviewing my proposals, greatly appreciated 🙇

galargh commented 1 month ago

I'm closing this PR as we agreed to:

  1. Split it into smaller chunks to make it easier to review & merge.
  2. Target the v-next branch instead.