
Idea: switch from a GitHub App to a GitHub workflow? #861

Open dscho opened 2 years ago

dscho commented 2 years ago

Right now, GitGitGadget is registered as a GitHub App; its callback URL points to an Azure Function that performs rudimentary checks ("Was this sent by GitHub?", "Is this an allowed /<command>?", and "Was this sent from an allowed repository?"). If all of those checks pass, the Azure Function triggers one of the three repository-specific Azure Pipelines. These Azure Pipelines are registered to run as PR Checks on the GitHub Pull Requests and show up as Check Runs.

This is a bit of a complicated setup and does not generalize well to other projects, and it comes with a lot of configuration in a lot of different places.

What we could do to improve this situation is to change GitGitGadget's architecture so that no GitHub App is required at all, but instead a GitHub workflow has to be added to the repository's main branch.

This GitHub workflow could either use or imitate peter-evans/slash-commands to react to commands in PR comments and trigger the runs. It could even consist of a single step that calls GitGitGadget, either as a full-blown JavaScript GitHub Action or as a composite Action (a cross-repository version of reusable workflows). We could even convert GitGitGadget itself into a GitHub Action, which would have the benefit of moving the responsibility for transpiling the TypeScript code into JavaScript code from the GitGitGadget users to GitGitGadget itself.

The biggest obstacle here is: how do we get that GitHub workflow into the repository? In the Git project, based on the success of GitGitGadget, we could likely convince the Git maintainer to accept it. In other projects, it might be a harder sell.

Another downside: the GitHub workflow would also be active in each and every fork... We could work around that by adding a check similar to what Git's git-l10n workflow does (probably adding a condition that GitGitGadget's secrets need to be configured):

jobs:
  git-po-helper:
    if: >-
      endsWith(github.repository, '/git-po') ||
      contains(github.head_ref, 'l10n') ||
      contains(github.ref, 'l10n')

This means that the workflow would be run, but its only job would be skipped when run in forks.

One downside of this approach would be that we no longer get the PR Check Runs for free, but would have to create them manually. Not a big deal, but... more work.
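
For reference, creating such a Check Run manually is a single REST call. A sketch using the gh CLI (repository and field values illustrative; note that creating check runs requires a GitHub App token, such as the one workflow runs get automatically):

# sketch: manually create a Check Run for the PR's head commit
gh api repos/gitgitgadget/git/check-runs \
  -f name="gitgitgadget" \
  -f head_sha="$HEAD_SHA" \
  -f status="in_progress"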

We might still want to make sure that the workflow runs on a dedicated, self-hosted runner, so that it does not have to re-clone and re-compile GitGitGadget, the git/git clone and the lore clone. (Having said that, if we turn GitGitGadget into a GitHub Action, there would be no need to compile GitGitGadget in the workflow. And since lore uses public-inbox's v2 format, which allows for incremental fetching without getting All The Blobs, we could switch to a concept where we do not need any self-hosted runner anymore.)

The upside would be that the architecture would be a lot simpler, and we could work consistently within the GitHub workflows framework (as opposed to GitHub App -> Azure Function -> Azure Pipeline -> GitHub PR Check Run). The configuration would then also live in just two places: in the GitHub workflow (for customization of GitGitGadget, e.g. to use it for a Linux subsystem project) and in the GitHub repository secrets (for the credentials used to send mails, and the credentials for updating the notes if they are not in the same repository).

@webstech what do you think?

dscho commented 2 years ago

since lore uses public-inbox's v2 format, which allows for incremental fetching without getting All The Blobs, we could switch to a concept where we do not need any self-hosted runner anymore

I am actually quite intrigued by this idea.

So I've set up a mirror at https://github.com/gitgitgadget/git-mailing-list-mirror. This mirror uses a trick to avoid requiring an Azure Pipeline to keep itself updated: none of the mirrored branches are actually the main branch. The main branch contains an independent commit history that merely adds a scheduled GitHub workflow that keeps the mirror up to date.

The mirror is required because we would need to use a partial clone to work with commit ranges, so that Git would fetch commit, tree and blob objects as needed (as opposed to cloning ~2.5GB every time the workflow runs), and lore.kernel.org does not support partial clones, while GitHub does:

git clone --bare -b master --depth=1 --filter=blob:none https://lore.kernel.org/git/1 x
Cloning into bare repository 'x'...
warning: filtering not recognized by server, ignoring
[...]
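
By contrast, the same invocation against the GitHub mirror should succeed without that warning, since GitHub's servers do advertise partial-clone filters (branch name taken from the git sizer output below):

git clone --bare -b sync --depth=1 --filter=blob:none \
    https://github.com/gitgitgadget/git-mailing-list-mirror x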

For the record, here is the output of a local git sizer run:

$ git sizer -v
Processing blobs: 446662
Processing trees: 447290
Processing commits: 447287
Matching commits to trees: 447287
Processing annotated tags: 0
Processing references: 10
| Name                         | Value     | Level of concern               |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size      |           |                                |
| * Commits                    |           |                                |
|   * Count                    |   447 k   |                                |
|   * Total size               |   114 MiB |                                |
| * Trees                      |           |                                |
|   * Count                    |   447 k   |                                |
|   * Total size               |  12.4 MiB |                                |
|   * Total tree entries       |   447 k   |                                |
| * Blobs                      |           |                                |
|   * Count                    |   447 k   |                                |
|   * Total size               |  2.33 GiB |                                |
| * Annotated tags             |           |                                |
|   * Count                    |     0     |                                |
| * References                 |           |                                |
|   * Count                    |    10     |                                |
|                              |           |                                |
| Biggest objects              |           |                                |
| * Commits                    |           |                                |
|   * Maximum size         [1] |   680 B   |                                |
|   * Maximum parents      [2] |     1     |                                |
| * Trees                      |           |                                |
|   * Maximum entries      [3] |     2     |                                |
| * Blobs                      |           |                                |
|   * Maximum size         [4] |   767 KiB |                                |
|                              |           |                                |
| History structure            |           |                                |
| * Maximum history depth      |   446 k   |                                |
| * Maximum tag depth          |     0     |                                |
|                              |           |                                |
| Biggest checkouts            |           |                                |
| * Number of directories  [3] |     3     |                                |
| * Maximum path depth     [3] |     3     |                                |
| * Maximum path length    [3] |    26 B   |                                |
| * Number of files        [3] |     2     |                                |
| * Total size of files    [5] |   767 KiB |                                |
| * Number of symlinks         |     0     |                                |
| * Number of submodules       |     0     |                                |

[1]  9f096cd89b159dae09be19509f13df89bd0d92f7
[2]  2e316c4aae8db038fb62701f8be09e2ed2210bcb (refs/heads/sync)
[3]  c30fd157bbac507c8817dc486f6a5d26e5e486c8 (refs/heads/sync^{tree})
[4]  ba922d143711db58b4f016b18c1e65a23812f6b5 (820ae7c04c4ca742a85a08f2a958d0d1caa51f8e:d)
[5]  c5b44ef9c71677b355d6525c3a4ed3f8c3bdb14a (820ae7c04c4ca742a85a08f2a958d0d1caa51f8e^{tree})

dscho commented 2 years ago

We could work around that by adding a check similar to what Git's git-l10n workflow does (probably adding a condition that GitGitGadget's secrets need to be configured)

Turns out that it is not possible at the moment to skip jobs if certain secrets are unavailable.

However, we can classify a repository with certain topics. The list of these topics is then available to the job via the github.event.repository.topics attribute (and as far as I can tell, these topics are not inherited when forking). I went ahead and verified that something like this works:

jobs:
  job:
    runs-on: ubuntu-latest
    environment: with-some-secrets
    if: contains(github.event.repository.topics, 'handle-slash-commands') && startsWith(github.event.comment.body, '/')
    steps:
       [...]

webstech commented 2 years ago

@webstech what do you think?

When GitHub actions showed up, this looked like a good direction to go.

Is this the best place for this conversation or should we start using Discussions to break it up into pieces (steps, security, performance)?

This fits in with generalizations we have been working on already.

Random thoughts: Would fetches be moved to the workflow? This would simplify things for different repos.

Using peter-evans/slash-commands would be handy, but we would still need to handle parallel /allow commands happening at the same time.

Just some of many thoughts.

dscho commented 2 years ago

Is this the best place for this conversation or should we start using Discussions to break it up into pieces (steps, security, performance)?

Why not simply continue here, maybe accumulating a list of sub-tickets in the description?

This fits in with generalizations we have been working on already.

Yes!

Would fetches be moved to the workflow?

You're referring to the automatic synchronization of the branches of https://github.com/git/git (and https://github.com/gitster/git) to https://github.com/gitgitgadget/git, right?

Since a workflow cannot be triggered by pushes to other repositories, moving this to a workflow would require a scheduled workflow that polls the source repositories.

And since scheduled workflows have to live in the main branch, that would mean that we either try to talk the Git project into hosting GitGitGadget's scheduled workflow in their main branch (which is unlikely to happen) or we host it in a separate repository (which will then have to use either a Personal Access Token or the GitHub App credentials to allow writes to https://github.com/gitgitgadget/git).

I tend to favor the latter.

We could create a new repository, say, https://github.com/gitgitgadget/gitgitgadget-workflows, for that purpose.

The real good news is that this synchronization is pretty independent of the rest of GitGitGadget, and can be moved to a GitHub workflow independently.

dscho commented 2 years ago

we would still need to handle parallel /allow commands happening at the same time.

At first, I thought we could handle that via concurrency. But it seems that would cancel pending jobs, which is no good at all.

So a better strategy might be to detect when a push fails because it does not fast-forward, and in that case simply wait a randomized number of seconds, then force-fetch the notes and try again. Lather, rinse & repeat until either the push succeeds or a maximum number of attempts has been reached.
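
A minimal sketch of that loop (ref name and retry bounds illustrative; the commented step is where GitGitGadget would re-apply its own changes on top of the fetched notes):

# sketch: retry a rejected notes push with randomized back-off
attempts=0
until git push origin refs/notes/gitgitgadget
do
    attempts=$((attempts + 1))
    test $attempts -lt 5 || exit 1
    sleep $((RANDOM % 30))
    git fetch --force origin refs/notes/gitgitgadget:refs/notes/gitgitgadget
    # ... re-apply the local notes changes on top of the fetched tip ...
done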

It gets slightly more hairy with /submit. The reason is that we do quite a few more things when submitting: we add notes that associate the Message-IDs of the sent messages with the original commits, we add a note that associates the current Pull Request with the cover-letter's Message-ID, the current revision and the pushed tag, and we add the Pull Request to the active Pull Requests.

My current thinking is that most of these notes won't conflict. For example, the note that associates a given patch mail's Message-ID with the corresponding commit is immutable. Concurrent GitGitGadget runs might want to push divergent refs/notes/gitgitgadget, but the actual trees should merge trivially, for the most part.

The one exception, of course, is the annotation of the empty blob, which is the place where the IGitGitGadgetOptions are stored.

But the actual options should be relatively easy to merge, too! They essentially contain the list of allowed users, the openPRs map and the activeMessageIDs map.

So it should be relatively easy to merge divergent versions of refs/notes/gitgitgadget, simply by using a couple of low-level Git commands (including git read-tree -m) with a temporary index.
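
A sketch of such a merge, assuming the diverged remote notes have just been fetched into FETCH_HEAD:

# sketch: three-way merge of diverged notes refs via a temporary index
base=$(git merge-base refs/notes/gitgitgadget FETCH_HEAD)
export GIT_INDEX_FILE="$(mktemp -u)"   # temporary index; must not exist yet
git read-tree -m "$base" refs/notes/gitgitgadget FETCH_HEAD
tree=$(git write-tree)                 # fails unless the merge was trivial
commit=$(git commit-tree "$tree" -p refs/notes/gitgitgadget -p FETCH_HEAD \
    -m "Merge concurrent notes updates")
git update-ref refs/notes/gitgitgadget "$commit"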

webstech commented 2 years ago

The one exception, of course, is the annotation of the empty blob, which is the place where the IGitGitGadgetOptions are stored.

allow/deny can easily be changed to catch the push failure; it is not so clear for the openPRs and activeMessageIDs. However, exactly how are they used? openPRs does get used as a list of PRs that have been submitted. Could a submitted label on the PRs provide the same info? Not sure how activeMessageIDs is used.

If misc-helper is run with update-open-prs, will it auto-correct any missing entries or will there still be a problem in the repo (showing my low git skills)?

webstech commented 2 years ago

Fetching changes (maybe not the best header for this)

Pulling down the whole git repo is expensive (time and bytes).

For patches, enough of the repo is needed to create them. The check-whitespace workflow tried to limit the shallow fetch but failed. Would it have worked if the ref was the pull head branch instead of the merge? Just wondering (checkout is the slow part of that workflow). Not specifying depth=0 may be faster.

Anyway, if a checkout with the pull head branch and a depth is not feasible here either, there may be another way to do this. A GraphQL query could be done for the base branch SHA to get its committedDate. This can be used with fetch --shallow-since= instead of relying on a commit depth. I have done some playing with this that looks promising.
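
A sketch of that idea using the gh CLI (owner, repo and branch hard-coded for illustration):

# sketch: resolve the base branch tip's committedDate, then fetch by date
committed_date=$(gh api graphql -f query='
  query {
    repository(owner: "gitgitgadget", name: "git") {
      object(expression: "master") { ... on Commit { committedDate } }
    }
  }' --jq '.data.repository.object.committedDate')
git fetch --shallow-since="$committed_date" origin master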

webstech commented 2 years ago

Responsiveness

There were concerns about how fast workflows run vs pipelines. Would the separation of slash command handling (one workflow) and command processing (another workflow) create more delays?

webstech commented 2 years ago

Compatibility

This could be set up to pretty much run the way it does now for slash commands (although command processing runs on a separate repo workflow). An alternative is to provide specialized handlers, bypassing some of the ci-helper code for handlers that are specific to the slash command.

How do you see a workflow implementation?

dscho commented 2 years ago

Sorry for the following wall of text, there is a lot of ground to cover.

allow/deny can easily be changed to catch the push failure; it is not so clear for the openPRs and activeMessageIDs.

Right, for the latter, we would have to start over. Which in the case of /submit would mean that we would need to remember which message IDs to add.

It might actually be easier to implement the merge strategy for the GitGitGadget options, and not try to start over, in both allow/deny and submit scenarios.

However, exactly how are they used? openPRs does get used as a list of PRs that have been submitted. Could a submitted label on the PRs provide the same info?

The openPRs field is used:

So in short: openPRs is used to figure out status updates (triggered by updated integration branches, or What's Cooking mails).

Not sure how activeMessageIDs is used.

I think my original idea was to have an easy mapping from mails to the commits they are responding to, but that information is already in IMailMetadata, i.e. in the data we store via the note on the blob whose contents are the Message-ID of the respective mail.

So that field might be obsolete...

If misc-helper is run with update-open-prs, will it auto-correct any missing entries or will there still be a problem in the repo (showing my low git skills)?

The idea is to self-heal here.

Fetching changes (maybe not the best header for this)

Pulling down the whole git repo is expensive (time and bytes).

For patches, enough of the repo is needed to create them. The check-whitespace workflow tried to limit the shallow fetch but failed. Would it have worked if the ref was the pull head branch instead of the merge? Just wondering (checkout is the slow part of that workflow). Not specifying depth=0 may be faster.

Anyway, if checkout with the pull head branch and depth is not feasible here either, there may be another way to do this. A graphql query could be done for the base branch sha to get the committedDate. This can be used to fetch --shallow-since= and not rely on depth of commits. I have done some playing with this that looks promising.

I think we can do a whole lot better these days, simply by using a "fully partial" clone. That is, if we initialize a bare repository, then add a remote without fetching anything, then mark it as a "promisor" remote, Git will lazily download the objects when it needs them (and only the objects that are needed):

git init --bare
git remote add origin https://github.com/gitgitgadget/git
git config remote.origin.promisor true
git config remote.origin.partialCloneFilter blob:none

We already know the commit range, and could let Git fetch the commits sequentially. If that turns out to be too slow, we could easily call octokit.rest.repos.compareCommitsWithBasehead() to figure out the commit and top-level tree OIDs, then fetch them in one go by a dummy git show <oid>... call.

Responsiveness

There were concerns about how fast workflows run vs pipelines. Would the separation of slash command handling (one workflow) and command processing (another workflow) create more delays?

I don't think that is a concern, really. Right now, the Azure Function triggers an Azure Pipeline. While the Function is super quick to respond, the Pipeline has exactly the same spin-up time as a GitHub workflow (the underlying build agents' technology is exactly the same).

So if we let the workflow run directly, without requiring an Azure Function to trigger it, but let the PR comment trigger it, this should be a win, actually.

The only thing I want to be really careful about is that we need to ensure that the workflow secrets are safe, i.e. that no untrusted user can go ahead, open a PR that modifies the workflow, then trigger a run that extracts the secrets somewhere they don't belong. But since the PR comment trigger only runs workflows on the main branch, I believe there are enough hurdles there.

Besides, we can easily add the secrets to "environments", ask for the respective workflow job to be run in said environment, and then restrict the respective environment to runs of workflows on the main branch.
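
For illustration, the job side of that might look like this (environment name hypothetical; the restriction to main-branch runs is configured in the environment's deployment branch policy, not in the YAML):

jobs:
  handle-command:
    runs-on: ubuntu-latest
    environment: gitgitgadget-secrets
    steps:
      - run: echo "mail credentials etc. are only available in this job"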

Compatibility

This could be set up to pretty much run the way it does now for slash commands (although command processing runs on a separate repo workflow). An alternative is to provide specialized handlers, bypassing some of the ci-helper code for handlers that are specific to the slash command.

I would actually like to start with the other Pipelines, the ones that are not triggered by slash commands, but by updates to the core Git branches, or to the Git mailing list mirror. We can start to migrate them to GitHub workflows already.

How do you see a workflow implementation?

The Pipelines that are triggered by repository updates (e.g. to gitster/git or lore.kernel.org/git/) will have to be turned into scheduled workflows, i.e. they will have to poll. While Azure Pipelines abstract away that polling, GitHub workflows can only trigger on updates of branches in the same repository as the workflow.

To decouple GitGitGadget from the Git project a bit, i.e. to avoid having to ask the Git maintainer to accept such scheduled workflows into their code base, I think we can easily let these workflows live in https://github.com/gitgitgadget/gitgitgadget itself.

For the workflow that needs to react to slash commands, we will have to experiment a bit, probably by turning GitGitGadget into a full-fledged GitHub Action that is then run from a workflow that is added to the main branch. Only when it works as intended would I want to ask the Git maintainer to integrate it.

Since we do not want to experiment directly on gitgitgadget/git's main branch, we could start our experiments in our personal forks, and then add a ggg-slash-commands-experiments branch that is based off the main branch but adds a .github/workflows/gitgitgadget.yml that uses the GitHub Action. We would then -- temporarily! -- use that branch as gitgitgadget/git's default branch.

Such a GitHub Action is relatively easy to write; I even wrote two for Git for Windows: https://github.com/git-for-windows/get-azure-pipelines-artifact/ and https://github.com/git-for-windows/setup-git-for-windows-sdk/. Both are very similar and essentially set up a process where the TypeScript code is required to be pre-compiled into JavaScript, packaged into a single file including the dependencies, and committed. The CI builds then verify that nobody tampered with the compiled output by verifying that the same output is produced in a fresh npm run build && npm run package run.
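
That verification boils down to a few workflow steps; a sketch (the dist/ path is an assumption):

- uses: actions/checkout@v3
- run: npm ci
- run: npm run build && npm run package
# fail if the freshly-built output differs from the committed one
- run: git diff --exit-code -- dist/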

Since a GitHub Action is identified and defined by the action.yml file, we can easily have a different entry point for it than the main script defined in package.json (which we do not yet define in GitGitGadget, and which we could point to the compiled misc-helper.js script).
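
A minimal action.yml along those lines might look like this (entry-point path hypothetical):

name: 'GitGitGadget'
description: 'Send GitHub PRs to the Git mailing list'
runs:
  using: 'node16'
  # bundled Action entry point, independent of package.json's "main"
  # (which could point to the compiled misc-helper.js instead)
  main: 'dist/index.js'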

That way, a cloned https://github.com/gitgitgadget/gitgitgadget/ could double as both a GitHub Action and the misc-helper script.

For workflows such as the one replacing the Mirror Git List to GitGitGadget's PRs Pipeline, we would use GitGitGadget in the latter fashion, probably inside a scheduled workflow that polls the Git mailing list mirror repository.

softworkz commented 2 years ago

Wow, that's a lot of stuff and ideas!

Just a few thoughts from my side:

softworkz commented 2 years ago
  • For those reasons, I would favor sticking to have a "GitHub App" and use the checks API to display the "Running Checks" indication as early as possible

This is still on my list (https://github.com/gitgitgadget/gitgitgadget/issues/839#issuecomment-1038289310); I was just waiting to collect multiple work items before working on this again. The config generalization will be a good reason to revisit.

dscho commented 2 years ago
  • If the "Azure Function" would not trigger an Azure pipeline but run the JS code directly, the startup delay could be completely eliminated.

We had something like that in the beginning: it was an Azure App, and its operation was completely opaque to the user. There was no feedback, no log to follow or to figure out what went wrong. And the App was sitting there and waiting for a long time, ramping up cost for no benefit.

I am rather certain that I do not want to go back to that.

So how about an Azure Function instead of an App? Functions do not have state, and they also have rather short timeouts. A no-go.

What I would like is a hybrid approach, where you can have the slash commands handled via a workflow that triggers on issue_comment events, but you also can let an Azure Function pick them up and delegate the actual operation to a workflow.

The Azure Function could create the Check Run, and hand the workflow_run_id to the workflow so it can update and eventually complete the Run. That's all a bit finicky, though, as ownership of that Run has to be passed around, and we will have to handle the case where a workflow fails or is canceled, and therefore has to update the Check Run accordingly.

softworkz commented 2 years ago

We had something like that in the beginning: it was an Azure App, and its operation was completely opaque to the user. There was no feedback, no log to follow or to figure out what went wrong. And the App was sitting there and waiting for a long time, ramping up cost for no benefit.

I am rather certain that I do not want to go back to that.

I agree and I'm not suggesting that either.

The idea is this:

I guess we are all clear that this whole thing cannot work smoothly without having a dedicated runner (no matter whether DevOps workflow or GH Action). That means that we always need to have that one machine to set up and maintain. Having that as a prerequisite makes it easy to eliminate the Azure Function/App altogether and run the endpoint for the GitHub App directly on our dedicated runner (I'm using PM2 to run the nodejs endpoint and publish via Nginx). This has a number of advantages:

And of course it will simplify the setup procedure, because no Azure Function will need to be set up at all.

dscho commented 2 years ago

I disagree that we need our own runner.

softworkz commented 2 years ago

Why? What's bad about it, and why do you think that having to set up and register an Azure function would be better?

PS: I'm just curious. I don't mind having a solution that leaves multiple choices for setup.

dscho commented 2 years ago

Maintaining a build agent is not free. Besides the monetary and environmental cost, maintenance is non-negligible, too.

It is also a single point of failure.

Therefore we should try to get away from it, too. I mentioned elsewhere how we can do that (I'm on the phone right now, otherwise I'd look for a link).

softworkz commented 2 years ago

Maintaining a build agent is not free

I use a Google cloud free-tier VM and the cost in January was 0.28 EUR (no free credits involved, I had used them up long ago)

environmental cost

It's a VM server, resources are allocated dynamically.

maintenance is non-negligible, too

That's generally true, but I haven't touched the server again since I had completed the setup. It will be rather an occasional task, but yes - it's greater than zero.

It is also a single point of failure.

Having the Azure Function on Azure plus a dedicated build server makes two single points of failure. When you add lore.kernel.org, it makes three single points of failure (when one goes down, the whole setup is broken). When having all three things on a single server, you still have one single point of failure, but at least not three.

Also, you need to consider that it's rather unlikely that another "client" would have the respective mailing list already archived on some public-inbox server. As in my case, this will usually need to be set up on a server as well (and you can't use GH Actions or DevOps pipelines for this). In turn, you'll have the server anyway.

I mentioned elsewhere how we can do that

If you mean the message about shallow fetching and the "trick" to deal with it (roughly), then I've read it. Assuming it would work (and I have no doubt that it will, after having seen all the weird stuff you're doing with Git ;-), I would still wonder whether this wouldn't end up in even longer turnaround times when processing PR comments, instead of lower ones?

PS: "weird" in the sense of "cool I didn't know about"

webstech commented 2 years ago

Status update on POC for workflow.

Setup

  1. ggg user repo has customizations and the workflows for processing requests
  2. ggg monitored is the repo being monitored and has workflows to trigger the workflows in the user repo
  3. there is a mail repo for POC purposes

Tests

  1. Handle PR create/synchronize - a trigger script in the monitored repo runs a repo workflow in the user repo
  2. Handle PR comment - a slash commands script in the monitored repo runs a repo workflow in the user repo
  3. Check emails - the repo workflow for /submit commands also generates reply emails that are subsequently processed

Status

  1. There is a pending PR to gitgitgadget for issues found while testing
  2. The tests appear to be working
  3. Adding checks to indicate failures is pending
  4. Generalization of the workflows is pending (remove hardcoded values in env vars)
  5. Creating a shared workflow (for init steps) is pending

Looking for input on this. Is this approach acceptable? This could be used by any project using gitgitgadget. I know everyone is busy but any feedback is appreciated. I am happy to have reached this point with a repeatable setup.

Other optimizations would include using a cached build of gitgitgadget (updated when ggg is updated).

Thanks.

dscho commented 2 years ago

@webstech woooooooow! This is super awesome!

I think this is the absolute right way to go.

For some projects, it might not be feasible to add the comment.yml and pullrequest.yml files to .github/workflows/ of the target (=monitored) repository (because it might be a mirror of an upstream project that is unlikely to want those files there; I can already imagine the expletives uttered by Mr Torvalds if he ever saw this in the Linux kernel repository, for example). For those, I would love to retain the ability to install a GitHub App backed by an Azure Function.

That's not a blocker, of course, as that can easily be added on top of your wonderful work.

Another wish I have: I would love for the same ggguser-type of repository to be able to handle multiple gggmonitored-type of repositories, simply by moving more configuration into the payload. As an example, I would love for the same fork of ggguser in dscho/ to be able to handle my contributions to the BusyBox project as well as to the Cygwin project (in those instances, I might not even need the mailing list mirroring capabilities).

In any case, I am really, really happy with your progress.

The most immediate question is: how do we move the current GitGitGadget onto those workflows? I guess we need a gitgitgadget-workflows repository (probably as a fork of ggguser?), moving away from the Azure Pipelines, by adjusting the Azure Function to point to the workflows, right?

I would love for one thing to be addressed before that, though: currently, the PR checks do not actually show the run that handled the comment, but instead they show the run that triggered the run that handled the comment... Maybe we can add explicit code to create and update a check run?

webstech commented 2 years ago

I would love to retain the ability to install a GitHub App backed by an Azure Function.

It may be possible for a GitHub App to trigger the repo workflows without needing Azure. Hmm, that could actually provide more flexibility. Need to ponder. I put one together a while ago as a probot app.

Another wish I have: I would love for the same ggguser-type of repository to be able to handle multiple gggmonitored-type of repositories, simply by moving more configuration into the payload.

Working on it. Not everyone needs a generated config. In fact most projects do not. They can supply a config with all the needed values. The two actions being used now do not provide a consistent way to pass extra inputs (maybe an app could do that). Still looking at it. The current comment.yml in ggguser handles both commands and pushes, extracting more data from the trigger payload. IOW only one workflow may be needed.

In any case, I am really, really happy with your progress.

Me too! (or I would have waited to get more done)

The most immediate question is: how do we move the current GitGitGadget onto those workflows? I guess we need a gitgitgadget-workflows repository (probably as a fork of ggguser?), moving away from the Azure Pipelines, by adjusting the Azure Function to point to the workflows, right?

Two things. First, I thought it would be best to make this POC a reference set of repos under gitgitgadget so other projects could see it as a model. genConfig.ts could be added to gitgitgadget itself. Second, a workflow could be added to gitgitgadget/git for handling slash commands while pushes are still handled via Azure, i.e. it could be a phased implementation. I still have concerns about performance. How long does a git checkout take? How long does an email repo checkout take? I did manage to cut some seconds off of ggg install by turning on caching.

I would love for one thing to be addressed before that, though: currently, the PR checks do not actually show the run that handled the comment, but instead they show the run that triggered the run that handled the comment... Maybe we can add explicit code to create and update a check run?

Agreed. May have mentioned it as outstanding. There appears to be a checks action that may be useful. In the initial workflow it would create a check. The id would have to be passed to the repo workflow. The repo workflow would update the check with the run status. Is that what you were thinking?

This has been an interesting journey. gggmonitored has scripts to recreate the repo in github but there were little tidbits to clean up along the way.

dscho commented 2 years ago

I would love to retain the ability to install a GitHub App backed by an Azure Function.

It may be possible for a GitHub App to trigger the repo workflows without needing Azure. Hmm, that could actually provide more flexibility. Need to ponder. I put one together a while ago as a probot app.

I actually started out with that: b1fd8d5480e45525749ca99ac8bdbbdaff291933. But then I switched to an Azure Function: c11e85d0e84dcf5bfa67952e03473795c9a757bc

Another wish I have: I would love for the same ggguser-type of repository to be able to handle multiple gggmonitored-type of repositories, simply by moving more configuration into the payload.

Working on it.

Excellent!

The most immediate question is: how do we move the current GitGitGadget onto those workflows? I guess we need a gitgitgadget-workflows repository (probably as a fork of ggguser?), moving away from the Azure Pipelines, by adjusting the Azure Function to point to the workflows, right?

Two things. First, I thought it would be best to make this POC a reference set of repos under gitgitgadget so other projects could see it as a model. genConfig.ts could be added to gitgitgadget itself.

Makes sense.

Second, a workflow could be added to gitgitgadget/git for handling slash commands while pushes are still handled via Azure, i.e. it could be a phased implementation. I still have concerns about performance.

Understandable. I am slightly worried, too, that's why I suggested this insane idea of "fully partial" clones.

How long does a git checkout take?

That's easy: four seconds. That's a shallow clone, though. We would most likely use a depth of 60 to allow even for exceptions from our "max 30 commits" rule.

As I mentioned above, though, we could go for "fully partial" clones if speed would be an issue.

How long does an email repo checkout take?

The checkout itself is fast because there's only the m file, but a full fetch would take way too long.

Again, here I would use a shallow clone, say, 150 deep, just to be on the safer side. And with code to unshallow if the previously-seen tip isn't visible.

I did manage to cut some seconds off of ggg install by turning on caching.

I'd want to go one step further: turn GitGitGadget into a GitHub Action. That means we'll need to imitate how get-azure-pipelines-artifact bundles and commits the generated code.

There appears to be a checks action that may be useful. In the initial workflow it would create a check. The id would have to be passed to the repo workflow. The repo workflow would update the check with the run status. Is that what you were thinking?

That's what I was thinking, of course worrying about the case when a run is canceled or times out and the status is not updated, but maybe there could be a janitor workflow and it all isn't too bad.

We probably need to use the GitHub App token to create and update that check run, by the way, because we would not be able to update the details_url if we used the $GITHUB_TOKEN passed to the workflow run.

webstech commented 2 years ago

Status update

I have created some actions to support this. This is completely open to changes.

There is one workflow to handle pull request related actions (create, sync, comment) and another that would be scheduled to handle what are currently scheduled actions:

- update-open-prs
- update-commit-mappings
- handle-open-prs
- handle-new-mails

TL;DR

Reference implementation.

Repositories

gitgitgadget

The changes in gitgitgadget can be seen in my repo.

The two new files are used by the actions. The current setup is to run npm pack to generate the tar file. While an npm package could be used, there is no need for a package at this time.

gggmonitored

This is a sample repository. It has two workflows, both of which dispatch a workflow to run in the ggguser repo.

runner repo

This is messy at the moment but dispaction.yml is the workflow for all actions. The other values in the repo are possible configurations to support different repos in a single runner.

gitgitgadget actions

This repo contains the two primary actions supporting gitgitgadget. They can be split into separate repos for simplicity. The packed gitgitgadget tar is installed in this repo to build the actions.

gitgitgadget config generator

This is a separate action (remnants exist in ggguser) to generate a gitgitgadget config from a model and update the values based on runtime environment variables.

lore git equivalent

Sample repo used for testing. This is used as a stand-in for the email tracking repo (lore git in the case of git).

email docker container

This repo is used for testing. It contains code to run a docker container that acts as an smtp server. The server uses test-smtp-server (the same as gitgitgadget testing).
Emails generated by /submit are committed to the email tracking repo, and a reply is also committed. The docker container is published separately, and the repo contains actions to start and stop the container.

Workflow steps

The main workflow handles dispatches from the two workflows on the monitored repository (slash command and pull requests). It contains steps for running the smtp docker container for /submit commands. This would not be done in a production environment.

Environment setup

Environment variables are set in different steps based on the trigger for the workflow.

Checkout

The runner repo is checked out but may not be needed for some config scenarios (e.g. the config is downloaded from another location). The 'monitored' repo is checked out - this needs to be set up for a shallow checkout. The email repo is checked out if necessary. No npm installs or builds are needed.

Email service

This is only done for testing. A docker container is started to handle smtp processing. The config routes smtp requests from gitgitgadget to the container.

Primary action

An action is run to handle pull request creation, synchronization, and comments.

Misc action

This would normally be part of a scheduled workflow. It is included for test purposes to demonstrate the action in action (so to speak). The action in this case is 'handle-new-mails'.

Outstanding

dscho commented 2 years ago

@webstech I haven't found time to have a proper look at this yet, but rest assured that I am super excited about this!

webstech commented 2 years ago
  • Checking out repos in shallow setup. Need to review previous comments.

Tracking thoughts/observations after trying this. The first couple of items came up after not using the checkout action and are only shallow-related.

Just wrapping my head around this. Is there some separation of responsibility for doing the push that a new (note related) push action could simplify or does that ignore a problem that early pushes resolve?

softworkz commented 2 years ago
  • PR specific actions fetch what they need so shallow repos should work.
  • Misc action handle-new-mails does an unqualified fetch to update the email worktree. Should it be using the email archive commit tip on the fetch? Would that work properly with a shallow setup? Looking into creating a test.

Have you tried a blobless clone like @dscho had suggested above? Here's some nice explanation of various methods: https://github.blog/2020-12-21-get-up-to-speed-with-partial-clone-and-shallow-clone/

webstech commented 2 years ago

Have you tried a blobless clone like @dscho had suggested above?

Not saying I set it up correctly but the end result was no authority to push updates to the repo on GitHub. If we can resolve that it could work. May have to look at what the checkout action does.

Here's some nice explanation of various methods: https://github.blog/2020-12-21-get-up-to-speed-with-partial-clone-and-shallow-clone/

Thanks for the link. I have a much better understanding of partial clones now.

webstech commented 2 years ago

Not saying I set it up correctly but the end result was no authority to push updates to the repo on GitHub. If we can resolve that it could work. May have to look at what the checkout action does.

Using the github-script action, I brute-forced setting the needed git config extraheader (i.e. security issues were ignored). Seems to work. I see there has been some discussion on the checkout action to support partial clones, but nothing has really happened.

Misc action handle-new-mails does an unqualified fetch to update the email worktree. Should it be using the email archive commit tip on the fetch?

If the fetch specified the commit tip, could the mail repo also be initialized empty with the remote configured? No real need for a checkout if the needed changes are fetched?

@dscho I would appreciate any advice here. I could flounder some more but that seems pointless without some guidance. For the mail repo (aka lore/git) the minimal work (maybe not what is really possible) is to create a repo and populate it from the email tip commit to the remote HEAD. This workflow is attempting to set up the mail repo. Is there a fetch that should be done? gitgitgadget internally knows the email commit tip. Could gitgitgadget do a fetch internally to get from that point forward? It is not clear to me how to do it and still be compatible with existing code. If this works better outside of gitgitgadget, an action to make the mail commit tip an env var (action output) could be written. Thanks for any suggestions here.

webstech commented 2 years ago

PR #1055 adds the sources for the two supporting actions mentioned above.

webstech commented 2 years ago

@dscho Carrying on from the last comments in PR 1055. Let's start moving this into gitgitgadget.

Looking for suggested repo names. Initial suggestions are in italics.

Actions

There are three actions. This is what would be specified for uses in the workflow.

Runner

This is a repo where the dispatched workflow and the two scheduled action workflows run. It could also contain scheduled workflows for syncing repo updates. It could also contain model configurations. This is for gitgitgadget. Other projects would have their own similar repo, but probably not a clone.

Testing

There are four repos for testing.

dscho commented 2 years ago
  • The checkout action sets git config vars to allow push to be done. Initializing a shallow setup will need to do this setup as well. One published push action appears to just set the token in the url (vs adding a http header). Can that be done setting config vars for gitgitgadget pr related actions?

It could be done via secrets. And we do not necessarily need to persist the credentials as actions/checkout does. We can easily pass the token via git -c http.extraHeader=... (after marking the value as a secret, of course). Or embed the credentials in the URL via git push https://PAT:<token>@github.com/....
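
A sketch of the non-persisting variant, mirroring the header that actions/checkout sets up (token variable illustrative):

# sketch: pass the token for this one push only, without persisting it
basic=$(printf 'x-access-token:%s' "$TOKEN" | base64)
git -c http.extraHeader="Authorization: Basic $basic" \
    push https://github.com/gitgitgadget/git refs/notes/gitgitgadget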

Could publishTagsAndNotesToRemote and project.publishToRemote support this? Looking into this but...

Yes, publishToRemote could be a full URL with embedded credentials.

  • Should the push decision be external to CIHelper? Should it just indicate if a push is needed? The scheduled misc commands just run (and ignore the push needed returned value) and do a push outside of `misc-helper`, so maybe the PR related actions should do the same. Was there a problem that required getting the push done asap (concurrency comes to mind)?

The only times when we really need to push are the times when we mirror branches and when we persist a submitted patch series iteration by pushing a tag.

Concurrency does not worry me there. For pushing a tag, there should not be any concurrency. For mirroring, well, we'll need to be lenient and wait and try to mirror again, potentially in a loop with increasingly long delays.

  • PR specific actions fetch what they need so shallow repos should work.

Yes! From the PR info we should even know roughly how deep the shallow fetch needs to be.

  • Misc action handle-new-mails does an unqualified fetch to update the email worktree. Should it be using the email archive commit tip on the fetch? Would that work properly with a shallow setup? Looking into creating a test.

That's a bit trickier: handle-new-mails knows the previously-seen tip and needs all of the commits in between.

So I would suggest starting with depth 50 and if the previously-seen tip is not in that range, incrementally increase the depth by 50 until it is in range. (That "50" is pulled out of thin air, of course.)
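
A sketch of that incremental deepening (branch name, increment and tip variable illustrative):

# sketch: deepen the shallow clone until the previously-seen tip is present
git fetch --depth=50 origin master
while ! git cat-file -e "$previous_tip" 2>/dev/null
do
    git fetch --deepen=50 origin master
done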

  • Have not determined if the other three misc actions may have possible issues.

Since you have a working staging environment, we can easily experiment with as-shallow-as-possible fetches.

  • There may be a need for an action to perform the notes push using a provided PAT or other auth. This function could remove the need to do a push in gitgitgadget.

Many actions need to push the notes. Remember, in Git's context we always push to gitgitgadget/git's refs/notes/gitgitgadget, even if we handle PRs in git/git (that was a specific requirement by @gitster: no writes into git/git).

We need to update refs/notes/gitgitgadget when /allowing and /denying users, when /submitting, and when handling new mails (to record the currently-seen tip of lore.git).

Just wrapping my head around this. Is there some separation of responsibility for doing the push that a new (note related) push action could simplify or does that ignore a problem that early pushes resolve?

That push could indeed be the responsibility of a dedicated Action.

I wonder, however, how much complexity we are leaving up to the user (i.e. the person who needs to configure this all so that it works). If the GitHub workflows that need to be added to the project that desires to use GitGitGadget are too complex in their own right, it will make GitGitGadget harder to use than I want.

It is such a bummer that GitHub Apps cannot run their own, uncommitted workflows. That would make things a ton easier. Then we could turn GitGitGadget into more of a GitHub App.

Maybe we should reconsider the entire design.

Let's take stock of the current git/git scenario: it has the gitgitgadget-git App installed, with read-only access, but webhooks being sent to the Azure Function. This Function triggers Azure Pipelines that are connected to git/git, but all pushes (and all persisted state) go to gitgitgadget/git, which has the gitgitgadget App installed (which has read-write access).

What if we took this separation of concerns a bit further? What if we required only the gitgitgadget-git App to be installed in the target repository, and required the gitgitgadget App to be installed in a separate repository (not even gitgitgadget/git, but maybe gitgitgadget/gitgitgadget-state or something) that holds the entire state as well as the workflows on its main branch? GitGitGadget would then ensure that the main branch's workflows are up to date and would have enough access to update things like the PR's checks; the workflows would run in that state repository, though.

But on the other hand this idea also sounds really complex to set up.

Not saying I set it up correctly but the end result was no authority to push updates to the repo on GitHub. If we can resolve that it could work. May have to look at what the checkout action does.

Using the github-script action, I brute-forced setting the needed git config extraheader (i.e. security issues were ignored). Seems to work. I see there has been some discussion on the checkout action to support partial clones, but nothing has really happened.

Right, this is a bit finicky.

It would be safer, as you say, to prevent the regular GitGitGadget Action from pushing the notes ref (or the tags), and use a dedicated Action for that instead.

Misc action handle-new-mails does an unqualified fetch to update the email worktree. Should it be using the email archive commit tip on the fetch?

If the fetch specified the commit tip, could the mail repo also be initialized empty with the remote configured? No real need for a checkout if the needed changes are fetched?

It should be bare to begin with, and yes, it can start out as completely empty (or as a partial & shallow clone).

Looking for suggested repo names. Initial suggestions are in italics.

Actions

There are three actions. This is what would be specified for uses in the workflow.

  • Gen config: gitgitgadget/gen-config

Remind me what this is for again?

  • Handle PRs (create/update/comment): gitgitgadget/handle-pr

  • Handle scheduled misc actions (misc-action.ts): gitgitgadget/check-status

Both sound good.

Runner

This is a repo where the dispatched workflow and the two scheduled action workflows run. It could also contain scheduled workflows for syncing repo updates. It could also contain model configurations. This is for gitgitgadget. Other projects would have their own similar repo, but probably not a clone.

  • gitgitgadget/workflow-runner (gitgitgadget-workflows was mentioned above)

I guess this would be the state repository I alluded to, above.

Testing

There are four repos for testing.

  • sample repo (pretending to be gitgitgadget/git, etc.): gitgitgadget/test-git

  • sample repo (pretending to be lore git): gitgitgadget/test-mail

  • sample repo (pretending to be gitgitgadget/workflow-runner above): gitgitgadget/test-workflow-runner

  • repo for building a docker container to run an smtp server while testing /submit and /preview commands. It also has two actions for starting and stopping the smtp server: gitgitgadget/test-smtp-docker

Makes tons of sense!

Do you have a concrete idea how we can keep GitGitGadget's usage (or configuration) very simple? I have a hunch that we might get away with relying heavily on reusable workflows. That way, we could still potentially keep all of the complexity hidden away in gitgitgadget/gitgitgadget but the UI would look simple by having very simple-looking uses: gitgitgadget/<short-and-sweet-name> lines.

webstech commented 2 years ago

@dscho Thanks for the detailed response. I am reviewing it and will probably respond in pieces (but not for a couple of weeks). I have created most of the repos in gitgitgadget and am updating my forks. I will go over your comments as I do this. It includes using reusable workflows.

dscho commented 1 year ago

@webstech thank you for this great start over at gitgitgadget/check-status#1!

I wonder whether we could establish yet another repository (say, gitgitgadget/gitgitgadget-workflows) that uses that Action, to run e.g. the regularly-scheduled stuff (using concurrency to avoid multiple workflow runs interfering with each other)?

Ideally, I would love for those Actions to update refs/notes/gitgitgadget only at the very end, and in a resilient way, so that another workflow (that runs a different mode of the Action) might have updated it concurrently and things still work. As in: one workflow run might call update-open-prs while another might call handle-open-prs, and they both update refs/notes/gitgitgadget with the updated information at the end; if the push fails, they fetch and try to update the information again and then push again. (This might require quite a few updates to gitgitgadget/gitgitgadget, I'm not sure; it might be easier to implement a merge strategy that can perform a three-way merge on the JSON if there are no conflicts?)

webstech commented 1 year ago

I wonder whether we could establish yet another repository (say, gitgitgadget/gitgitgadget-workflows) that uses that Action, to run e.g. the regularly-scheduled stuff (using concurrency to avoid multiple workflow runs interfering with each other)?

Renamed gitgitgadget/workflow-runner (mentioned above) to gitgitgadget/gitgitgadget-workflows. Testing has been ongoing using https://github.com/webstech/ggguser/actions using concurrency and it seems to work.

Ideally, I would love for those Actions to update refs/notes/gitgitgadget only at the very end, and in a resilient way, so that another workflow (that runs a different mode of the Action) might have updated it concurrently and things still work. As in: one workflow run might call update-open-prs while another might call handle-open-prs, and they both update refs/notes/gitgitgadget with the updated information at the end; if the push fails, they fetch and try to update the information again and then push again. (This might require quite a few updates to gitgitgadget/gitgitgadget, I'm not sure; it might be easier to implement a merge strategy that can perform a three-way merge on the JSON if there are no conflicts?)

So far, using concurrency has kept the workflows synchronous. That works great if everything is running as GitHub workflows. If we want to migrate slowly, there could be problems if some workflows are still running on Azure. Also, we have multiple endpoint updates for some things (/allow updates comments in the PR and notes in gitgitgadget). Some of the ggg commands indicate if the notes need updating, and some (handle pr comment) are more likely to push the update.

We can probably move forward before making async update-support changes. The pain of occasional recovery may be cheaper than building resiliency for a multi-db update. Building tools for recovery may also be cheaper. Not saying we should not move to a post-execution update, but some of it may not be easy while supporting existing users.

dscho commented 1 year ago

Building tools for recovery may also be cheaper.

The biggest risk is probably that we might end up with duplicate PR comments, generated by two distinct workflow runs.

That looks manageable, as long as we keep a good record of the non-transaction-able actions that are performed in any given workflow run (such as: recording the precise URL of any added PR comment).

dscho commented 1 year ago

So... after a good long while of thinking about this and experimenting with a few things, I think we should not abandon the GitHub App at all.

However, I do want to change a couple of things.

One thing that is a thorn in my side is that the Azure Function that backs the GitHub App is only deployed manually, and that's a big obstacle to improving it.

I propose to imitate what I established in https://github.com/git-for-windows/gfw-helper-github-app/ in the meantime: a GitHub workflow that deploys the Azure Function whenever it changes.

That would also allow us to streamline how slash commands are handled, including a 👍 reaction that provides immediate feedback when a /preview or /submit was issued.

Another Git for Windows thing I would like to imitate in GitGitGadget: we recently introduced a workflow that runs in one repository but "mirrors" a check-run in a different repository. Note how that is exactly what we need to move GitGitGadget away from that Azure Function.

Moving toward an automatically-deployed Azure Function would enable us to establish a similar strategy, where the submit or preview workflow (which can be the same workflow, triggered by a workflow_dispatch with one input indicating whether it should preview or submit the patch series) could live inside gitgitgadget/gitgitgadget itself, for ease of forking and keeping that fork up to date. We would need to install the GitGitGadget GitHub App also on gitgitgadget/gitgitgadget so that the App can trigger the workflow using an installation token, which is totally fine.
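
The trigger of such a combined preview/submit workflow might be sketched like this (input name illustrative):

on:
  workflow_dispatch:
    inputs:
      command:
        description: 'preview or submit'
        required: true
jobs:
  handle:
    runs-on: ubuntu-latest
    steps:
      - run: echo "handling /${{ inputs.command }}"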

Another idea that would be much easier to unlock with an automatically-deployed Azure Function: we have a couple Azure Functions that essentially poll "foreign" repositories, which could be changed to the App parsing webhook events and triggering GitHub workflow runs instead.

For example, the one monitoring the lore mirror of the Git mailing list. We already maintain a mirror of that mirror and could easily install the GitGitGadget GitHub App there, in order to receive webhook events when the main branch advances, and the App could then trigger the respective workflow, without the need for polling.

We could even split out the What's Cooking? handling into a separate workflow that is only triggered when the comparison between the pre-/post-push tip commits reveals that this mail was sent (but then, it might not really be practical to figure that out in the Azure Function, as the webhook event would be unlikely to contain enough information).

After creating this diagram, I realize that it's still a bit of a complex setup:

graph TD
  A{Azure Function} --> |triggers| B
  B{preview/submit GitHub workflow} --> |mirrors check-runs to| C
  B{preview/submit GitHub workflow} --> |mirrors check-runs to| D
  C{git/git} --> |sends webhook events to| A
  D{gitgitgadget/git} --> |sends webhook events to| A
  B --> |pushes tags to| D
  E{gitgitgadget/git-mailing-list-mirror} --> |sends webhook events to| A
  F{workflow handling mails from the Git mailing list} --> |updates PRs in| C
  F{workflow handling mails from the Git mailing list} --> |updates PRs in| D
  A --> |triggers| F

But it would make things relatively easy to adjust to other projects, I think, if we maintained the following things within gitgitgadget/git:

What do you think, @webstech?

dscho commented 1 year ago

it would make things relatively easy to adjust to other projects, I think, if we maintained the following things within gitgitgadget/git

I meant gitgitgadget/gitgitgadget, of course...

And in the meantime I thought about it again and came to the conclusion that it would actually make things much easier to maintain and to reuse in projects other than Git if we kept the GitHub Action in gitgitgadget/gitgitgadget, together with an always up-to-date pre-compiled index.js (thanks to ncc), but moved the Azure Function into its own repository, and likewise the GitHub workflows. I.e. have three repositories that work together in well-defined ways.

jsoref commented 8 months ago

Another downside: the GitHub workflow would also be active in each and every fork...

One approach to this would be to have repositories set a repository (or even organization) variable to opt in: https://docs.github.com/en/actions/learn-github-actions/variables#defining-configuration-variables-for-multiple-workflows

so, the gitgitgadget org could set the org variable use_gitgitgadget to 1 (variable names may only contain alphanumerics and underscores) and then the workflow would do:

...
jobs:
  work:
    if: ${{ vars.use_gitgitgadget }}
    ...

dscho commented 8 months ago

Another downside: the GitHub workflow would also be active in each and every fork...

One approach to this would be to have repositories set a repository (or even organization) variable to opt in: https://docs.github.com/en/actions/learn-github-actions/variables#defining-configuration-variables-for-multiple-workflows

so, the gitgitgadget org could set the org variable use_gitgitgadget to 1 (variable names may only contain alphanumerics and underscores) and then the workflow would do:

...
jobs:
  work:
    if: ${{ vars.use_gitgitgadget }}
    ...

We will want to use repository variables for the configuration, but if we want GitGitGadget to be useful beyond the tiny garden that is the Git project, we must not demand changes to the code base. I still think that the best way forward is to run GitGitGadget's workflows in a separate repository, so that no upstream project has to accept patches just to be able to use GitGitGadget.