carpentries / workbench

Repository for Discussions and Materials about The Carpentries Workbench
https://carpentries.github.io/workbench/
Creative Commons Attribution 4.0 International

[post transition] Build and Deployment workflow failure by the first run #66

Closed rogerkuou closed 2 weeks ago

rogerkuou commented 1 year ago

When executing the "01 Build the full site" workflow, it fails while creating the gh-pages branch.

Some observations:

Example runs:

Can something be done to solve or document this issue? Thanks!

Attached error message of the failed execution:

Running git remote set-branches origin md-outputs
  Running git fetch origin md-outputs
  From https://github.com/carpentries-incubator/geospatial-python
   * branch            md-outputs -> FETCH_HEAD
   * [new branch]      md-outputs -> origin/md-outputs
  Running git remote set-branches origin '*'
Add worktree for origin/md-outputs in site/built
  Running git worktree add --track -B md-outputs \
    /home/runner/work/geospatial-python/geospatial-python/site/built \
    origin/md-outputs
  Preparing worktree (new branch 'md-outputs')
  branch 'md-outputs' set up to track 'origin/md-outputs'.
  HEAD is now at 5b9768a markdown source builds
Reset Lesson
  Running git rm -rf --quiet .
Build Markdown Sources
Commit Markdown Sources
  nothing to commit on md-outputs!
Fetch origin/gh-pages
  Running git remote set-branches origin gh-pages
  Running git fetch origin gh-pages
  fatal: couldn't find remote ref gh-pages
  Error in `callr::run("git", c(...), echo_cmd = echo_cmd, echo = echo, error_on_status = error_on_status)`:
  ! System command 'git' failed
  ---
  Exit status: 128
  stdout & stderr: <printed>
  ---
  Backtrace:
   1. sandpaper:::ci_deploy(reset = reset)
   2. sandpaper:::ci_build_site(path, branch = site_branch, md = md_branch, remote = remote,…
   3. withr::with_dir(path, { …
   4. base::force(code)
   5. sandpaper:::git_worktree_setup(path, html, branch = branch, remote = remote)
   6. withr::with_dir(path, { …
   7. base::force(code)
   8. sandpaper:::git_fetch_one_branch(remote, branch, repo = path)
   9. sandpaper:::git("fetch", remote, branch)
  10. callr::run("git", c(...), echo_cmd = echo_cmd, echo = echo, error_on_statu…
  11. processx:::throw(new_process_error(res, call = sys.call(), echo = echo, …
  Running git remote set-branches origin '*'
  Running git worktree remove --force \
    /home/runner/work/geospatial-python/geospatial-python/site/built
  Execution halted
  Error: Process completed with exit code 1.
zkamvar commented 1 year ago

Thank you for providing this detailed example of this failure mode. I really appreciate the copy of the logs because it helps me understand what's going on. A broad overview of the deployment process for Workbench lessons can be found in the Workbench developer's guide.

That being said, this issue is specific to the state of the lesson post-transition and is a bit complex. After I was notified of the discontinuation of funding for my position, my priorities shifted significantly and I haven't been able to neatly finalize the documentation for this transition (the documentation that does exist can be found by searching for -workflow.md in the lesson transition repository: https://github.com/search?q=repo%3Acarpentries%2Flesson-transition%20workflow.md&type=code). Ultimately, you do not have to worry about any more failing builds due to this issue.

From what I understand, here is the timeline (from the activity feed)

  1. 2023-08-08 07:36 UTC --- https://github.com/carpentries-incubator/geospatial-python/pull/158 was the PR that updated the lesson to the workbench, but it was merged into the gh-pages branch.
  2. 2023-08-08 07:39 UTC --- the gh-pages branch was copied to main (I believe), but the initial build fails because the gh-pages branch is not empty and it wasn't built with The Workbench (in fact it is a duplicate of the main branch).
  3. 2023-08-11 14:17 UTC --- the gh-pages branch is deleted
  4. 2023-08-11 14:26 UTC --- the gh-pages branch is recreated from main
  5. 2023-08-11 14:40 UTC --- https://github.com/carpentries-incubator/geospatial-python/pull/159 is merged into main and fails because the gh-pages branch is not empty and was not created by The Workbench
  6. 2023-08-14 07:25 UTC --- the gh-pages branch is deleted
  7. 2023-08-14 07:28 UTC --- the rebuild of the workbench fails because the git process can still detect the gh-pages branch (maybe due to a delay in GitHub's networking? I'm honestly not sure why)
  8. 2023-08-14 08:15 UTC --- the rebuild of the workbench succeeds

During the build process, we check whether the remote branch exists and, if it doesn't, we build a new branch. I think the reason this did not work during the first manual build today was likely a mismatch between a cached copy of the remote branch list and the actual remote branch list. When we checked that the remote branch existed (with git ls-remote --quiet --exit-code origin gh-pages), git responded with an exit code of 0, which meant that it did exist. In reality, you had deleted it a few minutes before, so when the build attempted to fetch the contents of that branch, it failed with fatal: couldn't find remote ref gh-pages. The re-run you did an hour later gave the correct response that the gh-pages branch did not exist, and the run was able to create it.
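
The check-then-fetch sequence described above can be sketched as follows. This is a minimal illustration in Python, not sandpaper's R code: the function name fetch_or_create and the injected run callable are hypothetical, and only the two git commands quoted in this thread come from the source. The sketch assumes the stale answer shows up as exit code 0 from ls-remote followed by a non-zero exit from fetch, and it treats that case as "branch missing" instead of halting the build.

```python
import subprocess
from typing import Callable, List

# Hypothetical runner type: takes git arguments, returns the exit status.
GitRunner = Callable[[List[str]], int]


def run_git(args: List[str]) -> int:
    """Run git with the given arguments and return its exit status."""
    return subprocess.run(["git", *args], check=False).returncode


def fetch_or_create(branch: str, remote: str = "origin",
                    run: GitRunner = run_git) -> str:
    """Return "fetched" if the remote branch was fetched, else "create"."""
    # Step 1: ask the remote whether the branch exists (0 means it does).
    exists = run(["ls-remote", "--quiet", "--exit-code", remote, branch]) == 0
    if not exists:
        return "create"
    # Step 2: the answer from step 1 may be stale, so a failed fetch is
    # treated as "branch is gone" rather than as a fatal error.
    if run(["fetch", remote, branch]) != 0:
        return "create"
    return "fetched"
```

Passing a fake run callable that returns 0 for ls-remote and 128 for fetch reproduces the situation in the log above; under that assumption the function returns "create" rather than raising an error.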

@rbavery had contacted me on 2023-08-11 about the build failures when the main branch and gh-pages branch were duplicated. This was my response:

This is because the gh-pages branch also somehow has a workbench lesson inside it. This will need to be deleted and replaced with an orphan branch.

Normally, in fresh repositories, this branch is created automatically, but because there was a history of working with gh-pages on this repo, it is not so easy to rename and replace because GitHub sees that there used to be a branch called gh-pages and won't allow GitHub actions to recreate it.

In the transition workflow, we rename the branch to legacy/gh-pages (https://github.com/carpentries/lesson-transition/blob/0200c85c9b3eaeb7ccb640e6d87f1930d949207d/functions.R#L507-L511). Afterwards, we have to create a new orphan branch called gh-pages, add a workflow that will auto-close any PRs to that branch (https://github.com/carpentries/lesson-transition/blob/main/close-pr.yaml), and then force-push it (https://github.com/carpentries/lesson-transition/blob/0200c85c9b3eaeb7ccb640e6d87f1930d949207d/functions.R#L570-L592).
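
As a rough illustration of the rename, orphan, and force-push steps described above, the sketch below builds one plausible sequence of git commands. It is not the code in functions.R: the helper name orphan_branch_commands, the commit message, and the exact command order are assumptions, and the step that adds the PR-closing workflow file is left out.

```python
from typing import List


def orphan_branch_commands(branch: str = "gh-pages",
                           legacy: str = "legacy/gh-pages",
                           remote: str = "origin") -> List[List[str]]:
    """Build git commands that replace `branch` with an empty orphan branch."""
    return [
        # Keep the old history under the legacy name, locally and remotely.
        ["git", "branch", "-m", branch, legacy],
        ["git", "push", remote, legacy],
        # Start a branch with no parent commits, then clear the files that
        # the index inherited from the previous branch.
        ["git", "checkout", "--orphan", branch],
        ["git", "rm", "-rf", "--quiet", "."],
        # An empty root commit gives the new branch something to push.
        ["git", "commit", "--allow-empty", "-m", "Initialise empty " + branch],
        # Overwrite whatever the remote still holds under the branch name.
        ["git", "push", "--force", remote, branch],
    ]
```

Each inner list can be passed to subprocess.run from inside a clone of the lesson repository. Building the list first makes it possible to review the commands before anything is pushed.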

rogerkuou commented 1 year ago

Hi @zkamvar, thanks a lot for the quick and detailed reply! Indeed, the strange behavior happens at step 7 of your timeline, where the gh-pages branch should have been removed but was somehow still detected. To give a bit more detail on that, I renamed the gh-pages branch to archive-gh-pages, but I do not think this should matter. Thanks again for taking care of this! Looking forward to more updates.

zkamvar commented 1 year ago

I renamed the gh-pages branch to archive-gh-pages. But I do not think this should matter.

I think this is the key point. This is the same behaviour that we see when we rename gh-pages to legacy/gh-pages in the transition workflow, which is why we force-push an orphan branch. It's good to know that it updates itself in ~1 hour if the force-push doesn't work.

I don't think you should experience any more problems with the transition and I will make sure this is included in the documentation.