samholmes opened 3 years ago
Workaround: assume that forks are never more than 300 commits and fetch 300 commits.
Also a duplicate of #520.
If you want your script to only consider the commits in the PR, then by over-fetching you'll pollute this list with commits not in the PR.
This is a necessity for most monorepo builds.
I'd like to run an automated release cron job that'd analyze commits for the past N hours and create a release. Turns out that `actions/checkout` will only have a single commit—the one that triggered the cron job.
I know there's a `fetch-depth` option, but it pushes me away with the "all branches and all tags" statement. I really need only the commits for the default branch to be present in Git history. I wonder how actions like `semantic-release-action` even work with cron releases then. It'd be silly to expect a new release for each commit.
Without being able to do it via the checkout action, here's a workaround in GHA:
```yaml
- name: Checkout
  uses: actions/checkout@v2

- name: Checkout PR changes
  run: |
    # Un-shallow
    git config remote.origin.fetch "+refs/heads/*:refs/remotes/origin/*"
    # Deepen topic branch; checkout topic branch
    git fetch origin ${{ github.event.pull_request.head.ref }} --depth=$(( ${{ github.event.pull_request.commits }} + 1 ))
    git checkout ${{ github.event.pull_request.head.ref }}
    # Fetch main for common origin
    git fetch origin main:main --depth=50

- name: Get BRANCH_POINT
  id: branch_point
  run: |
    # Find common ancestor between main and topic branch
    BRANCH_POINT=$(git merge-base "$(git rev-parse --abbrev-ref HEAD)" main)
    [[ -z "$BRANCH_POINT" ]] && echo "No branch point" && exit 1
    echo "::set-output name=REF::$BRANCH_POINT"

- name: List changed files
  run: git diff --name-only ${{ steps.branch_point.outputs.REF }}
```
(Improvements welcome)
> I want all the commits up to the fork point from the base branch.
You can use the GitHub context to get how many commits belong to the PR, then fetch that depth.
To minimize the commits fetched when comparing two branches, a `git fetch` of the other branch (e.g. `master`/`main`) will identify a common commit between the two branches and retrieve roughly only the commits needed to support that minimal history.

If the local commit history doesn't already have a commit from the 2nd branch being fetched, the full history (or whatever depth is requested) is fetched for that branch instead. This usually requires fetching one additional commit (the one you branched from) for the 1st branch (e.g. the PR branch).
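The behaviour described above can be demonstrated end to end with a throwaway local repo. Everything here (repo layout, branch names, commit counts) is invented for the demo; `file://` is used because `--depth` only applies over a real transport:

```shell
set -eu
tmp=$(mktemp -d); cd "$tmp"

# Stand-in "origin": 3 commits on main, a 2-commit PR branch forked from the
# 3rd commit, then one more commit on main after the fork. (git >= 2.28 for -b)
git init -q -b main origin-repo; cd origin-repo
git config user.email demo@example.com; git config user.name demo
for i in 1 2 3; do echo "main $i" > f; git add f; git commit -qm "main $i"; done
git branch pr-branch          # fork point is main's 3rd commit
git tag fork-point            # remember it so we can check the result
git checkout -q pr-branch
for i in 1 2; do echo "pr $i" > g; git add g; git commit -qm "pr $i"; done
git checkout -q main
echo "main 4" > f; git commit -qam "main 4"
cd ..

# Shallow-clone the PR branch with depth = PR commits + 1 (2 + 1 = 3),
# so the fork-point commit is included:
git clone -q --branch pr-branch --depth 3 "file://$tmp/origin-repo" clone
cd clone

# Fetching the other branch now only retrieves its new commits, because the
# server sees we already have the fork-point commit.
# (explicit refspec because a single-branch clone only tracks pr-branch)
git fetch -q origin main:refs/remotes/origin/main
git merge-base origin/main pr-branch   # prints the fork-point commit hash
```

This is the same idea the workflow steps in this thread rely on: "PR commits + 1" for the checkout depth, then a plain fetch of the base branch.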
Example for the PR branch to pull enough commit history to include a commit the other branch also has:
```yaml
- name: 'PR commits + 1'
  run: echo "PR_FETCH_DEPTH=$(( ${{ github.event.pull_request.commits }} + 1 ))" >> "${GITHUB_ENV}"

- name: 'Checkout PR branch and all PR commits'
  uses: actions/checkout@v3
  with:
    ref: ${{ github.event.pull_request.head.ref }}
    fetch-depth: ${{ env.PR_FETCH_DEPTH }}

- name: 'Fetch the other branch with enough history for a common merge-base commit'
  run: git fetch origin ${{ github.event.pull_request.base.ref }}
```
It's important to set the ref to the PR head ref as above, since the default ref of this action is one extra "merge commit" (the PR into the base branch), which will not only offset the `fetch-depth` you need by one additional commit, but can also cause other issues (e.g. with `git fetch` on the base branch, and trying to get commit history that can successfully derive a merge-base).
NOTE: Merge commits (those from the base branch into the PR branch, not the action's default test merge-commit ref) contribute to the total commits in the GitHub context, and the fetch-depth range would not only get N commits from your branch but also N commits associated with the history of merge commits.
You may not need the extra commit. If you have a merge commit from the base branch, then your local history will have a commit prior to the merge that it can probably use instead. Otherwise, fetching an extra commit should ensure a common ancestor.
EDIT: Fetching to a common ancestor may fail when you have commits belonging to the base branch in local history which have not been merged into the PR branch. This problem occurs with the default ref (the generated test merge commit of the PR branch into the base branch) with a `fetch-depth` of 2 or more.
(Improvements welcome)
@jbreckmckye if you're interested in a file-name diff (specifically files added/changed, ignoring others like renames or deletions), here are a few examples.
This is nice and small, and should be fine AFAIK, see commented version below for more details:
```yaml
- name: 'Checkout PR branch (with test merge-commit)'
  uses: actions/checkout@v3
  with:
    fetch-depth: 2

- name: 'Get a list of changed files to process'
  run: git diff-tree --no-commit-id --name-only --diff-filter 'AM' -r HEAD^1 HEAD
```
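As a self-contained illustration of what that `git diff-tree` invocation returns (throwaway repo; the file names are invented), `--diff-filter=AM` keeps only Added/Modified paths between the two commits:

```shell
set -eu
tmp=$(mktemp -d); cd "$tmp"
git init -q -b main repo; cd repo     # git >= 2.28 for -b
git config user.email demo@example.com; git config user.name demo

# First commit: two files. Second commit: add one file, delete another.
echo a > keep.txt; echo b > old.txt; git add .; git commit -qm first
echo c > new.txt; git add new.txt; git rm -q old.txt; git commit -qm second

# Only Added/Modified files between HEAD^1 and HEAD; the deletion is filtered out:
git diff-tree --no-commit-id --name-only --diff-filter=AM -r HEAD^1 HEAD   # -> new.txt
```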
I've collapsed the original content (still useful if you want to understand why `--merge-base` is important), as I have curated a better source of this information here.
@polarathene What I was actually after was that `BRANCH_POINT` ref. The reason being, I'm using Yarn workspace tools, which take the reference to figure out which things need to be re-built / re-tested. However, the diff itself is still very relevant for things like linting.
I'm finding this is really helping me optimise my PR flow. For example, I'm working on a repo that used to have a twenty-five minute PR flow for lint, unit tests, static analysis, etc. Now that I have differential builds, I can run a differential lint and a subset of unit tests, and a PR build can take as little as two minutes.
Your examples are still helpful though - I didn't realise I could parameterise the checkout action with the fetch depth, and I'd never really looked too hard at `git diff-tree`. Seems like exactly the command I needed.
> What I was actually after was that `BRANCH_POINT` ref.
```yaml
# Avoid the default ref "test merge commit" generated by Github (`github.sha`),
# otherwise `git fetch` will fail to retrieve enough commits for a merge-base.
# The head ref (with default `fetch-depth: 1`) references the latest commit from the PR branch:
- name: 'Checkout PR branch'
  uses: actions/checkout@v3
  with:
    ref: ${{ github.event.pull_request.head.ref }}

- name: 'Find the merge-base commit hash'
  env:
    branch_main: ${{ github.event.pull_request.base.ref }}
    branch_topic: ${{ github.event.pull_request.head.ref }}
  run: |
    # Fetch enough history to find a common ancestor commit (aka merge-base):
    git fetch origin ${{ env.branch_topic }} --depth $(( ${{ github.event.pull_request.commits }} + 1 ))
    git fetch origin ${{ env.branch_main }}
    BRANCH_POINT=$( git merge-base origin/${{ env.branch_main }} ${{ env.branch_topic }} )
    echo "::set-output name=REF::$BRANCH_POINT"
```
Same thing really. Just remember that won't identify the commit that the PR branched from.
If there was a merge commit from base branch into the PR branch, the last commit from the base branch becomes the merge-base target instead (which is probably what you'd want, anyway).
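That shift of the merge-base after merging the base branch into the PR branch can be reproduced locally (branch names and commits below are invented for the demo):

```shell
set -eu
tmp=$(mktemp -d); cd "$tmp"
git init -q -b main repo; cd repo     # git >= 2.28 for -b
git config user.email demo@example.com; git config user.name demo

echo 1 > f; git add f; git commit -qm main-1
git checkout -qb pr-branch            # fork point: main-1
echo p > g; git add g; git commit -qm pr-1
git checkout -q main
echo 2 >> f; git commit -qam main-2   # base branch moves on

# Before any merge, the merge-base is main-1 (the original fork point).
# Merge the base branch into the PR branch:
git checkout -q pr-branch
git merge -q -m 'merge main into pr-branch' main

# Now the merge-base is main-2, the last commit merged in from the base branch:
git merge-base main pr-branch
```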
WARNING: While using the branch refs as shown above works most of the time, it's not as deterministic when you re-run a previous workflow job, as those branches may have new commits since, and especially for the PR branch, with a stale github context in the re-run, you risk an inaccurate `--depth` to fetch. Prefer referencing the `base.sha` and `head.sha` instead, as demonstrated at the end of this response.
Example of when the derived merge-base is not the original commit you branched the PR from:
Additional tips:

- `${{ github.event.pull_request.base.sha }}` could point to the commit hash of MAIN-D, if at the time the PR was opened on the repo that was the latest commit on the `main` branch.
- The `${{ github.event.pull_request.base.sha }}` value then updates to the commit hash of MAIN-E once the merge commit is made at a later point in time.
- When using the default `ref` for the action however, the generated test merge commit is `${{ github.sha }}`, but the parent commit on `main` (which `pr-branch` was merged with to create the test commit) was the latest commit on `main` when the commit history of `pr-branch` last updated. Do not mistake `${{ github.event.pull_request.base.sha }}` for that same commit (which is updated to the latest `main` commit when `main` is merged into `pr-branch`, or when rebasing `pr-branch` onto `main`).
- Merge commits count toward `fetch-depth` progress when encountered and are part of the `commits` context count that Github provides. The other commit parent of each merge commit however will branch off and continue to retrieve N commits for the remaining depth as well (which can sometimes result in retrieving as far back as the root commit).

> I want all the commits up to the fork point from the base branch.
`git merge-base` is the "obvious" solution and it works in most cases, but it does NOT work in all cases. So it cannot be used as the solution in the most common action that absolutely everyone uses.
One alternative is to use `https://api.github.com/repos/$gh_project/pulls/$pr_number/commits`; see the example in https://github.com/thesofproject/sof/blob/73df64444b752/.github/workflows/codestyle.yml#L38
Two problems with `git merge-base`:

`git merge-base` is not compatible with shallow clones. Relying on it can make you believe the entire git history has been submitted in the Pull Request. See the example and diagram in https://github.com/thesofproject/linux/issues/2556 (learned the really hard way!)
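The failure mode is easy to reproduce in a throwaway repo (names invented for the demo): when neither shallow history reaches a shared commit, `git merge-base` exits non-zero with no output, rather than warning you that the history was too shallow:

```shell
set -eu
tmp=$(mktemp -d); cd "$tmp"
git init -q -b main origin-repo; cd origin-repo   # git >= 2.28 for -b
git config user.email demo@example.com; git config user.name demo
for i in 1 2 3; do echo "main $i" > f; git add f; git commit -qm "main $i"; done
git checkout -qb pr-branch
for i in 1 2; do echo "pr $i" > g; git add g; git commit -qm "pr $i"; done
cd ..

# Depth 1 on both sides: no shared commit ends up in the local history.
git clone -q --branch pr-branch --depth 1 "file://$tmp/origin-repo" clone
cd clone
git fetch -q --depth 1 origin main:refs/remotes/origin/main
git merge-base origin/main pr-branch || echo 'no common ancestor in shallow history'
```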
> `git merge-base` is not compatible with shallow clones. Relying on it can make you believe the entire git history has been submitted in the Pull Request.
I was curious, so I gave this a look with your 2020 example script, adapted to the advice I mentioned earlier in this thread about doing a shallow clone and using `git merge-base`.
Your script referenced the 4.20 kernel (which later became 5.0), so I pulled in the history since then (I probably could have started from the `2020-10-01` (October 1st) commit date):
```sh
# Initial setup
# Expect around 3-6GB:
git clone --single-branch --shallow-since 2019-03-03 https://github.com/torvalds/linux
cd linux
git remote add sof_remote_demo https://github.com/thesofproject/linux

# This probably could be reduced in scope? I tried the shallow since but
# this step still took ridiculously long on a 1GB RAM VPS, like hours resolving delta?
# (more RAM would probably have made a big difference)
# git fetch sof_remote_demo
git fetch --shallow-since 2020-10-01 sof_remote_demo

# Tagging commits and adding a change:
# 16th Oct 2020 (authored 9th Sep): https://github.com/thesofproject/linux/commit/b150588d227ac0
git tag _real_PR_base b150588d227ac0
git checkout _real_PR_base
touch dummyfile; git add dummyfile
git commit -m 'pull request' dummyfile
git tag _pull_request
# 30th Oct 2020: https://github.com/thesofproject/linux/commit/70fe32e776dafb
git tag _target_branch 70fe32e776dafb
# 10th Oct 2020: https://github.com/torvalds/linux/commit/86f29c7442ac4b
git tag _spurious_base 86f29c7442ac4b

# Make a local clone to use the first clone as a remote instead:
# (--bare is faster / smaller by excluding the working tree, only cloning `.git/` content)
# (single commit in PR + common branching point commit, github provides this context)
git clone --bare --single-branch --branch _pull_request --depth 2 "file://$(pwd)" /tmp/linux-local
# Work from the local clone (its `origin` is the first clone above):
cd /tmp/linux-local

# Go through the commit history available to find a useful commit in common with target branch:
# (b150588d227ac0edc933e8f7f810d4ea9453b32c aka `_real_PR_base`)
git rev-list --first-parent --max-parents=0 --max-count=1 _pull_request
# Get date of commit (2020-10-15 16:52:12 -0500):
git log --max-count=1 --date=iso8601 --format=%cd "b150588d227ac0edc933e8f7f810d4ea9453b32c"
# Fetch the commit history from the `_target_branch` since then:
git fetch origin _target_branch --shallow-since "2020-10-15 16:52:12 -0500"
# `FETCH_HEAD` is pointing to the tip of this branch, tag it:
git tag _target_branch FETCH_HEAD
# Now we can do the merge-base and get the result we expect:
# (b150588d227ac0edc933e8f7f810d4ea9453b32c aka `_real_PR_base`)
git merge-base -a _target_branch _pull_request
```
Avoids the concern of fetching over a million commits :+1:
Just need to provide context of date or number of commits to get shared history between the two branches. A similar issue (with much smaller repo size) was discussed here recently: https://github.com/actions/checkout/issues/578#issuecomment-1588468853 (lack of common history between the two branches caused the problem for the operation)
If you don't have that extra context, you can probably clone/fetch with `--shallow-since` at a date you'd consider reasonable, or likewise with `--depth`, which should still accomplish getting common history whilst avoiding excessive history.
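A minimal sketch of that fallback, using a throwaway repo with backdated commits (dates and layout invented for the demo) to show `--shallow-since` truncating the clone:

```shell
set -eu
tmp=$(mktemp -d); cd "$tmp"
git init -q -b main server; cd server   # git >= 2.28 for -b
git config user.email demo@example.com; git config user.name demo

# Three commits backdated to 2020, 2021 and 2023:
for d in 2020-01-01 2021-06-01 2023-01-01; do
  echo "$d" > f; git add f
  GIT_AUTHOR_DATE="${d}T12:00:00" GIT_COMMITTER_DATE="${d}T12:00:00" \
    git commit -qm "$d"
done
cd ..

# Clone only the history committed after a date you'd consider reasonable:
git clone -q --shallow-since 2021-01-01 "file://$tmp/server" clone
git -C clone rev-list --count HEAD   # 2 of the 3 commits
```

Note that `--shallow-since` cuts on committer date, so rebases and cherry-picks (which refresh the committer date) can shift what falls inside the window.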
I didn't quite follow your intention in the example. You chose to clone the `_target_branch` with N commits, then bring in the `_pull_request` commits. At least with Github Actions, it would probably make more sense to use the Github context for the PR commit count (it includes merge commits, so it can fetch a bit more, but not excessively).
Once you have that, you can derive a date from a commit that should be a common ancestor between the two branches, like I had done in my linked example. If you did the clone with `--shallow-since` instead of `--depth`, the plain fetch command for the PR history should find a common commit too, if you need to approach it that way.
> you can probably clone/fetch with `--shallow-since` at a date you'd consider reasonable, or likewise with `--depth` which should still accomplish getting common history whilst avoiding excessive history.
Yes of course, if you "guess right" and fetch enough, then `git merge-base` will work and will return some merge-base (which may or may not be the merge-base you want).
But that's just guessing; it does not give an automated solution that works every time, it does not give you something that you can script and forget about it.
> But that's just guessing; it does not give an automated solution that works every time
When does it not work when you fetch commits since 1 year ago? 5 years ago?
Obviously that's not as efficient, but you're not going to need more than that are you? If your PRs are branching off and merging back within a 3 month window, you could use that too, or if you know your PRs aren't likely to exceed 1000 commits that works well.
Depends what you're comfortable with if you for some reason are not able to provide better context.
> it does not give you something that you can script and forget about it.
What is wrong with what I've shared? Github provides the commit count of a PR as context; you can use that to get the `--shallow-since` date for the target branch like I described, and it works very well.
With your kernel example, my approach resulted in a little over 100 commits total fetched via `--shallow-since`, and the merge base was correct.
My approach does ensure you get sufficient commit history pulled in without having to bring in a full clone of a branch's history (which would be wasteful with the linux kernel example if you're not truncating it with `--shallow-since` or similar, which you're opposed to?).
So even if you don't use `git merge-base`, it's still getting you all the commits of the PR branch. It can go back a bit further, but not excessively compared to other solutions suggested. I'm not sure what you consider an alternative to `git merge-base` though?
I know you've cited niche scenarios (criss-cross), but it's not clear what your actual goal / expectation is for handling that. What would you do differently? At the criss-cross, the two branches effectively have a common branch history?
It's not uncommon in some projects I've contributed to for the main branch to add merge commits into a PR branch, but you're not really going to merge a PR branch back into main and keep treating it as the same branch/PR, are you? So is that really an applicable concern?
I'm open to real-world scenarios where my approach doesn't work well, and what you believe would be better. `--unshallow` fetches excess history without any benefit being cited where that makes a difference? (If anything, you were not happy about fetching over a million commits, which unless I'm mistaken is what will happen if your remote/origin is not shallow.)
> When does it not work when you fetch commits since 1 year ago? 5 years ago? Obviously that's not as efficient, but you're not going to need more than that are you?
It's not just "inefficient", it's extremely slow with a large repo like the Linux kernel. This is going to systematically fetch many thousands of commits when you need only a few most of the time (and a few thousand from time to time).
Plus this is still guessing some semi-random number of years.
> Github provides the commit count of a PR as context, you can use that to get the --shallow-since date for the target branch like I described and it works very well.
How do you compute a date from a number of commits?!?
> I know you've cited niche scenarios (criss-cross), [...] but you're not really going to be merging a PR branch back into main and keep treating it as the same branch/PR are you?
It may be a niche scenario in Github, but not in git:
Now to be honest I've heard many kernel developers (who invented git) say "don't use Github". It's not just about using `git merge`; it's also that Github's design is mostly incompatible with `git range-diff`: https://gitlab.com/gitlab-org/gitlab/-/issues/24096
Off-topic but we've also experienced many performance issues with a kernel repo while much smaller repos in the same project were experiencing none. I think it got better.
> It's not just "inefficient", it's extremely slow with a large repo like the Linux kernel. This is going to systematically fetch many thousands of commits when you need only a few most of time (and a few thousands from time to time)
What would you do differently? I've seen you suggest `--unshallow` in a related issue, which would retrieve far more commits than my approach?
When reproducing your 2020 example with the initial linux repo clone, I wasn't entirely sure what I needed history-wise beyond the `4.20`/`5.0` kernel release (later inspecting the referenced commit dates, I realized I could have retrieved less). `--shallow-since` worked great for that.
Here's a revised starting point if you know your actual commit/ref before and after points:
```sh
git init example && cd example
git remote add origin https://github.com/torvalds/linux
git remote add sof_remote_demo https://github.com/thesofproject/linux

# Fetch starting commit and assign a ref (branch):
git fetch sof_remote_demo --depth 1 b150588d227ac0edc933e8f7f810d4ea9453b32c:_real_PR_base

# Get commits between these two refs:
# (fetch commits since date of _real_PR_base up to a known commit that we assign ref _target_branch)
git fetch sof_remote_demo --shallow-since "2020-10-15 16:52:12 -0500" 70fe32e776dafb4b03581d62a4569f65c2f13ada:_target_branch

# 111 commits total:
git rev-list _target_branch _real_PR_base | wc -l
# 292M size:
du -sh
# 1.4G size after checkout:
git switch _real_PR_base

# Now make a change and add your _pull_request ref
# Optionally pull in commits from origin to include the _spurious_base
# Continue with the exercise (but note there won't be millions of commits to pull
# from a subsequent clone due to how shallow this was)
```
You could have fetched `_pull_request` earlier instead of `_real_PR_base` if it were available, with sufficient depth to get the common commit ancestor to provide the `--shallow-since` date for `_target_branch`. I used the commit explicitly to avoid fetching excess future history, since the example is from years back; it could just be a real ref instead when suitable.
> Plus this is still guessing some semi-random number of years.
You're setting a ceiling of what should be safe for the bulk of PRs in a project. There could be some outliers, but often you're probably not dealing with merging a PR that branched off years ago, are you? You'd probably want to rebase or similar at that point.
Many projects I've engaged with don't tend to have PRs active for such long durations. There's plenty that amass open PRs that cease to get updated, but other projects also close/reject PRs that are considered stale/abandoned in favor of opening a new one if activity will continue.
I'm not sure why we're debating this `--shallow-since` / `--depth` ceiling? I suggested it as a fallback in the event you don't have the better context. Many projects are nothing like the linux kernel repo; regardless, what is an appropriate value to choose depends on the project. You're not going to need to rely on this fallback approach with Github Actions AFAIK, but I still consider it viable when you lack context.
> What would you do differently?

> How do you compute a date from a number of commits?!?
You just need a common ancestor commit between the two branches, and then take the commit date from it.
Did you miss the earlier example I provided you? Or was there a misunderstanding?:
```sh
# Go through the commit history available to find a useful commit in common with target branch:
# (b150588d227ac0edc933e8f7f810d4ea9453b32c aka `_real_PR_base`)
git rev-list --first-parent --max-parents=0 --max-count=1 _pull_request
# Get date of commit (2020-10-15 16:52:12 -0500):
git log --max-count=1 --date=iso8601 --format=%cd "b150588d227ac0edc933e8f7f810d4ea9453b32c"
# Fetch the commit history from the `_target_branch` since then:
git fetch origin _target_branch --shallow-since "2020-10-15 16:52:12 -0500"
```
For a workflow example with context and vars used, I linked you to this previously: https://github.com/actions/checkout/issues/520#issuecomment-1167205721
> It maybe a niche scenario in Github but not in git
I'm not disagreeing with you there, but:

- This is a Github issue for a Github CI action, `actions/checkout`
You're referring to, I guess, long-lived branches that are merging between each other? (the criss-cross)
I've not had any issues with the main/target branch being merged into the PR branch, or rebasing the PR branch. I can't think of a scenario where I've needed a PR branch that also merges into a target branch, and still continues to be iterated on instead of a separate PR branch.
I'm sure that's probably a thing, as the LWN article seemed to discuss, and your projects rebase branches, but I'm not really following what you're suggesting should be done differently from the approach I've shared?
> Now to be honest I've heard many kernel developers (who invented git) say "don't use Github".
That's fine. That's a very different workflow from what most users need in their projects, and the linux kernel isn't using Github for CI jobs right?
If you've got a better solution that can be used on Github, that'd be great to know. My approach to retrieving both branches' relevant commit histories should meet the needs of most projects (it even handled your kernel PR example with `git merge-base`).
I'd also appreciate any reproducible examples of real-world scenarios where it will falter, along with a better alternative that can be used instead.
> Did you miss the earlier example I provided you?
>
> ```sh
> # Go through the commit history available to find a useful commit in common with target branch
> git rev-list --first-parent --max-parents=0 --max-count=1 _pull_request
> ```
This does not make sense to me, sorry. I'm merely saying that `git merge-base` is not compatible with shallow clones, and you keep giving commands that only work in deep or "deep enough" clones for some hardcoded variants of "deep enough". A hypothetical Github Action that solves the problem stated in the description at the top of this issue should work for every PR in every repo.
Also, you keep posting very long comments which makes it very hard to extract the bits that matter.
> you keep posting very long comments which makes it very hard to extract the bits that matter.
I was already making an effort to be terse :sweat_smile:
> This does not make sense to me sorry.
I can't meet your request to be terse and also break this stuff down further for you. The command does what is needed to find an appropriate commit.
> `git merge-base` is not compatible with shallow clones
Not compatible when there is insufficient history. My approach gets you sufficient history; maybe try it out?
> you keep giving commands that only work in deep or "deep enough" clones for some hardcoded variants of "deep enough".
I hard-coded them for the examples. I mentioned that using the github context is better and linked to an example, but I wanted to try to keep it simple for you.
Here's how you get the "deep enough" commits:
```yaml
- name: 'PR commits + 1'
  run: echo "PR_FETCH_DEPTH=$(( ${{ github.event.pull_request.commits }} + 1 ))" >> "${GITHUB_ENV}"

- name: 'Checkout PR branch and all PR commits'
  uses: actions/checkout@v3
  with:
    ref: ${{ github.event.pull_request.head.sha }}
    fetch-depth: ${{ env.PR_FETCH_DEPTH }}
```
> A hypothetical Github Action that solves the problem stated in the description at the top of this issue should work for every PR in every repo.
Cool, well until one actually exists, my Github Actions workflow example can be adapted: https://github.com/actions/checkout/issues/520#issuecomment-1167205721
> I can't meet your request to be terse and break this stuff down further for you.
I never asked you to "break down" anything, quite the opposite.
> It maybe a niche scenario in Github but not in git
>
> I'm not disagreeing with you there, but: - This is a github issue for a Github CI action `actions/checkout`
Guess what: Github lets you import ANY git repo. Maybe Github itself should say "don't import the Linux kernel" and not just kernel developers?
> That's a very different workflow from what most users need in their projects,..
Totally agreed. Yet an official git action should work all the time. Or at the very least be ready for all cases and fail gracefully when it can't.
> ... and the linux kernel isn't using Github for CI jobs right?
This entire discussion started with you questioning my data posted in https://github.com/thesofproject/linux/issues/2556 ?!?
> I hard-coded them for the examples. I mentioned using github context is better and linked to an example, but I wanted to try keep it simple for you.
(Counter-)examples can only show when something does not work; examples can never show that a generic solution always works. “Program testing can be used to show the presence of bugs, but never to show their absence” (Dijkstra).
Yes, examples are very useful as a "simple" way to introduce concepts new to some people but they're just noise when trying to prove that something always works. An example is never an answer to a counter example. BTW someone showing a counter example is obviously not new to a topic.
Finally, `--shallow-since` never made any sense, with or without hardcoded examples.
> the command does what is needed to find an appropriate commit.
Which command? you posted so many of them.
```
PR_FETCH_DEPTH=$(( ${{ github.event.pull_request.commits }} + 1 ))
fetch-depth: ${{ env.PR_FETCH_DEPTH }}
```
Ah, this command/approach. Thanks for finally dropping all the noise about hardcoded commits, hardcoded numbers of years and pointless `--shallow-since`, and for focusing back on some actually generic solution.
So now I remember that I did try this approach and I agree it does work every time to find one merge-base. It's great for the very large majority of Github users - but not for all people / a generic Github Action.
That's because it's massively inefficient for git repos that use `git merge`, so it's not "shallow" anymore.
> it would probably make more sense to use the Github context for PR commit count (includes merge commits, so can fetch a bit more but not excessively).
A LOT more actually for a project that uses git merge a lot. In the example below, `git fetch origin --depth 806` downloads 1.1 million commits, which is SLOWER than downloading everything.
```sh
curl -L -H "Accept: application/vnd.github+json" -H "X-GitHub-Api-Version: 2022-11-28" \
  https://api.github.com/repos/thesofproject/linux/pulls/4393
# => 806 commits

cd test-repo
git init
git remote add origin https://github.com/thesofproject/linux/

# 1 commit, 45 seconds
time git fetch origin --depth 1 2f0c3e8b882303de468060e35769c0f79465f848

# (start over)
# Full clone, 7 minutes
time git fetch origin 2f0c3e8b882303de468060e35769c0f79465f848

# (start over)
# 10 minutes, LONGER than a full clone!
time git fetch origin --depth 806 2f0c3e8b882303de468060e35769c0f79465f848
git log --oneline FETCH_HEAD | wc -l
# => 1.1 million commits!
```
> If you've got a better solution that can be used on Github, that'd be great to know.
The more efficient approach that I did implement and already linked to is to iterate and `git fetch` all the commits in the PR one by one. If you start with the "older" commits, then most of the other `git fetch` invocations should be no-ops.

~I've implemented this method and put it in production~ but only in a small repo. I haven't used it "in anger" on a Linux kernel repo yet. The API used in the current implementation has a 250 commits limit. This may or may not be what you want/need.

EDIT: correction, that implementation isn't finished yet; it's still using an inefficient count and depth for now.
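A rough local simulation of that one-by-one idea (repo layout invented for the demo; the commit list here comes from `git rev-list`, where a real workflow would use the pulls/commits API mentioned earlier; the server side needs `uploadpack.allowReachableSHA1InWant`, which github.com is generally understood to enable):

```shell
set -eu
tmp=$(mktemp -d); cd "$tmp"

# Stand-in "server": a base commit on main plus a 3-commit PR branch.
git init -q -b main server; cd server   # git >= 2.28 for -b
git config user.email demo@example.com; git config user.name demo
echo base > f; git add f; git commit -qm base
git checkout -qb pr-branch
for i in 1 2 3; do echo "pr $i" >> f; git commit -qam "pr $i"; done
# Permit fetching individual commits by hash over this local transport:
git config uploadpack.allowReachableSHA1InWant true
cd ..

# The PR's commit hashes, oldest first (stand-in for the pulls/commits API):
shas=$(git -C server rev-list --reverse main..pr-branch)

git init -q work; cd work
git remote add origin "file://$tmp/server"
for sha in $shas; do
  git fetch -q --depth 1 origin "$sha"   # older-first; re-fetching present objects is cheap
done
```

After the loop, every PR commit object is present locally without having pulled the base branch's history at all.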
> I never asked you to "break down" anything, quite the opposite.
You said you didn't understand a command. I responded that if I elaborated, it wouldn't be terse like you requested.
> Guess what: Github lets you import ANY git repo.
There are a lot of things you can do in life; it doesn't mean you should. To remain terse (and on topic), this isn't worth debating.
> an official git action should work all the time. Or at the very least be ready for all cases and fail gracefully when it can't do.
One would hope for the best yes.
In my experience it doesn't matter how well funded a company is; these issues lasting for years is common to encounter, and even if they're open-source and solutions are provided via PR, it can take quite some time before action is taken.
> This entire discussion started with you questioning my data ?!?
Yes? I was curious, so I tried my approach with `git merge-base`, and it resulted in a shallow clone of a little over 100 commits and a working `git merge-base` :man_shrugging:
Most of the discussion after my first response was repeating myself because you wouldn't read what was said the first time :roll_eyes:
> but they're just noise when trying to prove that something always works.
Clearly I communicated that poorly and could have done better... or it'd still be glossed over and variables considered noise/overhead to explain. Can't really predict that up front :man_shrugging:
> `--shallow-since` never made any sense; with or without hardcoded examples.
> Which command? you posted so many of them.
The `git rev-list` one you quoted and said didn't make sense? :roll_eyes:
> Ah, this command/approach. Thanks for finally dropping all the noise about hardcoded commits, hardcoded number of years and pointless `--shallow-since` and for focusing back on some actually generic solution.
That was only for getting the number of commits.
I shared this via a link in my very first sentence to you: https://github.com/actions/checkout/issues/552#issuecomment-1588754847
The rest was about bringing in the other branch for the `git merge-base` while avoiding over a million commits fetched.
> If you start with the "older" commits then most of the other git fetch should be no-ops.
I'm not sure how you propose doing that with the PR you referenced?
This is probably something you'd know the answer to. Presumably all the commits are there still, but some are already applied / shared further back in history?
> The more efficient approach that I did implement and already linked to is to iterate and git fetch all the commits in the PR one by one.
I'm not sure how your staggered fetching would handle the above experience any better? :thinking:
You said you've linked to it, but all I see is `git fetch --depth` with an awkward way to retrieve the depth instead of using context? (presumably due to Travis, before GH Actions)
> EDIT: correction, that implementation isn't finished yet, it's still using an inefficient count and depth for now.
So you're fetching over a million commits still, while saying my advice is inefficient? (with PR 4393, since you chose that outlier)
Most of the PRs on `thesofproject/linux` aren't these massive piles of commits; those are a special case and could be handled in a separate manner if preferred.
The fallback advice I gave with `--shallow-since` and a relative offset is probably applicable to your large PRs (with the cascading merges).
```sh
# 1 minute (324 MiB) - Approx 20k commits fetched:
$ time git fetch origin --no-tags --prune --no-recurse-submodules --shallow-since "2023-02-01" \
  '+5224191371f701bbf0e10bb49bd5cd3fda9848a5:remotes/origin/merge/sound-upstream-20230530'

real 1m1.267s
user 0m27.861s
sys  0m3.801s

# 40 sec (237 MiB):
$ time git fetch origin --no-tags --prune --no-recurse-submodules --depth 1 \
  '+5224191371f701bbf0e10bb49bd5cd3fda9848a5:remotes/origin/merge/sound-upstream-20230530'

# 45 minutes (1GB VPS instance, memory starved by git) - Over a million commits fetched:
$ time git fetch origin --no-tags --prune --no-recurse-submodules --depth 806 \
  '+5224191371f701bbf0e10bb49bd5cd3fda9848a5:remotes/origin/merge/sound-upstream-20230530'
```
As you can see close to 6 months isn't too much overhead, you'd probably end up with only double the time of a single commit fetch, despite fetching tens of thousands.
For kernel cycles, I'd imagine a relative date offset for fetching like this is predictable enough to know if 6 months is a large enough window for you.
How do I tell `actions/checkout` to fetch all the commits up to the fork point? I want all the commits up to the fork point from the base branch. I don't see any examples of how to do this in the README. I'd assume this is a common need.