gitgitgadget / git

GitGitGadget's Git fork. Open Pull Requests here to submit them to the Git mailing list
https://gitgitgadget.github.io/
Other
208 stars 135 forks source link

git for-each-ref: is-base atom and base branches #1768

Closed derrickstolee closed 1 week ago

derrickstolee commented 1 month ago

This change introduces a new 'git for-each-ref' atom, 'is-base', in a very similar way to the 'ahead-behind' atom. As detailed carefully in the first change, this is motivated by the need to detect the concept of a "base branch" in a repository with multiple long-lived branches.

This change is motivated by a third-party tool created to make this detection with the same optimization mechanism, but using a much slower technique due to the limitations of the Git CLI not presenting this information. The existing algorithm involves using git rev-list --first-parent -<N> in batches for the collection of considered references, comparing those lists, and increasing <N> as needed until finding a collision. This new use of 'git for-each-ref' will allow determining this mechanism within a single process and walking a minimal number of commits.

There are benefits to users both on client-side and server-side. In an internal monorepo, this base branch detection algorithm is used to determine a long-lived branch based on the HEAD commit, mapping to a group within the organizational structure of the repository, which determines a set of projects that the user will likely need to build; this leads to automatically selecting an initial sparse-checkout definition based on the build dependencies required. An upcoming feature in Azure Repos will use this algorithm to automatically create a pull request against the correct target branch, reducing user pain from needing to select a different branch after a large commit diff is rendered against the default branch. This atom unlocks that ability for Git hosting services that use Git in their backend.

Thanks, -Stolee

Updates in v2

Updates in v3

cc: gitster@pobox.com cc: vdye@github.com

gitgitgadget[bot] commented 1 month ago

There are issues in commit 3ff144e5f7d8790076688aa26c07ce5cb650df87: commit-reach: add get_branch_base_for_tip Lines in the body of the commit messages should be wrapped between 60 and 76 characters. Indented lines, and lines without whitespace, are exempt

gitgitgadget[bot] commented 1 month ago

There are issues in commit 22b72abff3d57cd1a7efd0ad723bbd95e257851a: for-each-ref: add 'is-base' token Lines in the body of the commit messages should be wrapped between 60 and 76 characters. Indented lines, and lines without whitespace, are exempt

gitgitgadget[bot] commented 1 month ago

There are issues in commit 3ff144e5f7d8790076688aa26c07ce5cb650df87: commit-reach: add get_branch_base_for_tip Lines in the body of the commit messages should be wrapped between 60 and 76 characters. Indented lines, and lines without whitespace, are exempt

gitgitgadget[bot] commented 1 month ago

There are issues in commit 6b1e6c4f0606785ed554f764c6f8238d40aadec2: for-each-ref: add 'is-base' token Lines in the body of the commit messages should be wrapped between 60 and 76 characters. Indented lines, and lines without whitespace, are exempt

gitgitgadget[bot] commented 1 month ago

There are issues in commit 23f18f1bef7425e81c550f223fadebc7a6fa7e6c: commit-reach: add get_branch_base_for_tip Lines in the body of the commit messages should be wrapped between 60 and 76 characters. Indented lines, and lines without whitespace, are exempt

gitgitgadget[bot] commented 1 month ago

There are issues in commit 9ccfdd69b5fc50861f9c2de85dd3ff88bf762487: for-each-ref: add 'is-base' token Lines in the body of the commit messages should be wrapped between 60 and 76 characters. Indented lines, and lines without whitespace, are exempt

gitgitgadget[bot] commented 1 month ago

There are issues in commit 7631c3de49beb63657baf0435f8dcff35b29cc77: p1500: add is-base performance tests Lines in the body of the commit messages should be wrapped between 60 and 76 characters. Indented lines, and lines without whitespace, are exempt

derrickstolee commented 1 month ago

/submit

gitgitgadget[bot] commented 1 month ago

Submitted as pull.1768.git.1722550226.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1768/derrickstolee/target-ref-v1

To fetch this version to local tag pr-1768/derrickstolee/target-ref-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1768/derrickstolee/target-ref-v1
gitgitgadget[bot] commented 1 month ago

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Derrick Stolee (3):
>   commit-reach: add get_branch_base_for_tip
>   for-each-ref: add 'is-base' token
>   p1500: add is-base performance tests
>
>  commit-reach.c              | 118 ++++++++++++++++++++++++++++++++++++
>  commit-reach.h              |  17 ++++++
>  ref-filter.c                |  78 +++++++++++++++++++++++-
>  ref-filter.h                |  15 +++++
>  t/helper/test-reach.c       |   2 +
>  t/perf/p1500-graph-walks.sh |  31 ++++++++++
>  t/t6600-test-reach.sh       |  94 ++++++++++++++++++++++++++++
>  7 files changed, 354 insertions(+), 1 deletion(-)

I was expecting to see an documentation update to for-each-ref (and
probably branch and tag) so that what this new atom means.  Is it
that %(is-base:<commit>) interpolates to <commit> for a ref that is an
ancestor of <commit>, and interpolates to an empty string for a ref
that is not, or something?

Thanks.
gitgitgadget[bot] commented 1 month ago

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/1/24 7:06 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >> Derrick Stolee (3):
>>    commit-reach: add get_branch_base_for_tip
>>    for-each-ref: add 'is-base' token
>>    p1500: add is-base performance tests
>>
>>   commit-reach.c              | 118 ++++++++++++++++++++++++++++++++++++
>>   commit-reach.h              |  17 ++++++
>>   ref-filter.c                |  78 +++++++++++++++++++++++-
>>   ref-filter.h                |  15 +++++
>>   t/helper/test-reach.c       |   2 +
>>   t/perf/p1500-graph-walks.sh |  31 ++++++++++
>>   t/t6600-test-reach.sh       |  94 ++++++++++++++++++++++++++++
>>   7 files changed, 354 insertions(+), 1 deletion(-)
> > I was expecting to see an documentation update to for-each-ref (and
> probably branch and tag) so that what this new atom means.  Is it
> that %(is-base:<commit>) interpolates to <commit> for a ref that is an
> ancestor of <commit>, and interpolates to an empty string for a ref
> that is not, or something?

You are absolutely right that I missed this crucial detail. I will
eventually send a v2 with this oversight corrected. For now, please
consider this documentation diff, and I look forward to other review
comments that I can use to improve this series before sending v2.

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index c1dd12b93cf..5154ba3e2a7 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -264,6 +264,16 @@ ahead-behind:<committish>::
    commits ahead and behind, respectively, when comparing the output
    ref to the `<committish>` specified in the format.

+is-base:<committish>::
+   In at most one row, `(<committish>)` will appear to indicate the ref
+   that minimizes the number of commits in the first-parent history of
+   `<committish>` and not in the first-parent history of the ref. Ties
+   are broken by favoring the earliest ref in the list. Note that this
+   token will not appear if the first-parent history of `<committish>`
+   does not intersect the first-parent histories of the filtered refs.
+   This can be used as a heuristic to guess which of the filtered refs
+   was used as the base of the branch that produced `<committish>`.
+
 describe[:options]::
    A human-readable name, like linkgit:git-describe[1];
    empty string for undescribable commits. The `describe` string may
gitgitgadget[bot] commented 1 month ago

On the Git mailing list, Junio C Hamano wrote (reply to this):

Derrick Stolee <stolee@gmail.com> writes:

> consider this documentation diff, and I look forward to other review
> comments that I can use to improve this series before sending v2.
>
> diff --git a/Documentation/git-for-each-ref.txt
> b/Documentation/git-for-each-ref.txt
> index c1dd12b93cf..5154ba3e2a7 100644
> --- a/Documentation/git-for-each-ref.txt
> +++ b/Documentation/git-for-each-ref.txt
> @@ -264,6 +264,16 @@ ahead-behind:<committish>::
>   commits ahead and behind, respectively, when comparing the output
>   ref to the `<committish>` specified in the format.
>
> +is-base:<committish>::
> + In at most one row, `(<committish>)` will appear to indicate the ref
> + that minimizes the number of commits in the first-parent history of
> + `<committish>` and not in the first-parent history of the ref. Ties
> + are broken by favoring the earliest ref in the list. Note that this
> + token will not appear if the first-parent history of `<committish>`
> + does not intersect the first-parent histories of the filtered refs.
> + This can be used as a heuristic to guess which of the filtered refs
> + was used as the base of the branch that produced `<committish>`.
> +

OK.  Knowing what definition you used is crucial when reading the
implementation, as we cannot tell what you wanted to implement
without it ;-)

Thanks.
gitgitgadget[bot] commented 1 month ago

On the Git mailing list, Junio C Hamano wrote (reply to this):

Junio C Hamano <gitster@pobox.com> writes:

>> +is-base:<committish>::
>> +    In at most one row, `(<committish>)` will appear to indicate the ref
>> +    that minimizes the number of commits in the first-parent history of
>> +    `<committish>` and not in the first-parent history of the ref.

This was a bit too dense for me to grok.  So if I have a <commit>
that is at the tip of a branch B forked from 'master', and then
'master' advanced by a lot since the branch forked, the number this
is minimizing for 'master' is the commits on the branch B, but when
showing 'maint', then even though the branch B may have the tip of
'maint' as an ancestor, the number for 'maint' would be a lot more
than the number for 'master'.  If there were another branch C that
was forked from 'master' and shared some (or all) commits that are
near the tip of branch B, e.g.

    ---o---o---o---o---o---o---o---o---o 'master'
            \
             o---o---o---o 'C'
                      \    
                       o---o---o---o 'B'

then the number may be even smaller for branch 'C' than 'master'.

And for at most one ref, %(is-base:<commit>) becomes "(<commit>)";
for all other refs, it becomes an empty string.

OK.

> OK.  Knowing what definition you used is crucial when reading the
> implementation, as we cannot tell what you wanted to implement
> without it ;-)
>
> Thanks.
gitgitgadget[bot] commented 1 month ago

This patch series was integrated into seen via https://github.com/git/git/commit/6021df112fea9dd647456b2e390f5e90297eba1a.

gitgitgadget[bot] commented 1 month ago

This patch series was integrated into seen via https://github.com/git/git/commit/cff39cd9ce85bb17e98c45ea8ade521984f5293e.

gitgitgadget[bot] commented 1 month ago

This branch is now known as ds/for-each-ref-is-base.

gitgitgadget[bot] commented 1 month ago

This patch series was integrated into seen via https://github.com/git/git/commit/691215959c62724b3de224fd536fd31895241b3a.

gitgitgadget[bot] commented 1 month ago

This patch series was integrated into seen via https://github.com/git/git/commit/813637c2d7d72a72ff8b8b7a6cdee408515d7030.

gitgitgadget[bot] commented 1 month ago

There was a status update in the "New Topics" section about the branch ds/for-each-ref-is-base on the Git mailing list:

'git for-each-ref' learned a new "--format" atom to find the branch
that the history leading to a given commit "%(is-base:<commit>)" is
likely based on.

Expecting a reroll.
source: <pull.1768.git.1722550226.gitgitgadget@gmail.com>
gitgitgadget[bot] commented 1 month ago

This patch series was integrated into seen via https://github.com/git/git/commit/2de25dbef30fd75f97b053d7c4fb47e0350aa172.

gitgitgadget[bot] commented 1 month ago

This patch series was integrated into seen via https://github.com/git/git/commit/f1df88815b5cfeb21e3128a10026e1071c8422b6.

gitgitgadget[bot] commented 1 month ago

This patch series was integrated into seen via https://github.com/git/git/commit/259918faf79416e1429d399aedf42c1af2a8d097.

gitgitgadget[bot] commented 1 month ago

There was a status update in the "Cooking" section about the branch ds/for-each-ref-is-base on the Git mailing list:

'git for-each-ref' learned a new "--format" atom to find the branch
that the history leading to a given commit "%(is-base:<commit>)" is
likely based on.

Expecting a reroll.
source: <pull.1768.git.1722550226.gitgitgadget@gmail.com>
gitgitgadget[bot] commented 1 month ago

This patch series was integrated into seen via https://github.com/git/git/commit/21eb9b8d869597bd461007918ebb67a3ea790db9.

derrickstolee commented 4 weeks ago

/submit

gitgitgadget[bot] commented 4 weeks ago

Submitted as pull.1768.v2.git.1723397687.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1768/derrickstolee/target-ref-v2

To fetch this version to local tag pr-1768/derrickstolee/target-ref-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1768/derrickstolee/target-ref-v2
gitgitgadget[bot] commented 3 weeks ago

This patch series was integrated into seen via https://github.com/git/git/commit/4ac4c68ea28deb54d164cbe18d497a9b8aadcd25.

gitgitgadget[bot] commented 3 weeks ago

There was a status update in the "Cooking" section about the branch ds/for-each-ref-is-base on the Git mailing list:

'git for-each-ref' learned a new "--format" atom to find the branch
that the history leading to a given commit "%(is-base:<commit>)" is
likely based on.

Comments?
source: <pull.1768.v2.git.1723397687.gitgitgadget@gmail.com>
derrickstolee commented 3 weeks ago

/submit

gitgitgadget[bot] commented 3 weeks ago

Submitted as pull.1768.v3.git.1723631490.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1768/derrickstolee/target-ref-v3

To fetch this version to local tag pr-1768/derrickstolee/target-ref-v3:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1768/derrickstolee/target-ref-v3
gitgitgadget[bot] commented 3 weeks ago

There was a status update in the "Cooking" section about the branch ds/for-each-ref-is-base on the Git mailing list:

'git for-each-ref' learned a new "--format" atom to find the branch
that the history leading to a given commit "%(is-base:<commit>)" is
likely based on.

Comments?
source: <pull.1768.v3.git.1723631490.gitgitgadget@gmail.com>
gitgitgadget[bot] commented 3 weeks ago

There was a status update in the "Cooking" section about the branch ds/for-each-ref-is-base on the Git mailing list:

'git for-each-ref' learned a new "--format" atom to find the branch
that the history leading to a given commit "%(is-base:<commit>)" is
likely based on.

Comments?
source: <pull.1768.v3.git.1723631490.gitgitgadget@gmail.com>
gitgitgadget[bot] commented 3 weeks ago

This patch series was integrated into seen via https://github.com/git/git/commit/58280ea4dc0ddb966902792a91607d7d46f70314.

gitgitgadget[bot] commented 2 weeks ago

This patch series was integrated into seen via https://github.com/git/git/commit/996b87a4a134b147ed57647924f7ed9a2e295368.

gitgitgadget[bot] commented 2 weeks ago

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> There are benefits to users both on client-side and server-side. In an
> internal monorepo, this base branch detection algorithm is used to determine
> a long-lived branch based on the HEAD commit, mapping to a group within the
> organizational structure of the repository, which determines a set of
> projects that the user will likely need to build; this leads to
> automatically selecting an initial sparse-checkout definition based on the
> build dependencies required. An upcoming feature in Azure Repos will use
> this algorithm to automatically create a pull request against the correct
> target branch, reducing user pain from needing to select a different branch
> after a large commit diff is rendered against the default branch. This atom
> unlocks that ability for Git hosting services that use Git in their backend.

Thanks for an update.  This iteration looks good to me.
gitgitgadget[bot] commented 2 weeks ago

There was a status update in the "Cooking" section about the branch ds/for-each-ref-is-base on the Git mailing list:

'git for-each-ref' learned a new "--format" atom to find the branch
that the history leading to a given commit "%(is-base:<commit>)" is
likely based on.

Will merge to 'next'.
source: <pull.1768.v3.git.1723631490.gitgitgadget@gmail.com>
gitgitgadget[bot] commented 2 weeks ago

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/19/24 3:52 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >> There are benefits to users both on client-side and server-side. In an
>> internal monorepo, this base branch detection algorithm is used to determine
>> a long-lived branch based on the HEAD commit, mapping to a group within the
>> organizational structure of the repository, which determines a set of
>> projects that the user will likely need to build; this leads to
>> automatically selecting an initial sparse-checkout definition based on the
>> build dependencies required. An upcoming feature in Azure Repos will use
>> this algorithm to automatically create a pull request against the correct
>> target branch, reducing user pain from needing to select a different branch
>> after a large commit diff is rendered against the default branch. This atom
>> unlocks that ability for Git hosting services that use Git in their backend.
> > Thanks for an update.  This iteration looks good to me.

Thank you for your careful review.

-Stolee
gitgitgadget[bot] commented 2 weeks ago

This patch series was integrated into seen via https://github.com/git/git/commit/918c769e91326606037ee062b7ad162a296af999.

gitgitgadget[bot] commented 2 weeks ago

This patch series was integrated into next via https://github.com/git/git/commit/dd5da48f4026897a41d24d9ad8829bc613bb38dc.

gitgitgadget[bot] commented 2 weeks ago

This patch series was integrated into seen via https://github.com/git/git/commit/d9c0d9081108852b533b401bd124b6dfcdb30ec5.

gitgitgadget[bot] commented 2 weeks ago

This patch series was integrated into seen via https://github.com/git/git/commit/c42f26f4b46d4128463c03b6efdfad46de1672fd.

gitgitgadget[bot] commented 2 weeks ago

There was a status update in the "Cooking" section about the branch ds/for-each-ref-is-base on the Git mailing list:

'git for-each-ref' learned a new "--format" atom to find the branch
that the history leading to a given commit "%(is-base:<commit>)" is
likely based on.

Will merge to 'master'.
source: <pull.1768.v3.git.1723631490.gitgitgadget@gmail.com>
gitgitgadget[bot] commented 2 weeks ago

This patch series was integrated into seen via https://github.com/git/git/commit/36102ce5fa35558c20ff343f71b0b72466bbf496.

gitgitgadget[bot] commented 2 weeks ago

This patch series was integrated into seen via https://github.com/git/git/commit/f271dbd491f9f16b084a1627a75a2d7ca138ff42.

gitgitgadget[bot] commented 2 weeks ago

There was a status update in the "Cooking" section about the branch ds/for-each-ref-is-base on the Git mailing list:

'git for-each-ref' learned a new "--format" atom to find the branch
that the history leading to a given commit "%(is-base:<commit>)" is
likely based on.

Will merge to 'master'.
source: <pull.1768.v3.git.1723631490.gitgitgadget@gmail.com>
gitgitgadget[bot] commented 2 weeks ago

This patch series was integrated into seen via https://github.com/git/git/commit/a8e7fada7eb5f5f827218cd8f4537a8911c267a8.

gitgitgadget[bot] commented 1 week ago

This patch series was integrated into seen via https://github.com/git/git/commit/3222718ad7d9dfc50e31037bf2f3977d2071fb75.

gitgitgadget[bot] commented 1 week ago

This patch series was integrated into master via https://github.com/git/git/commit/3222718ad7d9dfc50e31037bf2f3977d2071fb75.

gitgitgadget[bot] commented 1 week ago

This patch series was integrated into next via https://github.com/git/git/commit/3222718ad7d9dfc50e31037bf2f3977d2071fb75.

gitgitgadget[bot] commented 1 week ago

Closed via 3222718ad7d9dfc50e31037bf2f3977d2071fb75.