gitgitgadget / git

GitGitGadget's Git fork. Open Pull Requests here to submit them to the Git mailing list
https://gitgitgadget.github.io/
Other
208 stars 135 forks source link

advice: warn when sparse index expands #1756

Closed derrickstolee closed 1 month ago

derrickstolee commented 2 months ago

This is motivated by some cases I noticed while supporting users who were in this bad state and hitting the performance issue noticed in [1]. These users had some of the example situations where 'git status' did not provide helpful output. This idea has always been in the back of my mind since the sparse index was created, but it didn't make sense initially when only a few builtins could operate without immediately expanding a sparse index to a full one.

[1] https://lore.kernel.org/git/pull.1754.v3.git.1719578605.gitgitgadget@gmail.com/

Updates in V2

Thanks for the discussion on this patch!

Thanks, Stolee

cc: gitster@pobox.com cc: newren@gmail.com cc: vdye@github.com cc: rsbecker@nexbridge.com cc: Rubén Justo rjusto@gmail.com

derrickstolee commented 2 months ago

/submit

gitgitgadget[bot] commented 2 months ago

Submitted as pull.1756.git.1720019679517.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1756/derrickstolee/advice-sparse-index-v1

To fetch this version to local tag pr-1756/derrickstolee/advice-sparse-index-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1756/derrickstolee/advice-sparse-index-v1
gitgitgadget[bot] commented 2 months ago

This patch series was integrated into seen via https://github.com/git/git/commit/7fd4d6c3acb44bb0491003b843036bd0489c08bc.

gitgitgadget[bot] commented 2 months ago

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <stolee@gmail.com>
>
> Typically, forcing a sparse index to expand to a full index means that
> Git could not determine the status of a file outside of the
> sparse-checkout and needed to expand sparse trees into the full list of
> sparse blobs. This operation can be very slow when the sparse-checkout
> is much smaller than the full tree at HEAD.
>
> When users are in this state, it is common that 'git status' will report
> the problem. Usually there is a modified or untracked file outside of
> the sparse-checkout mentioned by the 'git status' output. There are a
> number of reasons why this is insufficient:

Nicely written to explain why giving an advice message is a good
idea to cover this situation.

Making it possible to squelch comes with no cost (once the code to
do so is written), so I do not have a huge problem with the use of
advise_if_enabled(), but I offhand do not know if the users would
ever want to squelch it.  Is this something that users would choose
to say "yes, I know what I am doing is making my sparse working tree
unusuably slow and I've heard how to whip my sparse working tree
into a better shape already---please do not tell it to me ever
again; because I need to leave these crufts outside the sparse cone
anyway, I am willing to accept the unusually slow response,
overhead, and wasted cycles and power" to?

Other than that, nicely done.
gitgitgadget[bot] commented 2 months ago

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/3/24 2:16 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >> From: Derrick Stolee <stolee@gmail.com>
>>
>> Typically, forcing a sparse index to expand to a full index means that
>> Git could not determine the status of a file outside of the
>> sparse-checkout and needed to expand sparse trees into the full list of
>> sparse blobs. This operation can be very slow when the sparse-checkout
>> is much smaller than the full tree at HEAD.
>>
>> When users are in this state, it is common that 'git status' will report
>> the problem. Usually there is a modified or untracked file outside of
>> the sparse-checkout mentioned by the 'git status' output. There are a
>> number of reasons why this is insufficient:
> > Nicely written to explain why giving an advice message is a good
> idea to cover this situation.
> > Making it possible to squelch comes with no cost (once the code to
> do so is written), so I do not have a huge problem with the use of
> advise_if_enabled(), but I offhand do not know if the users would
> ever want to squelch it.  Is this something that users would choose
> to say "yes, I know what I am doing is making my sparse working tree
> unusuably slow and I've heard how to whip my sparse working tree
> into a better shape already---please do not tell it to me ever
> again; because I need to leave these crufts outside the sparse cone
> anyway, I am willing to accept the unusually slow response,
> overhead, and wasted cycles and power" to?

I currently can't imagine a case where a user would want to disable
this advice, but I defaulted to allowing it. I suppose it is more
difficult to remove that option later, so I should have defaulted
to not having it removable via config.

I can send a v2 without the config option present. (I'll wait a
day or two to see if others have strong opinions.)

Thanks,
-Stolee
gitgitgadget[bot] commented 2 months ago

On the Git mailing list, wrote (reply to this):

On Wednesday, July 3, 2024 3:18 PM, Derrick Stolee wrote:
>On 7/3/24 2:16 PM, Junio C Hamano wrote:
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>> From: Derrick Stolee <stolee@gmail.com>
>>>
>>> Typically, forcing a sparse index to expand to a full index means
>>> that Git could not determine the status of a file outside of the
>>> sparse-checkout and needed to expand sparse trees into the full list
>>> of sparse blobs. This operation can be very slow when the
>>> sparse-checkout is much smaller than the full tree at HEAD.
>>>
>>> When users are in this state, it is common that 'git status' will
>>> report the problem. Usually there is a modified or untracked file
>>> outside of the sparse-checkout mentioned by the 'git status' output.
>>> There are a number of reasons why this is insufficient:
>>
>> Nicely written to explain why giving an advice message is a good idea
>> to cover this situation.
>>
>> Making it possible to squelch comes with no cost (once the code to do
>> so is written), so I do not have a huge problem with the use of
>> advise_if_enabled(), but I offhand do not know if the users would ever
>> want to squelch it.  Is this something that users would choose to say
>> "yes, I know what I am doing is making my sparse working tree
>> unusuably slow and I've heard how to whip my sparse working tree into
>> a better shape already---please do not tell it to me ever again;
>> because I need to leave these crufts outside the sparse cone anyway, I
>> am willing to accept the unusually slow response, overhead, and wasted
>> cycles and power" to?
>
>I currently can't imagine a case where a user would want to disable this advice, but I
>defaulted to allowing it. I suppose it is more difficult to remove that option later, so I
>should have defaulted to not having it removable via config.
>
>I can send a v2 without the config option present. (I'll wait a day or two to see if
>others have strong opinions.)

One possible use case is during a CI test of user code where they need to do something that causes the advice, but don't want to see the message in their logs.
--Randall
gitgitgadget[bot] commented 2 months ago

User <rsbecker@nexbridge.com> has been added to the cc: list.

gitgitgadget[bot] commented 2 months ago

On the Git mailing list, Junio C Hamano wrote (reply to this):

Derrick Stolee <stolee@gmail.com> writes:

> I can send a v2 without the config option present. (I'll wait a
> day or two to see if others have strong opinions.)

FWIW, that wasn't the direction I meant to lead the topic into.  It
is perfectly acceptable if some are much more often used than others
among various advice.* configuration variables for squelching these
messages.

I was merely making an observation that this is likely to be a less
used ones.

Thanks.
gitgitgadget[bot] commented 2 months ago

On the Git mailing list, Rubén Justo wrote (reply to this):

On Wed, Jul 03, 2024 at 03:14:39PM +0000, Derrick Stolee via GitGitGadget wrote:

>   [ADVICE_SET_UPSTREAM_FAILURE]           = { "setUpstreamFailure" },

> + [ADVICE_SPARSE_INDEX_EXPANDED]          = { "sparseIndexExpanded" },
>   [ADVICE_SKIPPED_CHERRY_PICKS]           = { "skippedCherryPicks" },

Maybe this change will disappear in a possible v2, but if it remains
you'll probably want to swap these two lines. 

>   ADVICE_SET_UPSTREAM_FAILURE,

> + ADVICE_SPARSE_INDEX_EXPANDED,
>   ADVICE_SKIPPED_CHERRY_PICKS,

Ditto.
gitgitgadget[bot] commented 2 months ago

User Rubén Justo <rjusto@gmail.com> has been added to the cc: list.

gitgitgadget[bot] commented 2 months ago

On the Git mailing list, Elijah Newren wrote (reply to this):

Hi,

On Wed, Jul 3, 2024 at 8:14 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <stolee@gmail.com>
>
> Typically, forcing a sparse index to expand to a full index means that
> Git could not determine the status of a file outside of the
> sparse-checkout and needed to expand sparse trees into the full list of
> sparse blobs. This operation can be very slow when the sparse-checkout
> is much smaller than the full tree at HEAD.

Yep, I'm with you here.

> When users are in this state, it is common that 'git status' will report
> the problem.

I struggled to understand this sentence in combination with your later
statements, though that may only be because I had some difficulty with
later parts of the commit message.  Perhaps addressing the later parts
will make this sentence fine as-is, but it's possible this sentence
could do with a bit more detail.

> Usually there is a modified or untracked file outside of
> the sparse-checkout mentioned by the 'git status' output. There are a
> number of reasons why this is insufficient:

Fair enough; let's focus on why the output of git status is insufficient...

>  1. Users may not have a full understanding of which files are inside or
>     outside of their sparse-checkout. This is more common in monorepos
>     that manage the sparse-checkout using custom tools that map build
>     dependencies into sparse-checkout definitions.

Having sparse-checkout patterns managed by custom tools is a really
good point, but doesn't this statement of yours about needing to know
particular files or directories suggest that...

> diff --git a/sparse-index.c b/sparse-index.c
> index e48e40cae71..1e517f696dd 100644
> --- a/sparse-index.c
> +++ b/sparse-index.c
> @@ -12,6 +12,21 @@
>  #include "config.h"
>  #include "dir.h"
>  #include "fsmonitor-ll.h"
> +#include "advice.h"
> +
> +/**
> + * This global is used by expand_index() to determine if we should give the
> + * advice for advice.sparseIndexExpanded when expanding a sparse index to a full
> + * one. However, this is sometimes done on purpose, such as in the sparse-checkout
> + * builtin, even when index.sparse=false. This may be disabled in
> + * convert_to_sparse().
> + */
> +static int give_advice_on_expansion = 1;
> +#define ADVICE_MSG \
> +       "The sparse index is expanding to a full index, a slow operation.\n" \
> +       "This likely means that you have files in your working directory\n"  \
> +       "that are outside of your sparse-checkout patterns. Remove them\n"   \
> +       "to recover performance expectations, such as with 'git clean'."

...this is an insufficient solution?

I was a bit surprised you'd list your first reason for git status
being insufficient, that users need to know which files/directories
are the problem and then provide a solution that doesn't attempt to
identify any files or directories.

>  2. In some cases, an empty directory could exist outside the
>     sparse-checkout and these empty directories are not reported by 'git
>     status' and friends.

This is a really good point too...but given this point, shouldn't your
added advice message also mention "directories" instead of just
mentioning "files" so that users are aware they need to look for those
too?

>  3. If the user has '.gitignore' or 'exclude' files, then 'git status'
>     will squelch the warnings and not demonstrate any problems.

Your solution does help the user to know that there is a problem (even
if they don't know which files -- or directories -- are the problem),
so this patch is making things better.

Two other small comments...

> diff --git a/advice.c b/advice.c
> index 558a46fc0b3..7845e427c89 100644
> --- a/advice.c
> +++ b/advice.c
> @@ -77,6 +77,7 @@ static struct {
>         [ADVICE_RM_HINTS]                               = { "rmHints" },
>         [ADVICE_SEQUENCER_IN_USE]                       = { "sequencerInUse" },
>         [ADVICE_SET_UPSTREAM_FAILURE]                   = { "setUpstreamFailure" },
> +       [ADVICE_SPARSE_INDEX_EXPANDED]                  = { "sparseIndexExpanded" },
>         [ADVICE_SKIPPED_CHERRY_PICKS]                   = { "skippedCherryPicks" },

The rest of the list is in alphabetical order, so the new entry
probably should be too.

>         [ADVICE_STATUS_AHEAD_BEHIND_WARNING]            = { "statusAheadBehindWarning" },
>         [ADVICE_STATUS_HINTS]                           = { "statusHints" },
> diff --git a/advice.h b/advice.h
> index 5105d90129d..572272fa0da 100644
> --- a/advice.h
> +++ b/advice.h
> @@ -44,6 +44,7 @@ enum advice_type {
>         ADVICE_RM_HINTS,
>         ADVICE_SEQUENCER_IN_USE,
>         ADVICE_SET_UPSTREAM_FAILURE,
> +       ADVICE_SPARSE_INDEX_EXPANDED,
>         ADVICE_SKIPPED_CHERRY_PICKS,

Again, the rest of the entries are in alphabetical order.

Anyway, I've nit-picked on the little things I saw, but overall this
is a good patch that moves things in a good direction; with a few
small touch-ups it'll probably be good to go.
gitgitgadget[bot] commented 2 months ago

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/5/24 4:29 PM, Elijah Newren wrote:
> On Wed, Jul 3, 2024 at 8:14 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <stolee@gmail.com>
>>
>> Typically, forcing a sparse index to expand to a full index means that
>> Git could not determine the status of a file outside of the
>> sparse-checkout and needed to expand sparse trees into the full list of
>> sparse blobs. This operation can be very slow when the sparse-checkout
>> is much smaller than the full tree at HEAD.
> > Yep, I'm with you here.
> >> When users are in this state, it is common that 'git status' will report
>> the problem.
> > I struggled to understand this sentence in combination with your later
> statements, though that may only be because I had some difficulty with
> later parts of the commit message.  Perhaps addressing the later parts
> will make this sentence fine as-is, but it's possible this sentence
> could do with a bit more detail.

I think also the issue with this sentence is that its thought isn't
complete until also reading the next one. These can be combined to
get to the point faster.

>> Usually there is a modified or untracked file outside of
>> the sparse-checkout mentioned by the 'git status' output. There are a
>> number of reasons why this is insufficient:
> > Fair enough; let's focus on why the output of git status is insufficient...
> >>   1. Users may not have a full understanding of which files are inside or
>>      outside of their sparse-checkout. This is more common in monorepos
>>      that manage the sparse-checkout using custom tools that map build
>>      dependencies into sparse-checkout definitions.
> > Having sparse-checkout patterns managed by custom tools is a really
> good point, but doesn't this statement of yours about needing to know
> particular files or directories suggest that...
>
>> +static int give_advice_on_expansion = 1;
>> +#define ADVICE_MSG \
>> +       "The sparse index is expanding to a full index, a slow operation.\n" \
>> +       "This likely means that you have files in your working directory\n"  \
>> +       "that are outside of your sparse-checkout patterns. Remove them\n"   \
>> +       "to recover performance expectations, such as with 'git clean'."
> > ...this is an insufficient solution?
> > I was a bit surprised you'd list your first reason for git status
> being insufficient, that users need to know which files/directories
> are the problem and then provide a solution that doesn't attempt to
> identify any files or directories.
> >>   2. In some cases, an empty directory could exist outside the
>>      sparse-checkout and these empty directories are not reported by 'git
>>      status' and friends.
> > This is a really good point too...but given this point, shouldn't your
> added advice message also mention "directories" instead of just
> mentioning "files" so that users are aware they need to look for those
> too?
> >>   3. If the user has '.gitignore' or 'exclude' files, then 'git status'
>>      will squelch the warnings and not demonstrate any problems.
> > Your solution does help the user to know that there is a problem (even
> if they don't know which files -- or directories -- are the problem),
> so this patch is making things better.

Thanks for pushing for a more helpful message. I'm currently thinking
that the core issue is that the working directory has contents (files
or directories) that disagree with the sparse-checkout definition. I
will update the language in v2 to point the user in the direction of
comparing the two.

Thanks,
-Stolee
derrickstolee commented 2 months ago

/submit

gitgitgadget[bot] commented 2 months ago

Submitted as pull.1756.v2.git.1720448038745.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1756/derrickstolee/advice-sparse-index-v2

To fetch this version to local tag pr-1756/derrickstolee/advice-sparse-index-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1756/derrickstolee/advice-sparse-index-v2
gitgitgadget[bot] commented 2 months ago

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

>     ... This idea has always been in the back of my
>     mind since the sparse index was created, but it didn't make sense
>     initially when only a few builtins could operate without immediately
>     expanding a sparse index to a full one.

Yeah, if this triggered for a command whose operation does require
expanding, then that would be annoying to the users.

The phrasing that suggests "git clean" used in this version does
look better.  I wonder if "git clean" would also have to expand and
then trigger this warning, though ;-)

Thanks.
gitgitgadget[bot] commented 2 months ago

This patch series was integrated into seen via https://github.com/git/git/commit/625f404e26e2367aca14c406fc71b661371e2da8.

gitgitgadget[bot] commented 2 months ago

This branch is now known as ds/advice-sparse-index-expansion.

gitgitgadget[bot] commented 2 months ago

This patch series was integrated into seen via https://github.com/git/git/commit/dad9204260cb505b6dc5e46039b0715f5fd57364.

gitgitgadget[bot] commented 2 months ago

This patch series was integrated into next via https://github.com/git/git/commit/c821020b13448d684f7667fb83c4ccf96e754cfe.

gitgitgadget[bot] commented 2 months ago

This patch series was integrated into seen via https://github.com/git/git/commit/4db00fe596f8f824e88b37051621eab43f386464.

gitgitgadget[bot] commented 1 month ago

This patch series was integrated into seen via https://github.com/git/git/commit/35f8b70c1ec0115f86fe7c966d94c490611f04d3.

gitgitgadget[bot] commented 1 month ago

There was a status update in the "Cooking" section about the branch ds/advice-sparse-index-expansion on the Git mailing list:

A new warning message is issued when a command has to expand a
sparse index to handle working tree cruft that are outside of the
sparse checkout.

Will merge to 'master'.
source: <pull.1756.v2.git.1720448038745.gitgitgadget@gmail.com>
gitgitgadget[bot] commented 1 month ago

This patch series was integrated into seen via https://github.com/git/git/commit/30fe70a5bc8045a6746bea6f7bce74466c094c6f.

gitgitgadget[bot] commented 1 month ago

There was a status update in the "Cooking" section about the branch ds/advice-sparse-index-expansion on the Git mailing list:

A new warning message is issued when a command has to expand a
sparse index to handle working tree cruft that are outside of the
sparse checkout.

Will merge to 'master'.
source: <pull.1756.v2.git.1720448038745.gitgitgadget@gmail.com>
gitgitgadget[bot] commented 1 month ago

This patch series was integrated into seen via https://github.com/git/git/commit/1f4c27dcbffea89891826ed76443be7d61af3fec.

gitgitgadget[bot] commented 1 month ago

This patch series was integrated into seen via https://github.com/git/git/commit/5d71940ddab3196e071dab466d456fd0297056d9.

gitgitgadget[bot] commented 1 month ago

This patch series was integrated into master via https://github.com/git/git/commit/5d71940ddab3196e071dab466d456fd0297056d9.

gitgitgadget[bot] commented 1 month ago

This patch series was integrated into next via https://github.com/git/git/commit/5d71940ddab3196e071dab466d456fd0297056d9.

gitgitgadget[bot] commented 1 month ago

Closed via 5d71940ddab3196e071dab466d456fd0297056d9.