gitgitgadget / git

GitGitGadget's Git fork. Open Pull Requests here to submit them to the Git mailing list
https://gitgitgadget.github.io/
Other
209 stars 133 forks source link

cat-file: add `%(objectmode)` avoid verifying submodules' OIDs #1689

Open dscho opened 6 months ago

dscho commented 6 months ago

The cat-file --batch command is very valuable in server settings, but so far it is missing a bit of functionality that would come in handy there.

For example, it is sometimes necessary to determine the object mode of a batch of tree objects' children.

This came up in $dayjob recently, and applies cleanly to v2.44.0.

cc: Jeff King peff@peff.net

dscho commented 6 months ago

/submit

gitgitgadget[bot] commented 6 months ago

Submitted as pull.1689.git.1710183362.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1689/dscho/cat-file-vs-submodules-v1

To fetch this version to local tag pr-1689/dscho/cat-file-vs-submodules-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1689/dscho/cat-file-vs-submodules-v1
gitgitgadget[bot] commented 6 months ago

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> The cat-file --batch command is very valuable in server settings, but so far
> it is missing a bit of functionality that would come in handy there.
>
> For example, it is sometimes necessary to determine the object mode of a
> batch of tree objects' children.

OK.

It is somewhat unsatisfying that --batch/--batch-check lacks so
much.  Even with %(objectmode) its nature of one-object-at-a-time
makes querying children of a large tree a chore, when you compare it
with something like "cat-file -p HEAD:" that allows you to grab the
needed information for all children with a single invocation.

This is orthogonal to what the patch wants to do, which is to enrich
the output side with more formatting, bit I wonder if we want to
consider enriching the input side?  e.g. instead of feeding just a
single object name from the standard input of "cat-file
--batch/--batch-check", perhaps a syntax can say "Here I have the
object name for a tree-ish object, but please pretend that I gave
you all the objects contained within it", or something?

Thanks, will queue.
gitgitgadget[bot] commented 6 months ago

User Jeff King <peff@peff.net> has been added to the cc: list.

gitgitgadget[bot] commented 6 months ago

On the Git mailing list, Jeff King wrote (reply to this):

On Mon, Mar 11, 2024 at 02:43:00PM -0700, Junio C Hamano wrote:

> It is somewhat unsatisfying that --batch/--batch-check lacks so
> much.  Even with %(objectmode) its nature of one-object-at-a-time
> makes querying children of a large tree a chore, when you compare it
> with something like "cat-file -p HEAD:" that allows you to grab the
> needed information for all children with a single invocation.
> 
> This is orthogonal to what the patch wants to do, which is to enrich
> the output side with more formatting, bit I wonder if we want to
> consider enriching the input side?  e.g. instead of feeding just a
> single object name from the standard input of "cat-file
> --batch/--batch-check", perhaps a syntax can say "Here I have the
> object name for a tree-ish object, but please pretend that I gave
> you all the objects contained within it", or something?

That is an interesting direction. In practice I guess you might want to
expand trees (to show their contents) or perhaps commits (to traverse
history and/or look at their trees). And we already have tools to do
that.

So for example you can already do:

  git ls-tree --format='%(objectname) %(objectmode)' HEAD

Or if you wanted to mix-and-match with other cat-file placeholders, you
can do:

  git ls-tree --format='%(objectname) %(objectmode)' HEAD |
  git cat-file --batch-check='%(objectname) %(deltabase) %(rest)'

That is a little less efficient (we look up the object twice), but once
you are working with hex object ids it is not too bad (cat-file is
heavily optimized here). Of course in the long run I think we should
move to a future where the formatting code is shared, and you can just
ask ls-tree for deltabase if you want to.

I think leaving this to specialized tools like ls-tree gives them a lot
of flexibility that a special input mode to cat-file might find awkward.
For example, recurse vs non-recursive tree listing. Or filtering with
pathspecs. And of course when you get into commits and traversal, there
are many rev-list options. :)

The strategy so far has been making sure cat-file can efficiently take
in the output of these other tools to further describe objects. But
moving towards a unified output formatting model would be even better, I
think. In the meantime, I think cat-file learning %(objectmode) makes
sense for single names (rather than listing trees), and fortunately it
uses the same (obvious) name that ls-tree does, so we won't have a
problem unifying them later.

The patch itself looked reasonable to me, modulo the comments you
already made.

-Peff
gitgitgadget[bot] commented 6 months ago

On the Git mailing list, Junio C Hamano wrote (reply to this):

Jeff King <peff@peff.net> writes:

> That is an interesting direction. In practice I guess you might want to
> expand trees (to show their contents) or perhaps commits (to traverse
> history and/or look at their trees). And we already have tools to do
> that.
>
> So for example you can already do:
>
>   git ls-tree --format='%(objectname) %(objectmode)' HEAD
>
> Or if you wanted to mix-and-match with other cat-file placeholders, you
> can do:
>
>   git ls-tree --format='%(objectname) %(objectmode)' HEAD |
>   git cat-file --batch-check='%(objectname) %(deltabase) %(rest)'
>
> That is a little less efficient (we look up the object twice), but once
> you are working with hex object ids it is not too bad (cat-file is
> heavily optimized here). Of course in the long run I think we should
> move to a future where the formatting code is shared, and you can just
> ask ls-tree for deltabase if you want to.

I was imagining more about a use case "cat-file --batch" was
originally designed for---having a long-running single process
and ask any and all questions you have about various objects in the
object database by interacting with it.  So "yes, ls-tree can
already give us that information", while it is true, shoots at a
different direction from what I had in mind.

> The strategy so far has been making sure cat-file can efficiently take
> in the output of these other tools to further describe objects. But
> moving towards a unified output formatting model would be even better, I
> think. In the meantime, I think cat-file learning %(objectmode) makes
> sense for single names (rather than listing trees), and fortunately it
> uses the same (obvious) name that ls-tree does, so we won't have a
> problem unifying them later.

Yes, enriching the output format side is an orthogonal issue from
the input side, and the %(objectmode) thing that gives a piece of
information that is additionally available on top of the various
pieces of information about the object itself does make sense.

> The patch itself looked reasonable to me, modulo the comments you
> already made.
>
> -Peff
gitgitgadget[bot] commented 6 months ago

On the Git mailing list, Jeff King wrote (reply to this):

On Tue, Mar 12, 2024 at 12:28:48PM -0700, Junio C Hamano wrote:

> > Or if you wanted to mix-and-match with other cat-file placeholders, you
> > can do:
> >
> >   git ls-tree --format='%(objectname) %(objectmode)' HEAD |
> >   git cat-file --batch-check='%(objectname) %(deltabase) %(rest)'
> >
> > That is a little less efficient (we look up the object twice), but once
> > you are working with hex object ids it is not too bad (cat-file is
> > heavily optimized here). Of course in the long run I think we should
> > move to a future where the formatting code is shared, and you can just
> > ask ls-tree for deltabase if you want to.
> 
> I was imagining more about a use case "cat-file --batch" was
> originally designed for---having a long-running single process
> and ask any and all questions you have about various objects in the
> object database by interacting with it.  So "yes, ls-tree can
> already give us that information", while it is true, shoots at a
> different direction from what I had in mind.

Ah, yeah, that is one thing that cat-file does that no other part of the
system will. I do wonder in the long term if it is easier to teach
cat-file everything that all of the other commands can do, or to teach
all of the other commands some way of handling multiple requests in a
single process. ;)

(All obviously orthogonal to this patch series).

-Peff