Open dscho opened 6 months ago
/submit
Submitted as pull.1689.git.1710183362.gitgitgadget@gmail.com
To fetch this version into FETCH_HEAD
:
git fetch https://github.com/gitgitgadget/git/ pr-1689/dscho/cat-file-vs-submodules-v1
To fetch this version to local tag pr-1689/dscho/cat-file-vs-submodules-v1
:
git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1689/dscho/cat-file-vs-submodules-v1
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:
> The cat-file --batch command is very valuable in server settings, but so far
> it is missing a bit of functionality that would come in handy there.
>
> For example, it is sometimes necessary to determine the object mode of a
> batch of tree objects' children.
OK.
It is somewhat unsatisfying that --batch/--batch-check lacks so
much. Even with %(objectmode) its nature of one-object-at-a-time
makes querying children of a large tree a chore, when you compare it
with something like "cat-file -p HEAD:" that allows you to grab the
needed information for all children with a single invocation.
This is orthogonal to what the patch wants to do, which is to enrich
the output side with more formatting, bit I wonder if we want to
consider enriching the input side? e.g. instead of feeding just a
single object name from the standard input of "cat-file
--batch/--batch-check", perhaps a syntax can say "Here I have the
object name for a tree-ish object, but please pretend that I gave
you all the objects contained within it", or something?
Thanks, will queue.
User Jeff King <peff@peff.net>
has been added to the cc: list.
On the Git mailing list, Jeff King wrote (reply to this):
On Mon, Mar 11, 2024 at 02:43:00PM -0700, Junio C Hamano wrote:
> It is somewhat unsatisfying that --batch/--batch-check lacks so
> much. Even with %(objectmode) its nature of one-object-at-a-time
> makes querying children of a large tree a chore, when you compare it
> with something like "cat-file -p HEAD:" that allows you to grab the
> needed information for all children with a single invocation.
>
> This is orthogonal to what the patch wants to do, which is to enrich
> the output side with more formatting, bit I wonder if we want to
> consider enriching the input side? e.g. instead of feeding just a
> single object name from the standard input of "cat-file
> --batch/--batch-check", perhaps a syntax can say "Here I have the
> object name for a tree-ish object, but please pretend that I gave
> you all the objects contained within it", or something?
That is an interesting direction. In practice I guess you might want to
expand trees (to show their contents) or perhaps commits (to traverse
history and/or look at their trees). And we already have tools to do
that.
So for example you can already do:
git ls-tree --format='%(objectname) %(objectmode)' HEAD
Or if you wanted to mix-and-match with other cat-file placeholders, you
can do:
git ls-tree --format='%(objectname) %(objectmode)' HEAD |
git cat-file --batch-check='%(objectname) %(deltabase) %(rest)'
That is a little less efficient (we look up the object twice), but once
you are working with hex object ids it is not too bad (cat-file is
heavily optimized here). Of course in the long run I think we should
move to a future where the formatting code is shared, and you can just
ask ls-tree for deltabase if you want to.
I think leaving this to specialized tools like ls-tree gives them a lot
of flexibility that a special input mode to cat-file might find awkward.
For example, recurse vs non-recursive tree listing. Or filtering with
pathspecs. And of course when you get into commits and traversal, there
are many rev-list options. :)
The strategy so far has been making sure cat-file can efficiently take
in the output of these other tools to further describe objects. But
moving towards a unified output formatting model would be even better, I
think. In the meantime, I think cat-file learning %(objectmode) makes
sense for single names (rather than listing trees), and fortunately it
uses the same (obvious) name that ls-tree does, so we won't have a
problem unifying them later.
The patch itself looked reasonable to me, modulo the comments you
already made.
-Peff
On the Git mailing list, Junio C Hamano wrote (reply to this):
Jeff King <peff@peff.net> writes:
> That is an interesting direction. In practice I guess you might want to
> expand trees (to show their contents) or perhaps commits (to traverse
> history and/or look at their trees). And we already have tools to do
> that.
>
> So for example you can already do:
>
> git ls-tree --format='%(objectname) %(objectmode)' HEAD
>
> Or if you wanted to mix-and-match with other cat-file placeholders, you
> can do:
>
> git ls-tree --format='%(objectname) %(objectmode)' HEAD |
> git cat-file --batch-check='%(objectname) %(deltabase) %(rest)'
>
> That is a little less efficient (we look up the object twice), but once
> you are working with hex object ids it is not too bad (cat-file is
> heavily optimized here). Of course in the long run I think we should
> move to a future where the formatting code is shared, and you can just
> ask ls-tree for deltabase if you want to.
I was imagining more about a use case "cat-file --batch" was
originally designed for---having a long-running single process
and ask any and all questions you have about various objects in the
object database by interacting with it. So "yes, ls-tree can
already give us that information", while it is true, shoots at a
different direction from what I had in mind.
> The strategy so far has been making sure cat-file can efficiently take
> in the output of these other tools to further describe objects. But
> moving towards a unified output formatting model would be even better, I
> think. In the meantime, I think cat-file learning %(objectmode) makes
> sense for single names (rather than listing trees), and fortunately it
> uses the same (obvious) name that ls-tree does, so we won't have a
> problem unifying them later.
Yes, enriching the output format side is an orthogonal issue from
the input side, and the %(objectmode) thing that gives a piece of
information that is additionally available on top of the various
pieces of information about the object itself does make sense.
> The patch itself looked reasonable to me, modulo the comments you
> already made.
>
> -Peff
On the Git mailing list, Jeff King wrote (reply to this):
On Tue, Mar 12, 2024 at 12:28:48PM -0700, Junio C Hamano wrote:
> > Or if you wanted to mix-and-match with other cat-file placeholders, you
> > can do:
> >
> > git ls-tree --format='%(objectname) %(objectmode)' HEAD |
> > git cat-file --batch-check='%(objectname) %(deltabase) %(rest)'
> >
> > That is a little less efficient (we look up the object twice), but once
> > you are working with hex object ids it is not too bad (cat-file is
> > heavily optimized here). Of course in the long run I think we should
> > move to a future where the formatting code is shared, and you can just
> > ask ls-tree for deltabase if you want to.
>
> I was imagining more about a use case "cat-file --batch" was
> originally designed for---having a long-running single process
> and ask any and all questions you have about various objects in the
> object database by interacting with it. So "yes, ls-tree can
> already give us that information", while it is true, shoots at a
> different direction from what I had in mind.
Ah, yeah, that is one thing that cat-file does that no other part of the
system will. I do wonder in the long term if it is easier to teach
cat-file everything that all of the other commands can do, or to teach
all of the other commands some way of handling multiple requests in a
single process. ;)
(All obviously orthogonal to this patch series).
-Peff
The
cat-file --batch
command is very valuable in server settings, but so far it is missing a bit of functionality that would come in handy there.For example, it is sometimes necessary to determine the object mode of a batch of tree objects' children.
This came up in $dayjob recently, and applies cleanly to v2.44.0.
cc: Jeff King peff@peff.net