bittrance opened this issue 1 year ago
Thanks for reporting. The effort to provide a reproducer is appreciated, though I'd have hoped for a public repo URL to go with it.
Also, which version of `gix` are you using? If it's not the latest, did you try to include `gix` as a git-dependency so the latest `main` can be used? Further, you can try setting `fetch.negotiationAlgorithm` to different values to see if that resolves the issue or alters the outcome. I'd recommend just using `gix clone` (CLI) for testing; I'd expect it to run into the same issue.
> When fed the URL for a non-trivial repo, output will look like this. Note the suspicious 8192 on read pack progress. I've tested a few different repos and they all get 8192 "pack steps".
A BufReader is used internally and 8192 is the default buffer size.
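For reference, the default is easy to confirm (a minimal sketch, assuming the standard library's `BufReader` is the one involved here):

```rust
use std::io::BufReader;

fn main() {
    // BufReader allocates an 8 KiB (8192-byte) buffer by default,
    // which matches the suspicious "8192" in the progress output.
    let reader = BufReader::new(std::io::empty());
    assert_eq!(reader.capacity(), 8192);
}
```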
```
Error: Fetch(WritePack(IndexWrite(PackEntryDecode(NotFound { object_id: Sha1(cdd16b2c111f7de5457e4c3182ed4d781a062a2a) }))))
```
This means that the remote sent a thin pack which referenced a local object that is supposed to exist, for use as a base object. The only way I can imagine this happening is if the negotiation step leaves the server incorrectly thinking that this object exists locally, but since `gix` now negotiates just like `git` does, I wouldn't expect that to happen. Older versions of `gix` used a naive negotiation algorithm which might have had such side-effects, particularly with custom server implementations that might have bugs of their own. That said, even with older versions of `gix` I have never heard of this happening.
Contrary to my understanding, it is possible to have public repos on Azure DevOps with an MSDN subscription. I created https://dev.azure.com/bittrance/public/_git/gitoxide / git@ssh.dev.azure.com:v3/bittrance/public/gitoxide if you want to perform your own tests. The behaviour seems to be the same over HTTPS.
@Byron As you correctly noted, the example code was not on gitoxide's main branch (it was release 0.51.0, if it matters). I updated my code to a git ref and re-ran it on the above repo. The read pack stage now seems to run to completion (`Some("read pack") Info done 8459627 items in 6.57s (1287093 items/s)`) but it still fails with `Fetch(WritePack(IndexWrite(PackEntryDecode(NotFound { object_id: Sha1(daa3b520fbbb247e8057fed0e7eda8320974af77) }))))`.
The hanging behaviour is a property of my code, in that the `?` case in the `scope()` will block until the watchdog thread hits the deadline before returning. In fact, `fetch_only()` fails immediately after having read the packs.
I'll be happy to dive deeper, but it may be some days before I have the time.
I added a trivial repository https://dev.azure.com/bittrance/public/_git/die-kleine that can be used to verify that very small repos can be cloned successfully.
That's perfect, thank you! I will take a closer look, but I believe I have everything I need for a fix.
First of all, `gix clone https://dev.azure.com/bittrance/public/_git/gitoxide` does reproduce the issue - it stops fetching the pack after 8.5MB and then can't find the object daa… during pack resolution. `git clone https://dev.azure.com/bittrance/public/_git/gitoxide` works perfectly though.
If you are keen to investigate before me - I have a couple of things to finish first, until the 25th at most - you could use `gix -c http.proxy <proxy> clone …` and compare what you see with `git -c http.proxy <proxy> clone …`.
Please note that I couldn't reproduce it with the SSH url as it doesn't know me, but that won't be necessary as you validated it's the same behaviour. That's great as it means it's unrelated to the underlying transport.
After a short moment of investigation it is clear that this is the code that cancels the operation: the pack, which is supposed to be complete, contains a ref-delta object, which means that in order to apply this delta, the named base object is needed. Since that object doesn't exist - after all, the local repo is only just being populated - the lookup fails and everything unwinds.
I think what's happening here is that the server is old enough to encode delta bases with a full hash instead of just the offset to a prior object. Maybe this is done because they don't sort their objects so that the base actually appears before the object referencing it, or maybe there is another reason. `git` can handle it because apparently it doesn't assume that base-object references only refer to locally existing objects, whereas `gix` assumes that all other delta bases are self-referential within a properly ordered pack.
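For readers unfamiliar with the two encodings, here is an illustrative sketch of the distinction (hypothetical types, not gix's actual ones):

```rust
/// The two ways a packed delta object can name its base.
enum DeltaBase {
    /// `ofs-delta`: the base is identified by a byte distance back to
    /// an earlier entry in the same pack - always self-contained.
    Offset { bytes_before_this_entry: u64 },
    /// `ref-delta`: the base is identified by its object id. In a thin
    /// pack this may name an object that isn't in the pack at all.
    Ref { base_id: [u8; 20] },
}
```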
Diving deeper, it turns out that the other end is a V1 server, which responds exactly the same for both `gix` and `git`:
```
001e# service=git-upload-pack
000000c114e0763cbf368bda476046a0fd28be230d67b1bd HEAD\0multi_ack thin-pack side-band side-band-64k no-progress multi_ack_detailed no-done shallow allow-tip-sha1-in-want symref=HEAD:refs/heads/main
003d14e0763cbf368bda476046a0fd28be230d67b1bd refs/heads/main
0000
```
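To read the transcripts here: every line is a pkt-line, prefixed with a 4-character hex length that includes the prefix itself, and `0000` is a flush packet. A minimal parser sketch (a hypothetical helper for illustration, not gix's code):

```rust
/// Splits off one pkt-line, returning (payload, rest), or None on
/// malformed input. E.g. "001e" means 0x1e = 30 bytes incl. the prefix.
fn read_pkt_line(input: &[u8]) -> Option<(&[u8], &[u8])> {
    let len_hex = std::str::from_utf8(input.get(..4)?).ok()?;
    let len = usize::from_str_radix(len_hex, 16).ok()?;
    if len == 0 {
        return Some((&[], &input[4..])); // flush-pkt ("0000")
    }
    let payload = input.get(4..len)?;
    Some((payload, &input[len..]))
}
```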
From there `gix` requests a pack like this:
```
009bwant 14e0763cbf368bda476046a0fd28be230d67b1bd thin-pack side-band-64k shallow multi_ack_detailed allow-tip-sha1-in-want no-done agent=git/oxide-0.53.1
0032want 14e0763cbf368bda476046a0fd28be230d67b1bd
00000009done
```
and `git` does so using:
```
0065want 14e0763cbf368bda476046a0fd28be230d67b1bd multi_ack_detailed no-done side-band-64k thin-pack
00000009done
```
Note that both implementations ask for Version 2 of the protocol, but only get V1.
`gix` generally tends to enable all features, and in this case `thin-pack` is the one that allows ref-deltas to point to known objects. The server should know that we have no known objects, but in any case `git` also enables `thin-pack`, so it should receive the same kind of pack.
Let me repeat the first sentence of the documentation of `thin-pack`:
> A thin pack is one with deltas which reference base objects not contained within the pack (but are known to exist at the receiving end).
If that's to be believed, the server definitely sends an invalid pack: since the client never advertised any HAVE line, it has no objects at all.
With a quick hack I removed the `thin-pack` capability advertisement when asking for a pack, which should tell the server not to produce packs that refer to objects outside of the pack itself.
```
0091want 14e0763cbf368bda476046a0fd28be230d67b1bd side-band-64k shallow multi_ack_detailed allow-tip-sha1-in-want no-done agent=git/oxide-0.53.1
0032want 14e0763cbf368bda476046a0fd28be230d67b1bd
00000009done
```
(The request doesn't mention "thin-pack" anymore)
However, the server still answers with a thin pack, which is another reason to assume the other side is some custom, non-standard implementation.
Next up was to see whether `git` receives a different pack than `gix` for other reasons.
```
❯ /bin/ls -l ~/Desktop/Response_Body_09-21-2023-11-26-48.folder/\[*
-rw-r--r-- 1 byron staff 45633504 Sep 21 11:27 /Users/byron/Desktop/Response_Body_09-21-2023-11-26-48.folder/[20] Response - dev.azure.com_bittrance_public__git_gitoxide_git-upload-pack
-rw-r--r-- 1 byron staff 45633504 Sep 21 11:26 /Users/byron/Desktop/Response_Body_09-21-2023-11-26-48.folder/[70] Response - dev.azure.com_bittrance_public__git_gitoxide_git-upload-pack

❯ sha1sum ~/Desktop/Response_Body_09-21-2023-11-26-48.folder/\[*
536fe9a59ff7a834f1ece763a4709e4092b7a480 /Users/byron/Desktop/Response_Body_09-21-2023-11-26-48.folder/[20] Response - dev.azure.com_bittrance_public__git_gitoxide_git-upload-pack
d022148adf21e2d1ae00f70ac4818e2705fba97d /Users/byron/Desktop/Response_Body_09-21-2023-11-26-48.folder/[70] Response - dev.azure.com_bittrance_public__git_gitoxide_git-upload-pack
```
(one file received by `gix`, the other by `git`)
After removing the leader of the file, which had a varying bit (the number of milliseconds it took to ready the pack), I get this:
```
❯ sha1sum ~/Desktop/Response_Body_09-21-2023-11-26-48.folder/\[*
bbf11ac0f17cdcfe16a6009654396cd91fa96457 /Users/byron/Desktop/Response_Body_09-21-2023-11-26-48.folder/[20] Response - dev.azure.com_bittrance_public__git_gitoxide_git-upload-pack
bbf11ac0f17cdcfe16a6009654396cd91fa96457 /Users/byron/Desktop/Response_Body_09-21-2023-11-26-48.folder/[70] Response - dev.azure.com_bittrance_public__git_gitoxide_git-upload-pack
```
Indeed, the same pack is sent no matter what.
Here is what I find amazing: `git` is able to resolve the pack nonetheless, while `gix` isn't, for good reasons. It's remarkable that `git` can do it, as it means that while resolving a pack it keeps the index-so-far available for access by the resolver, so it can figure out the offset of a base object in the pack (assuming it was mentioned already) from its ID.
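In other words, something along these lines would have to happen during indexing (a hypothetical sketch of the approach described above; the names are illustrative, not gix's or git's actual types):

```rust
use std::collections::HashMap;

/// Grows as pack entries are resolved, so that a later ref-delta can
/// be translated into an offset within the same pack.
struct IndexSoFar {
    id_to_offset: HashMap<[u8; 20], u64>,
}

impl IndexSoFar {
    /// Called whenever an entry has been fully decoded and hashed.
    fn record(&mut self, id: [u8; 20], offset: u64) {
        self.id_to_offset.insert(id, offset);
    }

    /// A ref-delta whose base was already resolved becomes an in-pack
    /// offset; if this returns None, the base truly is external.
    fn resolve_ref_delta(&self, base_id: &[u8; 20]) -> Option<u64> {
        self.id_to_offset.get(base_id).copied()
    }
}
```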
For fun, I 'hacked' out the portion of the code that causes the error when streaming the pack, but it fails right after with…
```
Error: Ref delta objects are not supported as there is no way to look them up. Resolve them beforehand.
```
…just because it's utterly impossible for the `gix` pack resolution engine to deal with ID->offset resolution in the part of the program that is supposed to resolve pack entries in order to produce said ID->offset association, which is stored in a pack index. "Resolve them beforehand" is not possible, as the hashes of these entries aren't known yet.
Reading further, I found the `ofs-delta` option, which indicates that `thin-pack` is apparently entirely unrelated to how delta objects are encoded; instead it's related to whether or not pack version 2 is supported. As a matter of fact, the streaming reader only supports V2 packs, and V1 was never a thing unless the implementation is very old, maybe.
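For context, the pack version is declared right in the stream header, so a V2-only reader can reject anything else up front (a sketch of the on-disk format, not gix's actual parser):

```rust
/// A pack stream starts with the magic "PACK", a 4-byte big-endian
/// version, and a 4-byte big-endian object count.
fn parse_pack_header(bytes: &[u8; 12]) -> Result<(u32, u32), &'static str> {
    if &bytes[..4] != b"PACK" {
        return Err("not a pack stream");
    }
    let version = u32::from_be_bytes(bytes[4..8].try_into().unwrap());
    let num_objects = u32::from_be_bytes(bytes[8..12].try_into().unwrap());
    if version != 2 {
        return Err("only pack version 2 is supported here");
    }
    Ok((version, num_objects))
}
```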
Unfortunately, `ofs-delta` isn't anything the Azure server seems to support - at least it's not advertised. `gix` already sets it with V2 servers, for good reason, but with V1 that's not exactly possible.
I even tried sending `ofs-delta` along with the V1 pack request on the off-chance that the server would then avoid IDs in packs, but to no avail - it still sends the very same pack.
Looking into it further, I realized that I never really studied how git unpacks objects (or probably I just forgot) and went right away for the implementation that made the most sense to me, which unknowingly made it very incompatible with the kind of packs this server is sending.
Anyway, I think this is where the magic happens in `git`: it stores the ref-delta to later look up the offset of the base object, which has already been resolved - thus its ID is known.
Can `gix` be 'fixed'?
When looking at the reason `git` can resolve such a pack (which really has no reason to exist, as ref-deltas are inefficient compared to a delta-encoded offset), it turns out that `git` explicitly supports this case, which, against all odds, seems valid. In a way, `gix` has been over-fitted to deal with 'modern' packs only.
It also, clearly, is a very old way of building packs, which affects my willingness to try hard to make this possible. Unfortunately, there is no simple fix either, as the "`gix` pack resolution engine" really is just a pack traversal that leverages a data structure for perfect multi-threading without doing any unnecessary computation. This also leads to that data structure knowing only offsets, but never hashes - it only decodes objects, and the hashes are created by the index-creation routine. And the reason a solution is impossible with this implementation is that the actual tree of deltas is built ahead of time, with each base referred to only by its offset. What further complicates things is that the data structure itself is optimized for size, and adding to it would make it uncompetitive compared to the memory requirements of `git`.
At this time, I don't think there can be a fix for this, as it's architecturally impossible and not incentivised: the current engine is incredibly fast and efficient when dealing with 'normal' packs. To support packs that use ref-deltas to refer to objects within the pack, one would have to change the resolution engine to something similar to the implementation that `git` uses. It's worth noting that git's implementation can seemingly also deal with ref-deltas that refer to base objects anywhere in the pack.
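To make the architectural constraint concrete, here is a much-simplified, hypothetical picture of such an ahead-of-time delta tree - nodes carry only pack offsets, so a ref-delta naming its base by hash has nothing to match against:

```rust
/// Every node knows the pack offset of its entry and the deltas based
/// on it - no object ids anywhere, as hashes are computed only later
/// by the index-writing step.
struct DeltaTreeNode {
    pack_offset: u64,
    children: Vec<DeltaTreeNode>,
}

/// A ref-delta arrives with a 20-byte base id instead of an offset;
/// without an id->offset map there is nothing to attach it to.
fn attach_ref_delta(_tree: &mut DeltaTreeNode, _base_id: [u8; 20]) {
    unimplemented!("requires the id->offset lookup this design lacks");
}
```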
Implementing this might happen at some point if it turns out that other features, like promisor objects, can't be supported the way things are currently done, but it's not planned as of now.
Well, that was disappointing, but not entirely surprising. I had the "Test with AzDO" task high on my todo list because I know the AzDO Git/SSH server is homegrown and quirky. I note that go-git also has problems supporting AzDO: https://github.com/go-git/go-git/issues/64 (or the archived https://github.com/src-d/go-git/issues/1058). go-git does not support multi_ack, so it runs into a somewhat different issue.
I respect the decision to kick this can down the road. AzDO is relatively common in the (MS-dominated) corporate world, so `gix` may have to confront it at some point, but it is unlikely to be a big thing with early adopters.
I'll leave it up to you to decide whether you want to keep this issue open for discoverability or close it.
Thank you for your detailed problem analysis. ❤️
> I respect the decision to kick this can down the road. AzDO is relatively common in the (MS-dominated) corporate world, so `gix` may have to confront it at some point, but it is unlikely to be a big thing with early adopters. I'll leave it up to you to decide whether you want to keep this issue open for discoverability or close it.
I think there is value in having a (possibly user-selectable) alternative implementation for pack resolution, and I think it could work very similarly to how `git` currently does it, for maximum compatibility. Contributions for this are welcome, even though I don't expect this to actually be contributed, as it's a difficult topic for sure.
Maybe one day Microsoft chooses to update their server side and there is no demand anymore, or they sponsor this feature - let's wait and see.
Hi, I think I'm also hitting this same issue with a repository I discovered, except it's hosted on GitHub: https://github.com/rodrigocadiz47/DApp
```
WritePack(IndexWrite(PackEntryDecode(NotFound { object_id: Sha1(3825fe6b335743b7b85eb2ee017fb3eda0529f1e) })))
Error: A pack entry could not be extracted

Caused by:
    The object 3825fe6b335743b7b85eb2ee017fb3eda0529f1e could not be decoded or wasn't found
```
Is it possible for this issue to affect some repositories on GitHub too, or should I open a new issue for this?
Thanks for letting me know, and for finding just the right issue to put it in!
It's really good to know that this isn't exclusively an Azure DevOps problem. Maybe this was a common way to build packs a couple of years ago, and Azure never moved on from it. That theory seems somewhat justified by the fact that the most recent commit in DApp was two years ago. I am pretty sure that GitHub has this pack cached, and maybe it was created with a version of Git that produced this kind of 'buggy' (as in larger than needed) pack.
To put it to the test, I forked the repo, reproduced the issue in the new fork, and made minor changes to the README. The issue still reproduces perfectly, probably because Git just does a pack-to-pack copy so the objects stay exactly the same.
```
❯ gix clone https://github.com/Byron/DApp-Pack-Testing
 15:49:21 read pack done 8.2KB in 0.03s (276.8KB/s)
Error: A pack entry could not be extracted

Caused by:
    The object 3825fe6b335743b7b85eb2ee017fb3eda0529f1e could not be decoded or wasn't found
```
This issue has shown up for me when implementing a toy git server with gitoxide's infrastructure. When pushing to a completely empty repo, my local `git` (`git version 2.35.3`) seems to send a pack with ref-deltas that reference objects in the same pack.
Thanks a lot for sharing, even though I definitely don't like what I hear. After all, this would mean that for some reason recent Git also does this, even though it really shouldn't make sense: size-wise it's always better to use offset-deltas to refer to previous objects, so that ref-deltas are only used to refer to out-of-pack objects (which don't exist if the remote is empty).
Current behavior 😯
It seems that Azure DevOps (at least over SSH) has some form of pagination when reading packs. Given the code below, fetching a trivial repo (one file, one commit) works correctly and produces the following output:
When fed the URL of a non-trivial repo, the output will look like this. Note the suspicious 8192 on read pack progress. I've tested a few different repos and they all get 8192 "pack steps".
At this point it hangs. Eventually, the timeout occurs and the following error is raised:
Though I suppose this is just a consequence of us not having received the object in question.
I'll try to dig deeper tomorrow.
Expected behavior 🤔
If it works for small Azure DevOps repos, surely it should work for large ones :) The same code has no problem fetching a GitHub repo of non-trivial size.
Steps to reproduce 🕹
Code to reproduce the problem:
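(The original snippet didn't survive this excerpt; below is a minimal fetch-only sketch against the gix crate's blocking clone API. The crate feature, target URL/path, and error handling are assumptions for illustration, not the reporter's exact code.)

```rust
// Assumed Cargo.toml entry:
// gix = { version = "0.53", features = ["blocking-network-client"] }
use std::sync::atomic::AtomicBool;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let interrupt = AtomicBool::new(false);
    // Any non-trivial Azure DevOps repository triggers the failure.
    let mut prepare = gix::prepare_clone(
        "https://dev.azure.com/bittrance/public/_git/gitoxide",
        "/tmp/gitoxide-clone",
    )?;
    // fetch_only() is the call discussed above; on affected servers it
    // fails right after the "read pack" phase.
    let (_repo, _outcome) = prepare.fetch_only(gix::progress::Discard, &interrupt)?;
    println!("fetch completed");
    Ok(())
}
```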