Byron opened 2 years ago
Just as another reference for projects moving away from shallow clones at GitHub's request, there is also Homebrew in 2020:
Thanks for posting!
What I understand from this is that:
This should mean that shallow clones for cargo are pretty much the way to go on CI, and probably locally as well, since there is no planned unshallow operation at all (even though users might choose to do that if they want to access the history of the crates index for research, for instance; after all, cargo doesn't care about the history).
Please let me know if I am missing something.
Thank you for your work organizing this complicated topic. I am going to put 2 nits here, and the rest in the zulip thread. Just my comments, not speaking officially for the team.
How would one 'unshallow' a crates index? Some might want it for research. In any case, there should be a known path for this, so there should probably be an option for it in the cargo config, whatever the default turns out to be.
I don't think there needs to be an option for unshallowing. The clone that Cargo makes is an implementation detail that belongs entirely to Cargo. If someone wants a copy of the data with a different configuration, they can clone the index themselves. That being said, we do have to be careful about how different versions of Cargo interact. If one version of cargo does a shallow clone, and then an older version of cargo is used to do an index update, things have to at least work.
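Anyone who wants the full history can keep a separate clone next to Cargo's own. The following is a minimal sketch that assumes the git CLI is installed and that the index lives at https://github.com/rust-lang/crates.io-index. The helper names are hypothetical. The commands are only built here, not run.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: a full (non-shallow) clone of the crates.io index
/// into a directory the user owns, independent of Cargo's own cache.
fn full_index_clone(dest: &Path) -> Command {
    let mut cmd = Command::new("git");
    cmd.arg("clone")
        .arg("https://github.com/rust-lang/crates.io-index")
        .arg(dest);
    cmd
}

/// Hypothetical helper: turn an existing shallow clone into a complete one.
/// `git fetch --unshallow` fetches the history the shallow clone omitted.
fn unshallow(repo: &Path) -> Command {
    let mut cmd = Command::new("git");
    cmd.arg("-C").arg(repo).arg("fetch").arg("--unshallow");
    cmd
}
```

Calling `.status()` on either command runs it. Because the copy lives outside Cargo's cache, it does not matter how Cargo itself clones.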
use a bare clone of the crates.io index and extract file contents directly from git
I believe this is done already.
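Reading a file out of a bare clone needs no worktree, because git can print a blob straight from a revision. The sketch below assumes the git CLI and the crates.io index layout: `1/` and `2/` for one- and two-character names, `3/{first char}/` for three, and `{first two}/{next two}/` otherwise, all lowercase. The helper names are hypothetical. Cargo goes through a git library for this rather than shelling out.

```rust
use std::path::Path;
use std::process::Command;

/// Relative path of a crate's file in the crates.io index.
/// Assumes a valid crate name (non-empty, ASCII).
fn index_path(name: &str) -> String {
    let name = name.to_ascii_lowercase();
    let chars: Vec<char> = name.chars().collect();
    match chars.len() {
        1 => format!("1/{name}"),
        2 => format!("2/{name}"),
        3 => format!("3/{}/{name}", chars[0]),
        _ => {
            let first: String = chars[0..2].iter().collect();
            let second: String = chars[2..4].iter().collect();
            format!("{first}/{second}/{name}")
        }
    }
}

/// Hypothetical helper: print `path` as it exists at `rev` in the bare
/// repository `git_dir`, without checking anything out.
fn show_blob(git_dir: &Path, rev: &str, path: &str) -> Command {
    let mut cmd = Command::new("git");
    cmd.arg("--git-dir")
        .arg(git_dir)
        .arg("show")
        .arg(format!("{rev}:{path}"));
    cmd
}
```

Running the command with `.output()` and reading `stdout` gives the file's content at that revision.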
Thank you, I took note of this and specifically mentioned the need for backwards compatibility and for validating it. Since older versions of cargo use git2, and recent versions of cargo also use git2 if gitoxide isn't turned on, I wouldn't be surprised if fetches failed. I am looking forward to trying it out.
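A repository is shallow when its git directory contains a file named `shallow`, which lists the boundary commits whose parents are missing. Checking for that file is cheap. A Cargo build running on git2 (for example, with gitoxide turned off) could therefore notice a shallow clone and fall back instead of failing. This is a minimal sketch: the `Backend` enum and the re-clone policy are hypothetical, not existing Cargo code.

```rust
use std::path::Path;

/// Which git implementation performs the fetch (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Git2,
    Gitoxide,
}

/// What to do with an existing clone before fetching (hypothetical policy).
#[derive(Debug, PartialEq, Eq)]
enum Action {
    FetchInPlace,
    RecloneFully,
}

/// A repository is shallow if `<git_dir>/shallow` exists
/// (`.git/shallow` for a regular clone, `<repo>/shallow` for a bare one).
fn is_shallow(git_dir: &Path) -> bool {
    git_dir.join("shallow").is_file()
}

/// Assumption for this sketch: only the gitoxide backend is trusted with
/// shallow repositories, so any other backend starts over with a full clone.
fn plan_fetch(git_dir: &Path, backend: Backend) -> Action {
    if is_shallow(git_dir) && backend != Backend::Gitoxide {
        Action::RecloneFully
    } else {
        Action::FetchInPlace
    }
}
```

Re-cloning is the bluntest fallback. Whether it is needed at all depends on what the git2 version in use supports, which is what the validation step should establish.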
A good catch! I updated that passage to reflect the status quo.
I also started off this issue with the terminology you provided over on zulip. Thanks for taking the time to help people discuss this complex topic.
This issue collects thoughts and facts about the state of shallow clones for git repositories when used by cargo.
Here is a list of steps to take in cargo to support step-wise integration of gitoxide.
Terminology
To make sure we are on the same page, I repeat this comment by @eh2406 here to set a baseline.
git = "<url>"
dependency in a cargo.toml.Another source of miscommunication is that there are two interconnected potential changes.
libgit2 -> gitoxide
gitoxide (specifically "shallow clones")
Of course, one depends on the other.
Tracking issues
Cloning crates.io + crates (non-shallow)
It would be most straightforward to implement git::fetch(…) using gitoxide. This includes all transports and all credential options that git2 supports, for maximum usability.
Note that checkouts would still be performed by git2.
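The split described here can be pictured as a small routing table: gitoxide for fetching, git2 for everything else, all behind an unstable flag. This is a minimal sketch under stated assumptions. The flag value format, the feature names `fetch` and `checkout`, and all types are hypothetical and only mirror the steps in this issue.

```rust
/// Operations Cargo performs on git repositories (per the steps in this issue).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    Fetch,
    Checkout,
    UpdateSubmodules,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Git2,
    Gitoxide,
}

/// Hypothetical: which operations an unstable flag value such as
/// "fetch" or "fetch,checkout" hands over to gitoxide.
#[derive(Debug, Default)]
struct GitoxideFeatures {
    fetch: bool,
    checkout: bool,
}

impl GitoxideFeatures {
    /// Parses a comma-separated list; unknown names are reported as errors.
    fn parse(value: &str) -> Result<Self, String> {
        let mut features = GitoxideFeatures::default();
        for name in value.split(',').map(str::trim).filter(|s| !s.is_empty()) {
            match name {
                "fetch" => features.fetch = true,
                "checkout" => features.checkout = true,
                other => return Err(format!("unknown gitoxide feature `{other}`")),
            }
        }
        Ok(features)
    }

    /// Everything not explicitly enabled stays on git2.
    fn backend_for(&self, op: Op) -> Backend {
        let enabled = match op {
            Op::Fetch => self.fetch,
            Op::Checkout => self.checkout,
            Op::UpdateSubmodules => false, // hypothetical: not migrated in this sketch
        };
        if enabled { Backend::Gitoxide } else { Backend::Git2 }
    }
}
```

With the flag unset, `GitoxideFeatures::default()` routes every operation to git2. That keeps the existing behaviour as the baseline while each step is validated.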
Requirements
All requirements are to be validated with the cargo team, and a checkmark means it is indeed a requirement.
gitoxide should be used. Use an unstable flag as suggested by Josh.
Cloning crates.io + crates (shallow)
Add a parameter to support shallow fetches that maintain shallow-ness.
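With the git CLI, a shallow repository stays shallow only if `--depth` is passed on every fetch, not just on the initial clone. When fetching into a shallow repository, git deepens or shortens the history to the given number of commits. The following is a minimal sketch of such a parameter, expressed as CLI arguments for illustration. The `FetchDepth` type and `fetch_args` helper are hypothetical, and the gitoxide API is not shown.

```rust
use std::num::NonZeroU32;

/// Hypothetical fetch parameter: full history, or keep only `n` commits.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FetchDepth {
    Full,
    Shallow(NonZeroU32),
}

/// Builds `git fetch` arguments for `remote` and `refspec`.
/// Repeating `--depth` on each fetch keeps the history truncated; omitting it
/// on a shallow repository would bring in every new commit since the last fetch.
fn fetch_args(remote: &str, refspec: &str, depth: FetchDepth) -> Vec<String> {
    let mut args = vec!["fetch".to_string()];
    if let FetchDepth::Shallow(n) = depth {
        args.push(format!("--depth={n}"));
    }
    args.push(remote.to_string());
    args.push(refspec.to_string());
    args
}
```

Because the depth is chosen per fetch, a caller could pass `Full` for the crates.io index and `Shallow` for git dependencies, should shallow index clones turn out to be problematic.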
Issues
Don't forget the general considerations about shallow clones of database-like repositories raised by ehuss in a comment, which might make this option unusable. This needs to be validated first. If it truly is an issue, shallow clones can be turned off for crates.io but still be used for crate clones.
Assumptions
These should be validated to determine whether they could one day become issues or risks, should they prove true.
Questions
Requirements
Notes by @eh2406
Interesting reading
Checkout worktrees (without submodules)
This is effectively an implementation of git reset --hard as used in GitCheckout::reset(…).
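Ignoring the index, file modes, and submodules, resetting a worktree to a tree comes down to two things. Every file of the target tree is written. Files that were tracked before but are absent from the target are deleted, and untracked files are left alone. The simplified sketch below works on in-memory trees, maps from relative path to contents. These stand in for real git trees as an assumption, and the `reset_hard` helper is hypothetical.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io;
use std::path::Path;

/// Stand-in for a git tree: relative path -> file contents.
type Tree = BTreeMap<String, Vec<u8>>;

/// Simplified `git reset --hard` from `old` (the currently checked-out tree)
/// to `new`, inside `worktree`. Untracked files are not touched, and
/// directories left empty are not pruned.
fn reset_hard(worktree: &Path, old: &Tree, new: &Tree) -> io::Result<()> {
    // Remove files that were tracked before but no longer exist in `new`.
    for path in old.keys().filter(|p| !new.contains_key(*p)) {
        match fs::remove_file(worktree.join(path)) {
            Ok(()) => {}
            Err(e) if e.kind() == io::ErrorKind::NotFound => {}
            Err(e) => return Err(e),
        }
    }
    // Write every file of `new`, overwriting local modifications.
    for (path, contents) in new {
        let dest = worktree.join(path);
        if let Some(parent) = dest.parent() {
            fs::create_dir_all(parent)?;
        }
        fs::write(dest, contents)?;
    }
    Ok(())
}
```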
Questions
(git/checkouts) and their source, and does a full clone from the sources (git/db, bare repos) into these. Worktrees should help here, saving quite a bit of space.
db clones or always create a new one? It's the question of how to properly update worktrees with submodules after changes were pulled. I have a feeling the current setup works around this.
Notes by @eh2406
Checkout submodules
Update submodules as in GitCheckout::update_submodules(…).
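Updating submodules starts from `.gitmodules`, a git-config style file that maps each submodule name to a path and a URL. Each submodule is then cloned or fetched at the commit recorded in the parent's tree, recursively. The sketch below covers only that first step and is deliberately small. As an assumption for illustration, it handles only `[submodule "name"]` sections with `path` and `url` keys. It skips comments and other sections, and ignores git-config escaping rules.

```rust
/// One entry from `.gitmodules`.
#[derive(Debug, PartialEq, Eq)]
struct Submodule {
    name: String,
    path: String,
    url: String,
}

/// (name, path, url) of the section currently being read.
type Section = (String, Option<String>, Option<String>);

/// Keeps a finished section only if it has both `path` and `url`.
fn finish(section: Option<Section>, out: &mut Vec<Submodule>) {
    if let Some((name, Some(path), Some(url))) = section {
        out.push(Submodule { name, path, url });
    }
}

/// Minimal `.gitmodules` reader (hypothetical, simplified git-config parsing).
fn parse_gitmodules(text: &str) -> Vec<Submodule> {
    let mut out = Vec::new();
    let mut current: Option<Section> = None;
    for raw in text.lines() {
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') || line.starts_with(';') {
            continue;
        }
        if line.starts_with('[') {
            // A new section header ends the previous section.
            finish(current.take(), &mut out);
            current = line
                .strip_prefix("[submodule \"")
                .and_then(|rest| rest.strip_suffix("\"]"))
                .map(|name| (name.to_string(), None, None));
        } else if let (Some(section), Some((key, value))) =
            (current.as_mut(), line.split_once('='))
        {
            let value = value.trim().to_string();
            match key.trim() {
                "path" => section.1 = Some(value),
                "url" => section.2 = Some(value),
                _ => {}
            }
        }
    }
    finish(current, &mut out);
    out
}
```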
Out of scope
Reducing the local size of the .cargo directory seems very doable without great effort, but we chose to tackle this separately.
file://… clones. cargo is doing that already.