GitoxideLabs / gitoxide

An idiomatic, lean, fast & safe pure Rust implementation of Git
Apache License 2.0

[integration] Shallow clones for `cargo` #449

Open Byron opened 2 years ago

Byron commented 2 years ago

This issue collects thoughts and facts about the state of shallow clones for git repositories when used by cargo.

Here is a list of steps to take in cargo to support step-wise integration of gitoxide.

Terminology

Let's be sure we are on the same page, so I repeat here this comment by @eh2406 to set a baseline.

Another source of miscommunication is that there are two interconnected potential changes.

of course one depends on the other.

Tracking issues

Cloning crates.io + crates (non-shallow)

It would be most straightforward to implement git::fetch(…) using gitoxide. This includes all transports and all credentials options that git2 supports for maximum usability.

Note that checkouts would still be performed by git2.

Requirements

All requirements are to be validated with the cargo team, and a checkmark means it's indeed a requirement.

Cloning crates.io + crates (shallow)

Add a parameter to support shallow fetches that maintain shallow-ness.
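
What "maintaining shallow-ness" means can be sketched with plain `git`, which is not how cargo or gitoxide would implement it. This is a minimal sketch under stated assumptions: the upstream repository, the `main` branch name, and all paths are hypothetical stand-ins created in a temporary directory, and a `file://` URL replaces the real remote.

```shell
set -eu
work=$(mktemp -d)
src="$work/upstream"   # hypothetical stand-in for the remote repository

# Build an upstream with three commits on a branch named main.
git init -q "$src"
git -C "$src" symbolic-ref HEAD refs/heads/main
for i in 1 2 3; do
  echo "$i" > "$src/file"
  git -C "$src" add file
  git -C "$src" -c user.name=demo -c user.email=demo@example.com commit -q -m "commit $i"
done

# file:// is required here: with a plain local path git ignores --depth.
git clone -q --bare --depth 1 "file://$src" "$work/shallow.git"

# Upstream moves on by one commit.
echo 4 > "$src/file"
git -C "$src" -c user.name=demo -c user.email=demo@example.com commit -q -am "commit 4"

# Fetch again with the same depth so the clone does not grow history.
git -C "$work/shallow.git" fetch -q --depth 1 origin '+refs/heads/*:refs/remotes/origin/*'

is_shallow=$(git -C "$work/shallow.git" rev-parse --is-shallow-repository)
commits=$(git -C "$work/shallow.git" rev-list --count refs/remotes/origin/main)
echo "shallow=$is_shallow commits=$commits"
```

The point of the second `--depth 1` is that the repository is still shallow after the update, with only the newest commit reachable from the fetched ref.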

Issues

Don't forget about the general considerations of shallow clones for database-like repositories by ehuss in a comment, which might make this option unusable. It's something to validate first. If it truly is an issue, shallow can be turned off for crates.io but can be used for crates clones.

Assumptions

These should be validated to see if they may indeed be considered issues or risks one day in case they are proven true.

Questions

Requirements

Notes by @eh2406

Interesting reading

Checkout worktrees (without submodules)

This effectively is an implementation of git reset --hard as used in GitCheckout::reset(…).
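
For reference, the behaviour to reproduce can be sketched with the `git` CLI itself. This is only an illustration of what `git reset --hard` does to a dirty checkout, not the gitoxide implementation; the repository and the file names are hypothetical.

```shell
set -eu
repo=$(mktemp -d)   # hypothetical checkout
git init -q "$repo"
echo "pristine" > "$repo/Cargo.toml"
git -C "$repo" add Cargo.toml
git -C "$repo" -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"
rev=$(git -C "$repo" rev-parse HEAD)

# Simulate a dirty checkout: one modified tracked file, one staged new file.
echo "tampered" > "$repo/Cargo.toml"
echo "stray" > "$repo/stray.txt"
git -C "$repo" add stray.txt

# Reset both the index and the worktree to the wanted revision.
git -C "$repo" reset -q --hard "$rev"

content=$(cat "$repo/Cargo.toml")
status=$(git -C "$repo" status --porcelain)
echo "content=$content status=[$status]"
```

A replacement has to restore modified files and drop staged additions in the same way, leaving a clean status.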

Questions

Notes by @eh2406

Checkout submodules

Update submodules as in GitCheckout::update_submodules(…).

Out of scope

Reducing the local size of the .cargo directory seems very doable even without great effort, but we chose to tackle these separately.

### bare shallow clones vs non-shallow ones

```
❯ git clone --bare https://github.com/rust-lang/crates.io-index index-full-history.git
Cloning into bare repository 'index-full-history.git'...
remote: Total 457133 (delta 151), reused 69 (delta 0), pack-reused 456913
Receiving objects: 100% (457133/457133), 209.38 MiB | 1.21 MiB/s, done.
Resolving deltas: 100% (319566/319566), done.
~/.cargo/registry took 2m59s
❯ git clone --depth 1 --bare https://github.com/rust-lang/crates.io-index index-shallow-depth-1.git
Cloning into bare repository 'index-shallow-depth-1.git'...
remote: Total 108481 (delta 57698), reused 92572 (delta 47615), pack-reused 0
Receiving objects: 100% (108481/108481), 53.77 MiB | 2.05 MiB/s, done.
Resolving deltas: 100% (57698/57698), done.
~/.cargo/registry took 34s
```

### worktree checkout sizes (compressed, uncompressed)

```
.cargo/registry/index-shallow-depth-1.git ( master)
❯ l
.rw-r--r-- 703Mi byron staff 1 Jul 11:40 archive.tar
.rw-r--r-- 44Mi byron staff 1 Jul 11:40 archive.tar.gz
```
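
To put the measurements above into proportion, the ratios can be computed directly from the reported numbers. This is a small sketch; the variable names are made up here, and the timings are the 2m59s and 34s shown by the prompt, converted to seconds.

```shell
export LC_ALL=C                       # keep '.' as the decimal separator
full_mib=209.38; shallow_mib=53.77    # received pack sizes from the two clones
full_secs=179;   shallow_secs=34      # 2m59s and 34s

size_ratio=$(awk -v a="$full_mib" -v b="$shallow_mib" 'BEGIN { printf "%.1f", a / b }')
time_ratio=$(awk -v a="$full_secs" -v b="$shallow_secs" 'BEGIN { printf "%.1f", a / b }')
tar_ratio=$(awk 'BEGIN { printf "%.0f", 703 / 44 }')   # archive.tar vs archive.tar.gz

echo "download ${size_ratio}x smaller, clone ${time_ratio}x faster, tar compresses ${tar_ratio}x"
```

Note that the two clones ran at different throughput (1.21 MiB/s vs 2.05 MiB/s), so the time ratio reflects more than the size difference alone.
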
wezm commented 2 years ago

Just as another reference for projects moving away from shallow clones at GitHub's request, there is also Homebrew in 2020:

https://github.com/Homebrew/brew/blob/17a7e71d909de4d09bc2cb479b1ccf975648fbd2/Library/Homebrew/cmd/update.sh#L448-L454

https://github.com/Homebrew/brew/pull/8883

Byron commented 2 years ago

Thanks for posting!

What I understand from this is that:

This should mean that shallow clones for cargo are pretty much the way to go on CI, and probably locally as well, since no unshallow operation is planned at all. Users might still choose to unshallow if they want to access the history of the crates index, for instance for research; cargo itself doesn't care about the history.

Please let me know if I am missing something.

Eh2406 commented 2 years ago

Thank you for your work organizing this complicated topic. I am going to put 2 nits here, and the rest in the zulip thread. Just my comments, not speaking officially for the team.

> How to 'unshallow' a crates index? Some might want it for research. In any case, there should be a known path for this, so probably there must be an option for this in the cargo config no matter what will be the default.

I don't think there needs to be an option for unshallowing. The clone that Cargo makes is an implementation detail, that entirely belongs to Cargo. If someone wants a copy of the data with different configuration, they can clone the index themselves. That being said, we do have to be careful about how different versions of Cargo interact. If one version of cargo does a shallow clone, and then an older version of cargo is used to do an index update, things have to at least work.
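
For anyone who does want the full history in a clone of their own, plain `git` already offers a path. The following is a minimal sketch, assuming a hypothetical local upstream in a temporary directory as a stand-in for the real index; nothing here touches cargo's own clone.

```shell
set -eu
work=$(mktemp -d)
src="$work/upstream"   # hypothetical stand-in for the index repository

git init -q "$src"
for i in 1 2 3; do
  echo "$i" > "$src/file"
  git -C "$src" add file
  git -C "$src" -c user.name=demo -c user.email=demo@example.com commit -q -m "commit $i"
done

# A private shallow clone (file:// so that --depth is honored).
git clone -q --depth 1 "file://$src" "$work/mine"
before=$(git -C "$work/mine" rev-list --count HEAD)

# Convert it into a complete clone.
git -C "$work/mine" fetch -q --unshallow

after=$(git -C "$work/mine" rev-list --count HEAD)
is_shallow=$(git -C "$work/mine" rev-parse --is-shallow-repository)
echo "before=$before after=$after shallow=$is_shallow"
```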

> use a bare clones of the crates.io index and extract files content directly from git

I believe this is done already.
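
As an illustration of reading file content straight from a bare clone, here is a sketch using the `git` CLI. It is not how cargo does it internally; the upstream repository and the JSON line are hypothetical, and only the `se/rd/serde` path follows the index layout.

```shell
set -eu
work=$(mktemp -d)
src="$work/upstream"   # hypothetical stand-in for the index repository

git init -q "$src"
mkdir -p "$src/se/rd"
echo '{"name":"serde","vers":"0.0.0"}' > "$src/se/rd/serde"   # made-up entry
git -C "$src" add se/rd/serde
git -C "$src" -c user.name=demo -c user.email=demo@example.com commit -q -m "add serde"

# A bare clone has no worktree, so no index files are checked out.
git clone -q --bare "file://$src" "$work/index.git"

# Read the blob directly from the object database.
entry=$(git -C "$work/index.git" show HEAD:se/rd/serde)
echo "$entry"
```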

Byron commented 2 years ago

> Thank you for your work organizing this complicated topic. I am going to put 2 nits here, and the rest in the zulip thread. Just my comments, not speaking officially for the team.
>
> > How to 'unshallow' a crates index? Some might want it for research. In any case, there should be a known path for this, so probably there must be an option for this in the cargo config no matter what will be the default.
>
> I don't think there needs to be an option for unshallowing. The clone that Cargo makes is an implementation detail, that entirely belongs to Cargo. If someone wants a copy of the data with different configuration, they can clone the index themselves. That being said, we do have to be careful about how different versions of Cargo interact. If one version of cargo does a shallow clone, and then an older version of cargo is used to do an index update, things have to at least work.

Thank you, I took note of this and specifically mentioned the need for backwards compatibility and to validate it. Since older versions of cargo use git2, and recent versions fall back to git2 when gitoxide isn't turned on, I wouldn't be surprised if fetches failed. I am looking forward to trying it out.

> > use a bare clones of the crates.io index and extract files content directly from git
>
> I believe this is done already.

A good catch! I updated that passage to reflect the status quo.

I also started off this issue with the terminology you provided over on zulip. Thanks for taking the time to help people discuss this complex topic.