JayFoxRox opened 4 years ago
Hmm, I hoped they'd keep the split repo. So far the options I see are:
> Include the full LLVM repo as a submodule (which is 2.2GiB in size)
Including 2.2GiB is not an option for me.
I assume it would also be a huge issue for people trying to get into nxdk development if they have to clone the repo for a very long time (and waste so much disk space). I don't have that much space available, and it would actually mean that I'd have to stop my involvement with nxdk.
It would only be an option if we decentralized nxdk and made binary releases (so only the people who want to work on libcxx would have to clone it).
> Use `git filter-branch` to update our libcxx repo out of the LLVM repo
Yes, I also considered this. It's also really bad, because we will end up with different hashes for the revisions.
So we'd have to manually keep track of upstream revisions (GitHub won't offer automatic comparisons, for example). Rebasing might become an issue, because we need stability for the underlying revisions.
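To illustrate the hash problem on a toy throwaway repo (all repo and file names here are made up, not nxdk's layout), here is a sketch of extracting a subdirectory with `git filter-branch` - the rewritten commit no longer matches the original hash:

```shell
# Toy demonstration: extract a subdirectory with filter-branch and
# observe that the commit hash changes. Repo and file names are made up.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q mono
cd mono
git config user.email dev@example.com
git config user.name dev
mkdir libcxx
echo hello > libcxx/file.txt
git add .
git commit -qm 'import'
old=$(git rev-parse HEAD)
# Rewrite all refs so that libcxx/ becomes the repository root.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch \
    --subdirectory-filter libcxx -- --all >/dev/null 2>&1
new=$(git rev-parse HEAD)
# The rewritten commit gets a brand-new hash, so upstream revisions
# can no longer be compared directly - the tracking problem above.
test "$old" != "$new"
test -f file.txt   # libcxx/file.txt is now at the root
```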
> There's also `git subtree`

I'm not sure what the difference to `git filter-branch` is.
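For what it's worth, `git subtree split` rewrites history in much the same way, so it shares the same hash drawback. A toy sketch (assuming the `git-subtree` command is installed; it ships with most git packages):

```shell
# Toy demonstration of git subtree split (names made up); assumes the
# git-subtree command is installed (it ships with most git packages).
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q mono
cd mono
git config user.email dev@example.com
git config user.name dev
mkdir libcxx
echo hello > libcxx/file.txt
git add .
git commit -qm 'import'
# Split out the libcxx/ history; prints the hash of a rewritten commit.
split=$(git subtree split --prefix=libcxx 2>/dev/null)
# Just like filter-branch, the result has different hashes than upstream.
test "$split" != "$(git rev-parse HEAD)"
```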
I guess another option is to only do very shallow clones - but even at `--depth=1` it's still 140MiB in `.git`, and 900MiB combined with the checked-out master.
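The mechanics of a `--depth=1` clone can be sketched on a toy local repo (names made up; the 140MiB/900MiB numbers above are the real-world LLVM cost):

```shell
# Toy demonstration of --depth=1 (names made up); file:// forces the
# smart transport, which supports shallow clones even locally.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q src
cd src
git config user.email dev@example.com
git config user.name dev
for i in 1 2 3; do
    echo "$i" > f.txt
    git add f.txt
    git commit -qm "commit $i"
done
cd ..
git clone -q --depth=1 "file://$tmp/src" shallow
# Only the most recent commit (and its objects) were fetched.
test "$(git -C shallow rev-list --count HEAD)" -eq 1
```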
> Including 2.2GiB is not an option to me.
I really don't see this as a large issue. Take, for example, Visual Studio and the SDKs required for .NET development: 5.1GB just for .NET Core and a couple of targeting packs.
> I don't have that much space available, and it would actually mean that I'd have to stop my involvement with nxdk.
Sorry to hear this, really. Can I offer to gift you a 16GB USB drive? You can readily get them for essentially pennies.
> Take, for example, Visual Studio and the SDKs required for .NET development: 5.1GB just for .NET Core and a couple of targeting packs.
I would heavily disagree. We don't offer an app like Visual Studio - why would our source be as big as that? That comparison is not really useful at all. But I would agree that the personal disk space of some developers should not be part of the argument. However, generally speaking, an insanely large repo might indeed turn off people who are not lucky enough to own state-of-the-art hardware. We should at least keep that in mind.
I would vote for the `git subtree` / `git filter-branch` solution. It looks a bit messy to set up and maintain, but I think it's our best bet? Personally, I would probably be very annoyed by a repo size >1GiB, which would be the result of the suggested `--depth=1` solution.
This sounds interesting: https://github.blog/2020-01-17-bring-your-monorepo-down-to-size-with-sparse-checkout/, especially when considering https://stackoverflow.com/questions/6238590/set-git-submodule-to-shallow-clone-sparse-checkout. Also see https://stackoverflow.com/questions/600079/how-do-i-clone-a-subdirectory-only-of-a-git-repository/52269934#52269934
So it might be possible to have the full monorepo online, but we could checkout the individual submodule directory.
Hm, this would save some space for the checkout, but it appears that it still has to clone the full repo. This can be fixed with the sparse cloning options, but that comes with the usual drawback of requiring extra steps if you want to be able to actually work on the code in the submodule.
> if you want to be able to actually work on the code in the submodule.
Yes, but at least you get to do some smaller changes. Just don't attempt to bring in an IDE, do a full blame, or similar (personally, I do most of that in the GitHub web UI anyway).
I'll have to do some more testing; my favorite so far is this:
```
git clone https://github.com/llvm/llvm-project --sparse --filter=tree:0 libcxx
cd libcxx
git sparse-checkout init --cone
git sparse-checkout set libcxx
```
It is about ~200MiB when initialized, but contains the entire commit history for libcxx. Problems only arise when you do `git log -p` and scroll to the end, because then it will download all the blobs and trees, so your repo suddenly explodes (without warning you). Annoyingly, this also happens when looking at the log of a specific file or folder... so you'd have to avoid that.
Note that https://github.blog/2020-12-21-get-up-to-speed-with-partial-clone-and-shallow-clone/ also strongly discourages this workflow; however, the next best option is to keep a repository of about 500MiB (by keeping the trees, but removing blobs):
```
git clone https://github.com/llvm/llvm-project --sparse --filter=blob:none libcxx
cd libcxx
git sparse-checkout init --cone
git sparse-checkout set libcxx
```
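As a sanity check, the blob:none + cone-mode combination can be exercised against a toy local repo (all names made up; `uploadpack.allowFilter` is set manually because partial clone needs server-side support):

```shell
# Toy demonstration of blob:none + cone-mode sparse checkout (names
# made up). Partial clone needs server-side support, hence allowFilter.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q mono
cd mono
git config user.email dev@example.com
git config user.name dev
git config uploadpack.allowFilter true
mkdir libcxx other
echo a > libcxx/a.txt
echo b > other/b.txt
git add .
git commit -qm 'import'
cd ..
git clone -q --sparse --filter=blob:none "file://$tmp/mono" sparse
cd sparse
git sparse-checkout init --cone
git sparse-checkout set libcxx
# Only libcxx/ is materialized in the working tree.
test -f libcxx/a.txt
test ! -e other/b.txt
```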
Another way to prevent exploding the repo size could be to also make the clone shallow, either since a date (`--shallow-since`) or with a fixed depth (a couple thousand commits should be good enough). I don't think there's a way to shallow-clone since a specific commit, for some reason? However, I was unable to make this work properly, because the shallow clone by date caused fetching of all trees and blobs, thereby negating the filter. Fetching by explicit depth seems to work, but it's hard to control.
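A toy sketch of the `--shallow-since` variant (dates and names made up; this only demonstrates the date cutoff itself, not the filter-negation problem described above):

```shell
# Toy demonstration of --shallow-since (dates and names made up); only
# commits whose committer date is after the cutoff get fetched.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q src
cd src
git config user.email dev@example.com
git config user.name dev
GIT_COMMITTER_DATE='2015-01-01T12:00:00' \
    git commit -qm old --allow-empty --date='2015-01-01T12:00:00'
GIT_COMMITTER_DATE='2020-01-01T12:00:00' \
    git commit -qm new --allow-empty --date='2020-01-01T12:00:00'
cd ..
git clone -q --shallow-since='2019-01-01' "file://$tmp/src" recent
# Only the 2020 commit made it across the cutoff.
test "$(git -C recent rev-list --count HEAD)" -eq 1
```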
For users, we can do this, which is ~50MiB, but obviously shallow:
```
git clone https://github.com/llvm/llvm-project --sparse --filter=tree:0 libcxx --depth=1
pushd libcxx
git sparse-checkout init --cone
git sparse-checkout set libcxx
```
In size, this is comparable to a `git clone https://github.com/llvm-mirror/libcxx.git`, which is ~55MiB (that is: the old repo, non-shallow, fully ready for development).
So we are still losing a lot of functionality in the migration, even with cutting-edge git features.
However, regardless of what you prefer, all of the above is inherently incompatible with submodules.
Currently, we use `.gitmodules` in nxdk, but it's not possible to force sparse checkout or filters for a submodule. We could only do a shallow clone from `.gitmodules`, but that's still large and would mean extra steps for setting up a tree for development. Overriding the submodule update command with a custom script is only possible locally (in `.git/` or gitconfig); it's not allowed in `.gitmodules`.
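For reference, the one size-related knob `.gitmodules` does support is `shallow`; a hypothetical entry (the submodule name and path are made up) would look like this:

```ini
[submodule "lib/llvm-project"]
	path = lib/llvm-project
	url = https://github.com/llvm/llvm-project.git
	# honored by `git submodule update` / `git clone --recurse-submodules`,
	# but there is no equivalent option for sparse checkout or clone filters
	shallow = true
```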
I think it's still worth considering migrating to the monorepo. Personally, I'd like to slim down nxdk anyway. If we have a simple shell script to initialize / install the packages, that's probably fine. We could have different settings for setting up a user tree and a development tree.
As a user, I'd just shallow-clone and partial-clone for using nxdk anyway; heck, I'd probably even delete the git repositories after building the binary libs. I'd only clone those repositories if I needed to make actual changes to them. Even then, I'd probably do a partial clone or shallow clone to some degree, too (with a very high depth, but not since 2001 - libcxx was imported in 2010).
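Such a setup script could be sketched roughly like this (the function name, the user/dev split, and the defaults are assumptions for illustration, not nxdk's actual tooling; the size estimates come from the measurements above):

```shell
# Hypothetical helper: fetch libcxx from the monorepo, either as a
# tiny user tree (shallow, treeless) or a development tree (blobless,
# full history). All names here are illustrative, not nxdk's API.
nxdk_fetch_libcxx() {
    mode=${1:-user}
    repo=${2:-https://github.com/llvm/llvm-project}
    case "$mode" in
        user) extra='--depth=1 --filter=tree:0' ;;   # ~50MiB, no history
        dev)  extra='--filter=blob:none' ;;          # ~500MiB, full history
        *)    echo "usage: nxdk_fetch_libcxx [user|dev] [repo]" >&2
              return 1 ;;
    esac
    git clone --sparse $extra "$repo" libcxx &&
        git -C libcxx sparse-checkout init --cone &&
        git -C libcxx sparse-checkout set libcxx
}
```

An unknown mode fails fast with a usage message, so the script can be wired into a larger bootstrap step without silently doing a huge clone.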
So those are the potential options I see.
I think all of those options suck.
However, I think migrating to the monorepo is still the best option:
Is there consensus on migrating to the mono repo? A large repo is a small issue in 2023 compared to the time cost of a more complicated solution.
https://github.com/llvm-mirror/libcxx is no longer being updated. Instead, https://github.com/llvm/llvm-project should be used.
Unfortunately, this new repository is not just libcxx but all of LLVM, so the repository is probably very large now. I'm not sure how to address this while keeping it maintainable and without requiring users to download a bunch of stuff they won't need.