XboxDev / nxdk-libcxx

Copy of libcxx git repository located at http://llvm.org/git/libcxx (adapted for original Xbox / nxdk toolchain)
http://libcxx.llvm.org

Upstream changed #3

Open JayFoxRox opened 4 years ago

JayFoxRox commented 4 years ago

https://github.com/llvm-mirror/libcxx is no longer being updated. Instead, https://github.com/llvm/llvm-project should be used.

Unfortunately, this new repository is not just libcxx, but all of LLVM, so the repository is probably very large now. I'm not sure how to address this, while also keeping it maintainable and without requiring users to download a bunch of stuff they won't need.

thrimbor commented 4 years ago

Hmm, I hoped they'd keep the split repo. So far the options I see are:

JayFoxRox commented 4 years ago

Include the full LLVM repo as a submodule (which is 2.2GiB in size)

Including 2.2GiB is not an option to me.

I assume it would also be a huge issue for people trying to get into nxdk development if they have to spend a very long time cloning the repo (and waste so much disk space). I don't have so much space available and it would actually mean that I'd have to stop my involvement with nxdk.

It would only be an option if we decentralized nxdk and made binary releases (so only people who want to work on libcxx would have to clone it).

Use git filter-branch to update our libcxx repo out of the LLVM repo

Yes, I also considered this. This is also really bad, because we will have different hashes for the revisions.

So we'll have to keep track of upstream revisions manually (GitHub won't offer auto-comparisons, for example). Rebasing might become an issue, because we need stability for the underlying revisions.

There's also git subtree; I'm not sure how it differs from git filter-branch.
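The hash problem can be reproduced on a throwaway repository. The following is only a sketch on a toy repo (the file names and the demo identity are made up for illustration, and nothing is fetched from LLVM): it extracts a libcxx directory with git filter-branch --subdirectory-filter and prints the HEAD hash before and after.

```shell
#!/bin/sh
# Toy demonstration: extracting a subdirectory rewrites every commit hash.
set -eu

# Throwaway identity so the commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
# Skip the deprecation notice (and its delay) that newer git versions print.
export FILTER_BRANCH_SQUELCH_WARNING=1

work=$(mktemp -d)
cd "$work"
git init -q monorepo
cd monorepo

# Two directories standing in for the real monorepo layout.
mkdir libcxx llvm
echo 'int main() {}' > llvm/main.cpp
echo '#pragma once' > libcxx/vector
git add .
git commit -q -m 'Initial import'
echo '// change' >> libcxx/vector
git commit -q -am 'libcxx: tweak vector'

before=$(git rev-parse HEAD)
# Rewrite history so that libcxx/ becomes the repository root.
git filter-branch -f --subdirectory-filter libcxx -- --all >/dev/null 2>&1
after=$(git rev-parse HEAD)

echo "monorepo HEAD: $before"
echo "split HEAD:    $after"
```

The two hashes differ because each rewritten commit points at a different tree, so a split repo cannot be matched against upstream by commit ID alone.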


I guess another option is to only do very shallow clones, but even at --depth=1 it's still 140MiB in .git and 900MiB combined with the checked-out master.

GXTX commented 4 years ago

Including 2.2GiB is not an option to me.

I really don't see this as a large issue. Take for example Visual Studio & required SDKs for .NET development: takes 5.1GB just for .NET Core and a couple of targeting packs.

I don't have so much space available and it would actually mean that I'd have to stop my involvement with nxdk.

Sorry to hear this, really. Can I offer gifting you a 16GB USB drive? You can get them readily for essentially pennies.

Teufelchen1 commented 4 years ago

Take for example Visual Studio & required SDKs for .NET development: takes 5.1GB just for .NET Core and a couple of targeting packs.

I would heavily disagree. We don't offer an app like Visual Studio, so why would our source be as big as that? Additionally, that comparison is not really useful at all. But I would agree that the personal disk space of some developers should not be part of the argument. However, generally speaking, having an insanely large repo might indeed turn off people who are not lucky enough to own state-of-the-art hardware. We should at least keep that in mind.

I would vote for the git subtree / git filter-branch solution. It looks a bit messy to set up and maintain, but I think it's our best bet? Personally, I would probably be very annoyed by a repo size >1GiB, which would be the result of the suggested --depth=1 solution.

JayFoxRox commented 3 years ago

This sounds interesting https://github.blog/2020-01-17-bring-your-monorepo-down-to-size-with-sparse-checkout/; especially when considering https://stackoverflow.com/questions/6238590/set-git-submodule-to-shallow-clone-sparse-checkout. Also see https://stackoverflow.com/questions/600079/how-do-i-clone-a-subdirectory-only-of-a-git-repository/52269934#52269934

So it might be possible to have the full monorepo online, but only check out the individual submodule directory.

thrimbor commented 3 years ago

Hm, this would save some space for the checkout, but it appears that it still has to clone the full repo - this can be fixed with the sparse cloning options, but that comes with the usual drawbacks of requiring extra steps if you want to be able to actually work on the code in the submodule.

JayFoxRox commented 3 years ago

if you want to be able to actually work on the code in the submodule.

Yes, but at least you get to do some smaller changes. Just don't attempt to bring an IDE, do a full blame or similar (personally, I do most of that on the GitHub web-ui anyway).


I'll have to do some more testing, but my favorite so far is this:

git clone https://github.com/llvm/llvm-project --sparse --filter=tree:0 libcxx
cd libcxx
git sparse-checkout init --cone
git sparse-checkout set libcxx

It is about 200MiB when initialized, but contains the entire commit history for libcxx. Problems only arise when you do git log -p and scroll to the end, because then it will download all the blobs and trees, so your repo suddenly explodes (without warning you). Annoyingly, this also happens when looking for the log of a specific file or folder... so you'd have to avoid that. Note that https://github.blog/2020-12-21-get-up-to-speed-with-partial-clone-and-shallow-clone/ also strongly discourages this workflow; however, the next best option is to keep a repository of about 500MiB (by keeping the trees, but removing blobs):

git clone https://github.com/llvm/llvm-project --sparse --filter=blob:none libcxx
cd libcxx
git sparse-checkout init --cone
git sparse-checkout set libcxx
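To see what the blob filter leaves out without downloading LLVM, the same commands can be tried against a small local repository. This is a sketch under assumptions: the toy upstream, its file names and the demo identity are invented, and the two uploadpack settings are only needed because the toy upstream is served from the local disk.

```shell
#!/bin/sh
# Toy demonstration of a blobless, sparse clone against a local "upstream".
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
cd "$work"

# Tiny stand-in for the monorepo: two directories, two commits.
git init -q upstream
mkdir upstream/libcxx upstream/llvm
echo 'vector v1' > upstream/libcxx/vector
echo 'llvm v1' > upstream/llvm/core
git -C upstream add .
git -C upstream commit -q -m 'Initial import'
echo 'vector v2' > upstream/libcxx/vector
git -C upstream commit -q -am 'libcxx: update vector'

# The serving side has to allow filtered fetches and on-demand object fetches.
git -C upstream config uploadpack.allowFilter true
git -C upstream config uploadpack.allowAnySHA1InWant true

# file:// forces the regular transport, so the filter is actually applied.
git clone -q --sparse --filter=blob:none "file://$work/upstream" partial
git -C partial sparse-checkout set libcxx

# Objects reachable from HEAD that were not downloaded are listed with a "?".
missing=$(git -C partial rev-list --objects --missing=print HEAD | grep -c '^?')
echo "objects not downloaded yet: $missing"
git -C partial count-objects -vH
```

Anything that needs the contents of those missing objects (git log -p, for instance) makes git fetch them on demand, which is where the sudden growth comes from.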

Another way to prevent exploding the repo size could be to also clone shallow-since by date or with a fixed depth (a couple thousand commits should be good enough). I don't think there's a clone shallow-since by commit for some reason? However, I was unable to make this work properly, because the shallow clone by date caused fetching of all trees and blobs, thereby negating the filter. Fetching by explicit depth seems to work, but it's hard to control.

For users, we can do this, which is ~50MiB, but obviously shallow:

git clone https://github.com/llvm/llvm-project --sparse --filter=tree:0 libcxx --depth=1
cd libcxx
git sparse-checkout init --cone
git sparse-checkout set libcxx

In size, this is comparable to a git clone https://github.com/llvm-mirror/libcxx.git which is ~55MiB (that is: the old repo, non-shallow, fully ready for development). So we are still losing a lot of functionality in the migration, even with cutting-edge git features.


However, regardless of what you prefer, all of the above is inherently incompatible with submodules.

Currently, we use .gitmodules in nxdk, but it's not possible to force sparse checkout or filters for the submodule. We could only do a shallow clone from .gitmodules, but that's already large and would mean extra steps for setting up a tree for development. Overwriting the submodule update command with a custom script is only possible locally (in ".git/" or gitconfig), and it's not allowed in ".gitmodules".
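For reference, the shallow option mentioned above is a per-submodule key in .gitmodules. A minimal sketch, assuming a hypothetical submodule name and path (lib/libcxx is a placeholder, not necessarily nxdk's real layout), run in a scratch directory:

```shell
#!/bin/sh
# Writes a .gitmodules entry that recommends a shallow clone for one submodule.
set -eu
cd "$(mktemp -d)"

# Hypothetical submodule name/path, chosen only for illustration.
name=lib/libcxx

git config -f .gitmodules "submodule.$name.path" "$name"
git config -f .gitmodules "submodule.$name.url" https://github.com/llvm/llvm-project
git config -f .gitmodules "submodule.$name.shallow" true

# Show the resulting file and read the setting back.
cat .gitmodules
shallow=$(git config -f .gitmodules --get "submodule.$name.shallow")
echo "shallow = $shallow"
```

There is no comparable .gitmodules key for a partial-clone filter or a sparse-checkout pattern, which is the limitation described above.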

I think it's still worth considering migrating to the monorepo. Personally I'd like to slim down nxdk anyway. If we have a simple shell script to initialize / install the packages, that's probably fine. We could have different settings for setting up a user tree and a development tree.
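Such a script could look roughly like the sketch below. The function names and the user / dev mode names are hypothetical; the clone options are the ones tried earlier in this comment. As written, it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical setup helper: picks clone options for a "user" or a "dev" tree.
set -eu
LLVM_URL=https://github.com/llvm/llvm-project

# Map a mode name to git clone options.
clone_flags() {
    case "$1" in
        user) echo "--sparse --filter=tree:0 --depth=1" ;;
        dev)  echo "--sparse --filter=blob:none" ;;
        *)    echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Clone the monorepo into the given directory and keep only libcxx checked out.
setup_libcxx() {
    mode=$1
    dest=$2
    flags=$(clone_flags "$mode")
    # $flags is left unquoted on purpose so it splits into separate options.
    git clone $flags "$LLVM_URL" "$dest"
    git -C "$dest" sparse-checkout set libcxx
}

# Only print the plan here; call setup_libcxx to actually clone.
echo "user: git clone $(clone_flags user) $LLVM_URL libcxx"
echo "dev:  git clone $(clone_flags dev) $LLVM_URL libcxx"
```

Calling setup_libcxx user libcxx would then perform the roughly 50MiB shallow setup, while setup_libcxx dev libcxx keeps full history without blobs.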

As a user, I'd just shallow clone and partial clone for using nxdk anyway - heck, I'd probably even delete the git repositories after building binary libs. I'd only clone those repositories if I need to do actual changes to them. Even then I'd probably do partial clone or shallow-clone to some degree, too (with a very high depth, but not since 2001 - libcxx was imported in 2010).


So the potential options I see are:

  1. Migrate to monorepo
    1. Drop submodules from nxdk, add some scripts and take the risky route of experimental git partial clone / dangerous shallow clone.
    2. ..or buy larger hard-drives.
  2. Set up a LLVM / libcxx mirror (which also means breaking the commit chain, so we can't send patches upstream + we can't easily pull from upstream + we have trouble tracking bugfixes across upstream and split-repo mirror history).
    1. Pressure LLVM into doing it.
    2. .. or set up our own; potentially find other communities to maintain it with us / potentially a github org dedicated to mirror monorepos as split repos.
  3. Stick to (soon) ancient libcxx versions and the old repository, waiting to see how the situation develops; potentially adding pressure on git / GitHub to support our use-case.

I think all of those options suck.

However, I think migrating to the mono-repo is the best option:

glebm commented 1 year ago

Is there consensus on migrating to the mono repo? A large repo is a small issue in 2023 compared to the time cost of a more complicated solution.