andsens / homeshick

git dotfiles synchronizer written in bash
MIT License

Clone/pull on large repositories/submodules: Recursion and depth #196

Open · gerardbosch opened this issue 4 years ago

gerardbosch commented 4 years ago

Thanks for your project, Anders. I've just started using Homeshick and find it really useful (my setup is still a WIP: https://github.com/gerardbosch/dotfiles).

I see that the homeshick clone and homeshick pull commands initialize submodules recursively.

Do you think it would make sense to initialize them with --depth=1 instead?

(I'm not sure whether other commands would be affected.)

Rationale:

Some submodules come with a huge history, which adds many megabytes of unnecessary data.

I noticed that the Zsh package manager Antigen, for instance, clones packages (which it calls bundles) with --depth=1. I thought it might make sense for Homeshick as well.

For example, I added bash-it as a submodule to my dotfiles. The bash-it README on GitHub instructs you to clone the project with --depth=1. The problem is that, now that this submodule sits in my dotfiles, it grows from 4.8 MB to 44 MB when I run homeshick pull (or homeshick clone on another account/machine). This is just one example; the same happens with any project that has a large history or heavy recursive submodules.
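One castle-side option that needs no homeshick change, assuming a Git version that supports it: gitmodules(5) documents `submodule.<name>.shallow`, which, when true, makes a clone of that submodule shallow (depth 1) unless the user explicitly asks for a full one, and `git submodule update` honours the hint by default (`--no-recommend-shallow` turns it off). A minimal sketch; `recommend_shallow` is a hypothetical helper, and `bash-it` is assumed to be the submodule's name in the castle's `.gitmodules`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: record a "shallow clone recommended" hint for one
# submodule in the castle's own .gitmodules (a file the castle owner controls).
recommend_shallow() {
  local castle=$1 name=$2
  # Adds "shallow = true" under [submodule "<name>"] in <castle>/.gitmodules.
  git -C "$castle" config -f .gitmodules "submodule.$name.shallow" true
}
```

For example, `recommend_shallow ~/.homesick/repos/dotfiles bash-it` (castle path assumed), then commit the changed `.gitmodules`. The hint only applies when the submodule is first cloned, so an existing full copy is left as it is, and it does not reach bash-it's own nested submodules.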

I could provide a PR if necessary.

--

I also have a side question (more of a curiosity): do you know if there is any way to configure Git to "ignore initialization" of certain submodules (like test libraries) on git submodule update? Sticking with the same example, bash-it comes with some other submodules for testing the project:

bash-it/test_lib/bats-assert
bash-it/test_lib/bats-core
bash-it/test_lib/bats-file
bash-it/test_lib/bats-support

I mean nested submodules in repos that are out of my control. I guess adding ignore = dirty and update = none to its .gitmodules would do it, but I'm not sure.
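`ignore = dirty` should not be needed for this: per the git-config/gitmodules docs, `submodule.<name>.ignore` only controls when `git status` and the diff family show a submodule as modified. The setting that makes `git submodule update` skip a submodule is `update = none`, and it does not have to be written into bash-it's upstream `.gitmodules`. As far as I can tell from Git's behaviour, `git submodule init` copies `update` from `.gitmodules` only when it is not already set locally, so a value in the repository's own config takes precedence. A minimal sketch under that assumption; `skip_submodule_by_path` is a hypothetical name, and the bash-it repo must already be cloned for its local config to exist.

```shell
#!/usr/bin/env bash
# Hypothetical helper: opt out of one nested submodule without touching the
# upstream .gitmodules. "update = none" is written to the repo's local config.
skip_submodule_by_path() {
  local repo=$1 path=$2 key value name
  # Find the [submodule "<name>"] entry whose "path" matches, since the name
  # is not guaranteed to equal the path. (Assumes names contain no spaces.)
  while read -r key value; do
    if [ "$value" = "$path" ]; then
      name=${key#submodule.}
      name=${name%.path}
      git -C "$repo" config "submodule.$name.update" none
      return
    fi
  done < <(git -C "$repo" config -f .gitmodules --get-regexp '^submodule\..*\.path$')
  echo "no submodule with path '$path' in $repo/.gitmodules" >&2
  return 1
}
```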

They are lightweight in this case (though they might not be), but I don't think I need these test dependencies at all (they aren't initialized if I just do git clone .../bash-it.git). Maybe using --depth=1 in homeshick clone/pull would already mitigate this too.

Thanks!

andsens commented 4 years ago

Apologies for the late answer, and thank you for the kind words.

This is a brilliant idea! I sometimes struggle with the same issues myself, because I use prezto.

There are some challenges, though:

I'm not saying these problems are insurmountable, but they suggest we'd need quite a lot of code and user interaction to handle this. If that's the case, it would be a no-go, since homeshick's strength lies in the transparency of what is going on and the simplicity of setting it up and configuring it.

> Do you know if is there any way to configure Git to "ignore initialization" of certain submodules

Hm, if it were in the root repo, I'd go with something that consumes the git submodule output, but with a sub-submodule we don't even have that information. It would be like preventing clone --recursive from initializing specific submodules before you have any data. Unless you split the initialization into stages, I don't see how :-/
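The "stages" idea could look roughly like this for one castle. It is a sketch, not homeshick code: `staged_submodule_init` is a hypothetical function, `bash-it` comes from the example above, and the nested submodule names are assumed to equal their paths (Git's default for `git submodule add`). Stage 3 relies on two assumptions about Git: `git submodule init` leaves an already-configured `update` value alone, and `update = none` makes `git submodule update` skip that submodule.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of staged submodule initialization for one castle.
# Usage: staged_submodule_init <castle> <outer-submodule> [nested-name...]
staged_submodule_init() {
  local castle=$1 outer=$2 name
  shift 2   # remaining arguments: nested submodule names to skip

  # Stage 1: clone only the outer submodule, shallowly, without recursing.
  git -C "$castle" submodule update --init --depth=1 -- "$outer"

  # Stage 2: the outer submodule now exists, so its local config can mark
  # the unwanted nested submodules as "update = none".
  for name in "$@"; do
    git -C "$castle/$outer" config "submodule.$name.update" none
  done

  # Stage 3: initialize everything else recursively; assumes init does not
  # overwrite the update values set in stage 2.
  git -C "$castle" submodule update --init --recursive --depth=1
}
```

For the bash-it case: `staged_submodule_init ~/.homesick/repos/dotfiles bash-it test_lib/bats-assert test_lib/bats-core test_lib/bats-file test_lib/bats-support` (castle path assumed). The skip list still has to come from somewhere before stage 3, which is the extra configuration andsens is wary of.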

gerardbosch commented 4 years ago

Thanks for your reply. From your comments, it sounds like this is more complex than I initially guessed, which was simply changing the homeshick clone/pull commands to:

git clone && git submodule update --init --recursive --depth=1
git pull && git submodule update --init --recursive --depth=1 

Just so I understand better: what would be the downside of this change? If I understand correctly, it would make a full clone of the user's dotfiles castle, but shallow clones of the castle's submodules.
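The two commands proposed above, written out as shell functions so the split between the castle (full history) and its submodules (depth 1, nested ones included) is explicit. The function names are hypothetical and this is not how homeshick currently implements clone/pull. For the clone side, git-clone also documents `--recurse-submodules --shallow-submodules`, which clones all submodules with a depth of 1 in a single step.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers for the proposed clone/pull behaviour.

# Full clone of the castle, then depth-1 clones of all submodules.
shallow_submodules_clone() {
  local url=$1 dest=$2
  git clone -- "$url" "$dest" &&
    git -C "$dest" submodule update --init --recursive --depth=1
}

# Pull the castle, then initialize/update submodules with depth 1.
shallow_submodules_pull() {
  local castle=$1
  git -C "$castle" pull &&
    git -C "$castle" submodule update --init --recursive --depth=1
}
```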