Closed fenollp closed 5 years ago
WRT _checkouts:
Simply make a symlink or copy your dependency to _checkouts at the top level of your project.
No difference from my proposal then.
Yeah we only wanted to cache hex packages in the first place, since they're immutable. I don't believe there's any plan to cache git deps. The management and handling of these sounds trickier, since not all repos will have all branches and refs for all projects, and two projects using the same branch on the same repo can point at two distinct refs. The cache handling and invalidation there really sounds tricky and not fun. By comparison, hex packages are static, come with their own hashes, and can be re-validated against a single well-known index quite simply.
Plus git is all uncompressed by default. Git is just not on the roadmap for caching because we did not think it was a good candidate for that.
I should note that ref vs. tag/branch is only a problem on first fetches (or upgrades) since otherwise the lock file contains the ref itself and we can rely on that. So I guess you'd save some network access, but at the cost of possibly more storage space (we currently check out single branches when we can for example).
The other interesting question with using a cache is that we implicitly make git non-concurrent (can't run many builds at once in the same user account) since two parallel builds may try to alter the same cached repo to get the branch they need (unless git supports that?)
I was thinking that caching bare repos would solve some of the trickiness: You can checkout multiple refs/branch/tags from the same single bare repo into different folders at the same time (as pointed above that uses git init). The only somewhat related issue I foresee here is when updating the cache: no more than one process per cached repo should be spawned running the fetch command. Haven’t tried it yet, maybe git handles this gracefully.
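The "no more than one process per cached repo" constraint could be enforced with an advisory lock held around the fetch. The following is only a minimal sketch in Python, assuming a POSIX system; the function name `cached_repo_lock` and the `.lock` sibling-file naming are hypothetical and not anything rebar3 does today.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def cached_repo_lock(bare_repo_dir, blocking=True):
    """Hold an exclusive advisory lock for one cached bare repo (POSIX only).

    The lock file sits next to the repo directory (hypothetical naming) so
    that it does not get in the way of cloning into an empty directory.
    """
    os.makedirs(os.path.dirname(bare_repo_dir), exist_ok=True)
    lock_path = bare_repo_dir + ".lock"
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
        # With LOCK_NB this raises BlockingIOError if another holder exists.
        fcntl.flock(fd, flags)
        try:
            yield lock_path
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

A build would wrap only the cache update (the fetch into the bare repo) in `with cached_repo_lock(path):`. Checking out already-fetched refs into separate build folders would not need the lock under this scheme.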
Yes basically once a lockfile exists refs can be checked out trivially. Hex packages are similar here. A tarball can even be fetched from a git hash with git-archive. The immutability properties are equal to pkgs (they both are only unusable when the remote/cdn disappears). I think storing the whole bare repo is more interesting though: _checkouts are easier, storage is maybe more optimized when multiple versions of the same repo are depended on.
Anyway I think my first point will help you see a solution. I will look into how to have git optimize storage further than what bare repos do.
So while bare repos take less space than non-bare repos (because non-bare repos are by definition the .git + the worktree) I found only one way to ensure keeping the .git size down to a minimum in a portable manner: git-gc. I am surprised there doesn't seem to be anything more effective than that!
Covered through https://github.com/erlang/rebar3/pull/1844 -- I think it's not a bad idea to do it through git-specific flags like that rather than maintaining a stateful cache of git repos across versions.
Current behaviour
Git dependencies are not cached (somewhat cc #1281). Here jesse gets cloned to _build/.../jesse but ~/.cache/rebar3/ has no Git directory or any place where it would cache it.
Note that git plugins will be cloned to ~/.cache/rebar3/plugins/{plugin} but their .git/ will not be kept (more like they get git archived). This sounds like an issue for #1301.
Expected behaviour
src/rebar_git_resource.erl should be caching git dependencies.
Here's what I think a cache of git repositories for rebar3 should look like:
That is: put git deps under ~/.cache/rebar3/git/{host}/{user}/{repo} as bare repos.
Now, if rebar3 tries to install a dep identified by its commit hash and we have the bare repo cached, we can be sure to either have or not have that commit. With anything other than a hash we have to make sure our cached repo is up to date. In most cases however (provided a lockfile exists) that shouldn't happen.
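To make the proposed layout and the "hash vs. anything else" rule concrete, here is a rough sketch in Python. The helper names `cache_path` and `needs_fetch` and the example URLs are assumptions made for illustration; none of this exists in rebar3, and a real implementation would live in src/rebar_git_resource.erl.

```python
import os
import re
from urllib.parse import urlparse

# Proposed cache root from this issue.
CACHE_ROOT = os.path.join("~", ".cache", "rebar3", "git")


def cache_path(url, root=CACHE_ROOT):
    """Map a git URL to {root}/{host}/{user}/{repo} (hypothetical helper)."""
    scp = re.match(r"^(?:[^@/]+@)?([^:/]+):(.+)$", url)
    if "://" not in url and scp:
        # scp-like syntax, e.g. git@host:user/repo.git
        host, path = scp.group(1), scp.group(2)
    else:
        parsed = urlparse(url)
        host, path = parsed.hostname, parsed.path
    path = path.strip("/")
    if path.endswith(".git"):
        path = path[:-4]
    return os.path.join(os.path.expanduser(root), host, *path.split("/"))


def needs_fetch(version, has_commit):
    """Decide whether the cached bare repo must be updated.

    `version` mirrors a rebar3 git dep spec: ("ref", sha), ("tag", name)
    or ("branch", name). `has_commit` says whether the cache already
    contains the requested commit.
    """
    kind, _value = version
    if kind == "ref":
        # A commit hash is immutable: it is either in the cache or it is not.
        return not has_commit
    # Tags and branches can move, so the cache has to be refreshed.
    return True
```

For example, `cache_path("https://github.com/user/repo.git")` resolves under `~/.cache/rebar3/git/github.com/user/repo`, and with a lockfile pinning a ref that is already cached, `needs_fetch` returns `False`, so no network access is needed.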
Notes:
I have not looked into _checkouts yet so cannot comment on that. With this, I hope to receive your criticisms & ideas! Thank you