Closed felipefoz closed 3 years ago
Another strategy we could consider here: I believe when a repo is cloned, the local HEAD is copied from the remote HEAD. So if a repo has both master
and main
branches, we could figure out which one to use by just looking at HEAD (git branch --show-current
) after the clone. I fact, we might not need to special-case master
and main
at all with that approach.
However, one issue I see there, is that I think the state of our HEAD could get stale. If we cloned the repo last week, and this week the default branch was changed, would we ever find out without clearing our cache? It looks like git ls-remote --symref
might be a workaround for that, to let us directly inspect the HEAD of the remote, but I'm not 100% clear on how that works yet.
Btw, it looks like this position of the --symref
flag is important. So this works (actually it works even if you're not inside a repo, which is nice):
git ls-remote --symref https://github.com/buildinspace/peru HEAD
But moving the --symref
flag any farther to the right means we don't get the symref output. That's kind of lame.
Another strategy we could consider here: I believe when a repo is cloned, the local HEAD is copied from the remote HEAD. So if a repo has both
master
andmain
branches, we could figure out which one to use by just looking at HEAD (git branch --show-current
) after the clone. I fact, we might not need to special-casemaster
andmain
at all with that approach.However, one issue I see there, is that I think the state of our HEAD could get stale. If we cloned the repo last week, and this week the default branch was changed, would we ever find out without clearing our cache? It looks like
git ls-remote --symref
might be a workaround for that, to let us directly inspect the HEAD of the remote, but I'm not 100% clear on how that works yet.Btw, it looks like this position of the
--symref
flag is important. So this works (actually it works even if you're not inside a repo, which is nice):git ls-remote --symref https://github.com/buildinspace/peru HEAD
But moving the
--symref
flag any farther to the right means we don't get the symref output. That's kind of lame.
I thought of something like this, but that would require changing some strategy in testing, because all the testing considers a local git repository creation (I think), correct me if I am wrong. For this reason, I thought this approach would solve 99% of the problems.
Ah I see what's happening with test_git_plugin_multiple_fetches
. It's pretty non-obvious, sorry about that: That test is counting the number of clones and fetches that get executed, by looking for print statements in the plugin output. So because the new code is doing a clone without printing, the count is different from what the test expects. Now that I look at the code, I honestly can't remember why the clone_and_maybe_print
needs to exist, and I think the easiest thing to do is to just get rid of it and make clone_if_needed
do the printing instead. Like this:
diff --git a/peru/resources/plugins/git/git_plugin.py b/peru/resources/plugins/git/git_plugin.py
index 93f0aa4..dab533a 100755
--- a/peru/resources/plugins/git/git_plugin.py
+++ b/peru/resources/plugins/git/git_plugin.py
@@ -40,6 +40,8 @@ def has_clone(url):
def clone_if_needed(url):
repo_path = repo_cache_path(url)
if not has_clone(url):
+ # We look for this print in tests, to count the number of clones we did.
+ print('git clone ' + url)
git('clone', '--mirror', '--progress', url, repo_path)
return repo_path
@@ -58,12 +60,6 @@ def repo_cache_path(url):
return os.path.join(CACHE_ROOT, url_hash)
-def clone_and_maybe_print(url):
- if not has_clone(url):
- print('git clone ' + url)
- return clone_if_needed(url)
-
-
def git_fetch(url, repo_path):
print('git fetch ' + url)
git('fetch', '--prune', git_dir=repo_path)
@@ -88,7 +84,7 @@ def already_has_rev(repo, rev):
def checkout_tree(url, rev, dest):
- repo_path = clone_and_maybe_print(url)
+ repo_path = clone_if_needed(url)
if not already_has_rev(repo_path, rev):
git_fetch(url, repo_path)
# If we just use `git checkout rev -- .` here, we get an error when rev is
that would require changing some strategy in testing, because all the testing considers a local git repository creation (I think), correct me if I am wrong
When I've played with ls-remote
by hand, it seems like it works the same way whether or not the remote repo is on the same machine. (So if you're working in an SSH clone, ls-remote
will make network calls.) If so, does that change the situation?
Ah I see what's happening with
test_git_plugin_multiple_fetches
. It's pretty non-obvious, sorry about that: That test is counting the number of clones and fetches that get executed, by looking for print statements in the plugin output. So because the new code is doing a clone without printing, the count is different from what the test expects. Now that I look at the code, I honestly can't remember why theclone_and_maybe_print
needs to exist, and I think the easiest thing to do is to just get rid of it and makeclone_if_needed
do the printing instead. Like this:
Now all tests are passing.
that would require changing some strategy in testing, because all the testing considers a local git repository creation (I think), correct me if I am wrong
When I've played with
ls-remote
by hand, it seems like it works the same way whether or not the remote repo is on the same machine. (So if you're working in an SSH clone,ls-remote
will make network calls.) If so, does that change the situation?
I can't confirm, because I usually don't work with SSH. Shall we do some refactoring? or do we leave this for some future bug haunts us?
This looks good to me. CI is green, so as soon as you're satisfied with the cleanup I'll go ahead and land it. As far as the specific strategy we use for repos that have both "main"
and "master"
, I'm fine with sticking with the current simple approach. If we do something more complicated with ls-remote
, that increases the chance that some unexpected change on the server side that we can't control will "break" us. The whole default branch name situation in Git is inherently unstable anyway. I'd prefer that users who really care about this case just make their preferences explicit in peru.yaml
.
This looks good to me. CI is green, so as soon as you're satisfied with the cleanup I'll go ahead and land it. As far as the specific strategy we use for repos that have both
"main"
and"master"
, I'm fine with sticking with the current simple approach. If we do something more complicated withls-remote
, that increases the chance that some unexpected change on the server side that we can't control will "break" us. The whole default branch name situation in Git is inherently unstable anyway. I'd prefer that users who really care about this case just make their preferences explicit inperu.yaml
.
Agreed. I included this comment in code. If user is using a different approach, then should use the rev option.
Great. I think I am done with this one. CI kept green!
Released as version 1.3.0. Thanks for doing this!
Hi, I just did the modifications we were talking. It actually took me some time, python is not the my main language, and I've been for a while without programming in it. Anyways, it turns out that defining the default branch in git is not something deterministic, and after trying some approaches, it ended up in many failing tests. Reading a bit more, I found out that the only way to make sure about the default branch is consulting the remote or even some API. Instead I decided to use another approach, with a list of possible default branches, it checks if one of these branches are in the list of all branches. There is a corner case where it could have both, and the default turns to be not in first place of the list.
About tests. I couldn't figure out why a test keeps failing. I am pretty sure it is something I did, because it wasn't before, but I couldn't figure out why, could you help on that?
Cheers,