Fix/default branch issue

felipefoz commented 3 years ago

Hi, I just did the modifications we were talking. It actually took me some time, python is not the my main language, and I've been for a while without programming in it. Anyways, it turns out that defining the default branch in git is not something deterministic, and after trying some approaches, it ended up in many failing tests. Reading a bit more, I found out that the only way to make sure about the default branch is consulting the remote or even some API. Instead I decided to use another approach, with a list of possible default branches, it checks if one of these branches are in the list of all branches. There is a corner case where it could have both, and the default turns to be not in first place of the list.

About tests. I couldn't figure out why a test keeps failing. I am pretty sure it is something I did, because it wasn't before, but I couldn't figure out why, could you help on that?

file: peru\tests\test_plugins.py", line 168, in test_git_plugin_multiple_fetches
    self.assertEqual(output.count("git clone"), 1)
AssertionError: 0 != 1

Cheers,

oconnor663 commented 3 years ago

Another strategy we could consider here: I believe when a repo is cloned, the local HEAD is copied from the remote HEAD. So if a repo has both master and main branches, we could figure out which one to use by just looking at HEAD (git branch --show-current) after the clone. I fact, we might not need to special-case master and main at all with that approach.

However, one issue I see there, is that I think the state of our HEAD could get stale. If we cloned the repo last week, and this week the default branch was changed, would we ever find out without clearing our cache? It looks like git ls-remote --symref might be a workaround for that, to let us directly inspect the HEAD of the remote, but I'm not 100% clear on how that works yet.

Btw, it looks like this position of the --symref flag is important. So this works (actually it works even if you're not inside a repo, which is nice):

git ls-remote --symref https://github.com/buildinspace/peru HEAD

But moving the --symref flag any farther to the right means we don't get the symref output. That's kind of lame.

felipefoz commented 3 years ago

Another strategy we could consider here: I believe when a repo is cloned, the local HEAD is copied from the remote HEAD. So if a repo has both master and main branches, we could figure out which one to use by just looking at HEAD (git branch --show-current) after the clone. I fact, we might not need to special-case master and main at all with that approach.

However, one issue I see there, is that I think the state of our HEAD could get stale. If we cloned the repo last week, and this week the default branch was changed, would we ever find out without clearing our cache? It looks like git ls-remote --symref might be a workaround for that, to let us directly inspect the HEAD of the remote, but I'm not 100% clear on how that works yet.

Btw, it looks like this position of the --symref flag is important. So this works (actually it works even if you're not inside a repo, which is nice):
git ls-remote --symref https://github.com/buildinspace/peru HEAD
But moving the --symref flag any farther to the right means we don't get the symref output. That's kind of lame.

I thought of something like this, but that would require changing some strategy in testing, because all the testing considers a local git repository creation (I think), correct me if I am wrong. For this reason, I thought this approach would solve 99% of the problems.

oconnor663 commented 3 years ago

Ah I see what's happening with test_git_plugin_multiple_fetches. It's pretty non-obvious, sorry about that: That test is counting the number of clones and fetches that get executed, by looking for print statements in the plugin output. So because the new code is doing a clone without printing, the count is different from what the test expects. Now that I look at the code, I honestly can't remember why the clone_and_maybe_print needs to exist, and I think the easiest thing to do is to just get rid of it and make clone_if_needed do the printing instead. Like this:

diff --git a/peru/resources/plugins/git/git_plugin.py b/peru/resources/plugins/git/git_plugin.py
index 93f0aa4..dab533a 100755
--- a/peru/resources/plugins/git/git_plugin.py
+++ b/peru/resources/plugins/git/git_plugin.py
@@ -40,6 +40,8 @@ def has_clone(url):
 def clone_if_needed(url):
     repo_path = repo_cache_path(url)
     if not has_clone(url):
+        # We look for this print in tests, to count the number of clones we did.
+        print('git clone ' + url)
         git('clone', '--mirror', '--progress', url, repo_path)
     return repo_path

@@ -58,12 +60,6 @@ def repo_cache_path(url):
     return os.path.join(CACHE_ROOT, url_hash)

-def clone_and_maybe_print(url):
-    if not has_clone(url):
-        print('git clone ' + url)
-    return clone_if_needed(url)
-
-
 def git_fetch(url, repo_path):
     print('git fetch ' + url)
     git('fetch', '--prune', git_dir=repo_path)
@@ -88,7 +84,7 @@ def already_has_rev(repo, rev):

 def checkout_tree(url, rev, dest):
-    repo_path = clone_and_maybe_print(url)
+    repo_path = clone_if_needed(url)
     if not already_has_rev(repo_path, rev):
         git_fetch(url, repo_path)
     # If we just use `git checkout rev -- .` here, we get an error when rev is

oconnor663 commented 3 years ago

that would require changing some strategy in testing, because all the testing considers a local git repository creation (I think), correct me if I am wrong

When I've played with ls-remote by hand, it seems like it works the same way whether or not the remote repo is on the same machine. (So if you're working in an SSH clone, ls-remote will make network calls.) If so, does that change the situation?

felipefoz commented 3 years ago

Ah I see what's happening with test_git_plugin_multiple_fetches. It's pretty non-obvious, sorry about that: That test is counting the number of clones and fetches that get executed, by looking for print statements in the plugin output. So because the new code is doing a clone without printing, the count is different from what the test expects. Now that I look at the code, I honestly can't remember why the clone_and_maybe_print needs to exist, and I think the easiest thing to do is to just get rid of it and make clone_if_needed do the printing instead. Like this:

Now all tests are passing.

felipefoz commented 3 years ago

that would require changing some strategy in testing, because all the testing considers a local git repository creation (I think), correct me if I am wrong

When I've played with ls-remote by hand, it seems like it works the same way whether or not the remote repo is on the same machine. (So if you're working in an SSH clone, ls-remote will make network calls.) If so, does that change the situation?

I can't confirm, because I usually don't work with SSH. Shall we do some refactoring? or do we leave this for some future bug haunts us?

oconnor663 commented 3 years ago

This looks good to me. CI is green, so as soon as you're satisfied with the cleanup I'll go ahead and land it. As far as the specific strategy we use for repos that have both "main" and "master", I'm fine with sticking with the current simple approach. If we do something more complicated with ls-remote, that increases the chance that some unexpected change on the server side that we can't control will "break" us. The whole default branch name situation in Git is inherently unstable anyway. I'd prefer that users who really care about this case just make their preferences explicit in peru.yaml.

felipefoz commented 3 years ago

This looks good to me. CI is green, so as soon as you're satisfied with the cleanup I'll go ahead and land it. As far as the specific strategy we use for repos that have both "main" and "master", I'm fine with sticking with the current simple approach. If we do something more complicated with ls-remote, that increases the chance that some unexpected change on the server side that we can't control will "break" us. The whole default branch name situation in Git is inherently unstable anyway. I'd prefer that users who really care about this case just make their preferences explicit in peru.yaml.

Agreed. I included this comment in code. If user is using a different approach, then should use the rev option.

felipefoz commented 3 years ago

Great. I think I am done with this one. CI kept green!

oconnor663 commented 3 years ago

Released as version 1.3.0. Thanks for doing this!

buildinspace / peru

Fix/default branch issue #214