CliMA / ClimateMachine.jl

Climate Machine: an Earth System Model that automatically learns from data
https://clima.github.io/ClimateMachine.jl/latest/
Other
451 stars 78 forks source link

Git workflow discussion #964

Open leios opened 4 years ago

leios commented 4 years ago

Git workflow

A few people (who will remain nameless for this discussion) have had some trouble with rebasing onto the current master and squashing their commits before merging. Right now, this is an issue that is taking a large amount of time for new developers. This also takes time from developers who are knowledgeable in git and version control because they need to step through the process with new developers.

A while ago, the question was raised about whether it would be possible to have bors squash and merge, but this was too difficult because we could not adequately attribute people for their code contributions. I think we can solve this issue by adding Co-authored-by: ... in the PR discussion.

I would like to have a discussion about this a bit. Right now, I can see 3 basic solutions:

  1. Add Co-authored-by: ... to the PR template so people can attribute people in the case they prefer bors to squash and merge. I suppose this would allow both squash & merge and rebase workflows.
  2. Add a simple script to do the squashing and rebasing up to current master (essentially git reset origin/master && git add --all && git commit -m "Single commit message"
  3. Just teach git.

Any solution is fine with me, but I want to make sure people have a place to discuss their problems with a rebase-heavy workflow, if there are any.

kpamnany commented 4 years ago

IMO, once you know the git commands, the only difficult thing to do is to resolve merge conflicts. And I don't know any way to automate this -- the author has to do it.

trontrytel commented 4 years ago

Expanding on point 3. Maybe it would help to have in our wiki a link to step by step guide on how to squash and rebase? This was helpful for me: https://blog.carbonfive.com/always-squash-and-rebase-your-git-commits/

It's basically a list of all git commands you need to rebase. Don't know if its the best resource out there, but it successfully got me through my first couple of rebases.

kpamnany commented 4 years ago

It recommends using git pull, which is a git fetch combined with a git merge. This can cause problems. Also, peering through git log can be a pain. git merge-base <mybranch> origin/master gives you the commit hash on which you can rebase to squash -- I find it far easier.

I always point people at this one: https://github.com/edx/edx-platform/wiki/How-to-Rebase-a-Pull-Request

jakebolewski commented 4 years ago

It would seem like being able to quickly hunt down offending commits that change model results materially would be an important goal, so I would think that the workflow would hopefully squash commits by default before automated merge so that you can be confident that every commit at least works (and by works passes the unit and integration checks by bors). You can do this manually but there is no check then that the manual process holds this property. It seems like it would be difficult to begin to track down changes without some automated method to bisect over the commit history.

trontrytel commented 4 years ago

It recommends using git pull, which is a git fetch combined with a git merge. This can cause problems. Also, peering through git log can be a pain. git merge-base <mybranch> origin/master gives you the commit hash on which you can rebase to squash -- I find it far easier.

I always point people at this one: https://github.com/edx/edx-platform/wiki/How-to-Rebase-a-Pull-Request

Maybe that proves we should have a guide on it that you guys approve.

@kpamnany - the one you refer has a lot of explanations in it. I liked the one I referred because it was less explanation heavy and more copy-paste friendly ;)

vchuravy commented 4 years ago

and more copy-paste friendly ;)

git checkout master
git pull
git checkout -b BRANCH_NAME_HERE
# do work, and use git commit
# 5-days later
git fetch
git rebase -i origin/master

For bonus points use imerge and replace the last line

git imerge --rebase origin/master

And now you have my workflow.

kpamnany commented 4 years ago

I find that in the end, it's useful to understand what those commands are actually doing...

kpamnany commented 4 years ago

the workflow would hopefully squash commits by default before automated merge

But then you can't do multiple things in one PR! 😀

Your point is very valid though. Right now we're trusting folks to squash such that each commit passes CI. This is not scalable.

trontrytel commented 4 years ago

I find that in the end, it's useful to understand what those commands are actually doing...

:) I think I'm at the point when I understand what the commands are doing. But I don't remember the commands from the top of my head when I need them. Maybe in time I will, but for now I need a reference copy-paste friendly cheat sheet.

leios commented 4 years ago

@kpamnany and @jakebolewski, I agree with what you are both saying. Right now, there is no difference in the git log between bors squashing and developers squashing (assuming each PR is an independent feature and can be squashed to a single commit). Why not just allow bors to squash and merge? If users squash themselves, their commit message will still be shown. If they don't, that's fine as well.

leios commented 4 years ago

To be clear: I don't think squashing is that big of a problem for developers, but it is sometimes hard to follow PR's if things are squashed prematurely.

I think the more difficult issue for new users is actually rebasing up to master, which is not easily solvable. We have 3 ways to do it: reset, imerge and good ol' fashioned rebase. I think imerge is the easiest, but I am unfamiliar with it.

christophernhill commented 4 years ago

@jakebolewski and @leios

It would seem like being able to quickly hunt down offending commits that change model results materially would be an important goal, so I would think that the workflow would hopefully squash commits by default before automated merge so that you can be confident that every commit at least works (and by works passes the unit and integration checks by bors). You can do this manually but there is no check then that the manual process holds this property. It seems like it would be difficult to begin to track down changes without some automated method to bisect over the commit history.

855 had multiple commits that were not all squashed? Its hard to say whether they all passed bors, but the breakdown into stages still looks useful? Certainly it is important that final merge passes, but I could imagine commit interim stages that may not pass bors, but still be useful to record (maybe?).

kpamnany commented 4 years ago

To expand on @jakebolewski's point: the point here is automated bisection.

It is often the case that you discover a bug long after it was introduced. It is important to determine when it was introduced, because you may not be able to find the exact location of the bug (it is a behavior change) or there may be related problems that are as yet undiscovered. This happens often enough that there is a way to to automatically find the commit that introduces a bug (using git bisect run).

However, if all commits do not pass CI, this automated mechanism can (and due to Murphy's Law, often will) fail on one of those broken commits. This can lead to much (avoidable) pain.

jkozdon commented 4 years ago

However, if all commits do not pass CI, this automated mechanism can (and due to Murphy's Law, often will) fail on one of those broken commits. This can lead to much (avoidable) pain.

You could have git bisect only test bors commits:

> cat test.sh
#!/bin/sh

AUTHOR=`git log -1 --pretty=format:'%an' | xargs`
if [ "$AUTHOR" = "bors[bot]" ]; then
  # Just a silly test to see if the `src/Numerics` directory exists
  if [ -d "src/Numerics" ]; then
    # this is a good commit!
    exit 0
  else
    # this is a bad commit!
    exit 1
  fi;
else
  # skip this commit (is it good or bad? I dunno!)
  exit 125
fi;

> git bisect start HEAD f5a95b

> git bisect run ./test.sh

This will return all the non-bors commits that might be broken.

kpamnany commented 4 years ago

For as long as we're using bors, yes, I think that will work.

simonbyrne commented 4 years ago

One option is to change bors to always use squash merge (by setting use_squash_merge = true). Unfortunately it doesn't seem possible to opt-in per PR, so it would be an all-or-nothing change.

simonbyrne commented 4 years ago

@lcw mentioned https://github.com/tummychow/git-absorb

Decision is to not handle it via bors, encourage users to leave squashing/rebasing until ready to merge.

blallen commented 4 years ago

git-absorb looks very very nice