ankidroid / Anki-Android

AnkiDroid: Anki flashcards on Android. Your secret trick to achieve superhuman information retention.
GNU General Public License v3.0
8.56k stars 2.22k forks source link

please consider changing to a development mode that does not involve creating merge commits #121

Closed wchristian closed 10 years ago

wchristian commented 10 years ago

So, the events for me of today:

I ended up with a history as you can see in: https://github.com/wchristian/Anki-Android Screenshot: https://www.dropbox.com/s/nic6el1lzzvfhze/ankidroid.png

Not a single commit was changed in the rebase/cherry-pick process, they smoothly fit onto one another in a linear fashion. Aside from a single commit where the git algorithm made a mistake i didn't even have any conflicts.

I cleaned the entire history going back to the point where current dev and 2.0.1 diverged, all in the space of two hours. The cost of implemeting this in actual ongoing development would be minimal, as it would be only a case of rebasing/cherry-picking the branches github reports as clean for merge.

Additionally i'd happily volunteer to do the busywork and offer aid in helping others over any issues they might have with this.

What do you say?

flerda commented 10 years ago

Hello Christian,

The presence of merges is actually expected and it is part of the workflow in GitHub. A similar workflow is also used also by other projects, for example, by Android (see, https://android.googlesource.com/platform/frameworks/base.git)

I do not think at this time we want to go and change the existing history for the project.

Moreover, I do not think the merges are wrong, they are just part of what the workflow with GitHub and git is expected to be. Well a pull request comes in, the default (and I think only) way to accept it on GitHub is to merge it, which also means it points back to any discussion on the actual change that might have occurred. As far as git is concerned cherry-picking is creating a new change that happens to have the same code as some other change, while merging is about taking the content while referring back to the original code. I think it is part of what distributed version control is about.

If you look at it from a centralized version control perspective it is true that merges are sort of the exception, but from the git perspective, merges are just the norm.

The instructions at: https://github.com/ankidroid/Anki-Android/tree/master should help in starting to contribute to the project.

Since the project is rather small (few contributors and few changes overall) we could definitely do a better job at documenting things.

That's at least my view of it, at least. I am happy to hear from other think.

Flavio

wchristian commented 10 years ago

Flavio, thanks for the answer and detailing of your thoughts.

To be clear about something: I was not proposing to rewrite all history, i only did that so i could actually review the changes (try doing that with the current repo structure) and to demonstrate the differences. I am proposing to adapt the workflow in the future.

Further, this is not entirely correct:

Well a pull request comes in, the default (and I think only) way to accept it on GitHub is to merge it, which also means it points back to any discussion on the actual change that might have occurred.

The default way is to do a -ff merge in your local repo, and push that to the desired branch. Github will recognize what happened and close the pull request. In other words: The default way is to not actually touch Github at all to close a pull request.

Alternatively, if you insist on the have a pointer between the commits and the pull request, the common way is to cherry-pick the commits you'd like, and add a string like (fix #1) to the commit message. Github will note that the commit referred to the pull request in the post history. After that you can manually click close and everything is fine without any extra noise.

Additionally this is only necessary in edge cases. This is because normally you can simply ask the contributor to rebase their branch on the intended target and to add the string (fix #1) themselves. Then you can just do a basic -ff merge and the correct connections are made and the pull request closed without any noise.

Lastly, the android base framework project and many others do it this way simply because they're bigger projects with less experienced contributors. They are in fact ignoring advantages of a distributed source control system with the capability to manipulate the directed acyclic graph for the simple reason that it spares people who know little about git from having to learn anything about git. The structure you see in the android base repo is in fact identical to the one you would have seen in many big SVN projects back in the day.

The way i am suggesting is not a deviation from the core values of git, but an embracing of it. It would give you a history that can actually be used to research why changes were done, and which can be processed with git bisect to find commits that introduced bugs. Right now neither of these are possible with your history. The only disadvantage would be that contributors, which you have few of anyhow, would be asked to rebase, or you or i would do an extra step in cherry-picking. (As i said, i'm volunteering.)

flerda commented 10 years ago

Do you know if there is a way to do a merge --ff from the GitHub UI? I could not find such an option.

In the meanwhile I will start going merges from the command line with --ff when possible (i.e., I have access to the command line when doing the merge).

I have been trying to enforce that pull requests contain a single change, to avoid this problem exactly. In that case, git bisect should work just fine I believe.

I agree the aggressive merge back and forth between branches of various people with multiple changes makes things difficult to track.

wchristian commented 10 years ago

I'm afraid github doesn't offer that because its pull req merge interface is meant to be a fallback for those who're less familiar with the command line or in a hurry.

Also keep in mind that the exact command line option is --ff-only and that it'd be advantageous to have those commits mention the issue id to allow backtracking from the git log to the issues. (Maybe something to ask people to do in the contrib docs to avoid needing to ask for rebases?)

I have been trying to enforce that pull requests contain a single change, to avoid this problem exactly. In that case, git bisect should work just fine I believe.

Bisect is not bothered by the amount of commits in a branch, but by whether the branches cross over one another. Sitation: A starts a new feature branch and starts work on it. An old feature branch B is merged in. Feature branch A is merged in, has a conflict with B and the merge commit does some extra changes. A bug is found. Now how does bisect figure out whether the bug was caused in A, B, by the combination of the features of A and B, or by the conflict resolution done on the merge of A?

In the optimal case this is entirely avoided by asking the developer of A to rebase his branch to the tip, retest and force-push.

And lastly, thanks for being open to this discussion. I appreciate that very much. :)

flerda commented 10 years ago

I do not feel the GitHub UI is a fall-back, because that's where we do our reviews. So, at least for me, that's the place where the pull request is handled and merged. For me, the command line is the fall-back.

We do not allow for conflict on merge, so the merge commit is simply applying the changes in A.

So, the branch would look like:

MERGE-A -> A \/ MERGE-B -> B \/ BASE

Bisect does not need to care about A and B themselves, because MERGE-A and MERGE-B are sufficient to figure it out.

We will still have merges across branches, since we have concurrent development.

Thanks for bringing your contributions to the project and its process.

flerda commented 10 years ago

Oh, also I wanted to say that for us, feature branches have a single commit on them. The author is responsible for rebasing and squashing before sending a pull request (or send multiple pull requests if they thing that's better). I have not been enforcing this 100% to avoid bother contributors, but, in my opinion, that's the way it should be.

What we allow is multiple version branches, in which development can proceed without any particular ordering, even if, for the most part, we focus on a single version at a time.

wchristian commented 10 years ago

I do not feel the GitHub UI is a fall-back, because that's where we do our reviews.

I only meant the merge functionality, not the whole discussion thing. :) (Although one can also do these entirely in email.)

We will still have merges across branches, since we have concurrent development.

That's exactly the danger though, since it becomes hard to tell whether a bug that is discovered later was already inherent to a feature branch, or caused by the combination of it and a branch it crossed in merging, since the author only tested his own branch. Requiring the author to test his branch on top of the current HEAD ensures that possible bugs due to the combination are caught by the person most knowledgable about the changes being merged.

To be clear here: Merge commits are unsightly, yes, but they're actually acceptable as long as a policy of "never cross branches" is enforced. I'm not sure if this is what you're going for, but trying to avoid merge commits and still allowing branches to cross doesn't gain much of an advantage.

As for squashing: That is a tool meant to reduce the noise created by making commits while trying to figure out an implementation, but asking for it by default risks losing a lot of valuable history and decision making. I don't feel strongly on this, but i think it's something to consider.

multiple version branches

Are you going to start rebasing those too? That was one of the most difficult things in review: The fact that the current betas branch off before the current stable release.

emhj commented 10 years ago

So I'm late to the discussion and new to the project -- I'm not looking to cause too much trouble.

But I am interested in gaining insights as to why you would want an absolutely linear history? Sure, you will not see merge commits, but, then again, it will be difficult to track where a feature came from after a pull request. You did point out that you can reference an issue that is fixed by a certain pull request; however, this does not lend itself easily to find all commits related to a particular feature. I think this is why the current development model in Anki-Android is to have all changes be in a single commit. By the way, a single commit with a large amount of changes makes cherry picking nearly impossible. I'm not in favor of this model either.

I do completely agree with Christian about the history looking like a nightmare. I think this is because no clear rules are enforced during a pull request and there is no clear "stable" or "master" branch. In my opinion, pull requests should happen on a temporary branch created off the "master" branch (in this case v2.1-dev) for integration and testing. If the merge does not look clean (and it will be obvious just from looking at the history), then that pull request should be rejected with a note asking to clean up whatever parts look wrong.

I will admit, I'm largely biased by the arguments made in "A Successful Git Branching Model".

The reason I'm curious to hear your thoughts is because we're currently dicussing this in great detail at my company, and most people (myself included) have concluded quick-concise feature branches related to a particular issue, with clear merge commits for that feature branch, makes management and bug tracking easier. But maybe we've been misled?

Like I said, I'm not trying to stir up trouble... I want to have someone else's perspective on this matter.

Best, Eric

emhj commented 10 years ago

For what it's worth, I support many of the suggestions that the OpenShift project has for contributing.

Best, Eric

wchristian commented 10 years ago

But I am interested in gaining insights as to why you would want an absolutely linear history?

Pictures, thousand words, etc. This screenshot shows a compressed graph view of the history as it currently exists, can you tell what's going on with each dev branch and what happened between them? (For bonus points imagine trying to make sense of it without the view compression.)

https://dl.dropboxusercontent.com/u/10190786/Screenshot%202013-11-07%2013.57.37.png

Now this screenshot shows the exact same history, rebased conflict-free and merge-free. All actual change commits and tags and branch markers are in exactly the same state as their corresponding partners in the current history. Isn't it much easer to tell what happened?

https://dl.dropboxusercontent.com/u/10190786/Screenshot%202013-11-07%2014.03.29.png

Additionally there is, as mentioned before, the issue of testing. If you have two feature branches from different authors and cross-merge them, there are likely to be bugs caused by the combination of the two that aren't apparent in the author's testing of each of their branches. Merging branches exclusively after having them rebased and retested on HEAD means that every test also includes the changes they will be combined with. Keep in mind: Absence of merge conflicts does not prove absence of bugs.

you will not see merge commits, but, then again, it will be difficult to track where a feature came from after a pull request.

If you have a feature that was made in a particularly long feature branch, then there's nothing wrong with rebasing that and then merging it with an explicit merge commit, as long as you don't cross any other branches.

Typically however it is much more interesting to figure out which commit caused an error one's seeing, than from which branch of feature that error is coming from.

I will admit, I'm largely biased by the arguments made in "A Successful Git Branching Model".

This is not a terrible model, but it comes with caveats: It completely ignores the existence of rebasing, and acts like SVN. That was appropiate in a time when SVN dominated the market. Not anymore. Secondly, it works for a specific kind of product, namely software where a product is released to market and there exist solid reasons for users sticking with older major versions, thus requiring the maintenance of older major versions along with new versions. AnkiDroid or for example websites do not fall into this group, as the users only ever have the latest version.

quick-concise feature branches related to a particular issue, with clear merge commits for that feature branch, makes management and bug tracking easier. But maybe we've been misled?

That sounds good, but it doesn't mention the most important part: Never merge a feature branch without rebasing and retesting it on HEAD.

OpenShift

I like their stance on squashing because it mentions both failure cases.

emhj commented 10 years ago

Isn't it much easer to tell what happened?

I do completely agree with Christian about the history looking like a nightmare.

I agree that the history you're proposing is much simpler to understand. I agreed before, too.

If you have two feature branches from different authors and cross-merge them, there are likely to be bugs caused by the combination of the two that aren't apparent in the author's testing of each of their branches. ... Absence of merge conflicts does not prove absence of bugs.

Merging feature branches, while certainly possible, seems like a terrible idea in my opinion, for the same reason you mentioned (so no arguments here).

It completely ignores the existence of rebasing, and acts like SVN. That was appropiate in a time when SVN dominated the market. Not anymore.

I'm not sure I follow. From what I've read online, albeit limited, it seems rebasing should only be used on local branches prior to pushing and during a pull to avoid having meaningless commit messages. As I understand it, while git can support rebasing public branches, convential wisdom seems to say that this is a bad idea because it mucks with other users history, causing headaches for those users.

I will fully admit to having limited experience with rebasing. Squashing commits definitely seems like a good idea, but only before commits are pushed upstream, not after.

Secondly, it works for a specific kind of product, namely software where a product is released to market and there exist solid reasons for users sticking with older major versions, thus requiring the maintenance of older major versions along with new versions.

I think this is a matter of project choice. There is no obligation to support older versions under this model; instead the model suggests a strategy for doing so.

AnkiDroid or for example websites do not fall into this group, as the users only ever have the latest version.

I think this is a reasonable project choice.

wchristian commented 10 years ago

I agreed before, too.

Ah, sorry, that didn't take complete hold in my head. :)

rebasing I'm not sure I follow.

Two things here: How much you can rebase and where and when depends mostly on the project. Feature branches for example are almost always rebasable. When they are however the product of 10 people working on them then you can only rebase if the communication in the team and the knowledge level of the team allows for it.

Similarly release preparation branches can also be rebased, even more so than feature branches, because their explicit purpose is collecting commits for the next release, and every branch starting from them is bound to be a feature branch which are expected to b rebased anyhow before being merged in.

The one thing that is taboo to rebase are the commits that happened before an actual release of the software.

Basically, the question you need to ask yourself when rebasing is ALWAYS: "How many people will this affect?" If it's everyone, then don't. If it's only you, just do it. Anywhere inbetween: Decide case by case and weigh whether a useful history makes up for the inconvenience to others.

I think this is a matter of project choice.

The thing is: Once you remove the support for old versions, 90% of that article becomes meaningless and irrelevant. :)

emhj commented 10 years ago

Two things here: How much you can rebase and where and when depends mostly on the project. Feature branches for example are almost always rebasable. When they are however the product of 10 people working on them then you can only rebase if the communication in the team and the knowledge level of the team allows for it.

Similarly release preparation branches can also be rebased, even more so than feature branches, because their explicit purpose is collecting commits for the next release, and every branch starting from them is bound to be a feature branch which are expected to b rebased anyhow before being merged in.

The one thing that is taboo to rebase are the commits that happened before an actual release of the software.

Basically, the question you need to ask yourself when rebasing is ALWAYS: "How many people will this affect?" If it's everyone, then don't. If it's only you, just do it. Anywhere inbetween: Decide case by case and weigh whether a useful history makes up for the inconvenience to others.

Agreed: rebasing is useful when used appropriately.

I think something I didn't understand before was whether you were suggesting to have a single master branch and only bring features in through fast forward merges. I believe you're not suggesting this, but rather cleaning up the commit logs of a feature branch using rebase prior to submitting a pull request. Is my assessment correct?

wchristian commented 10 years ago

cleaning up the commit logs of a feature branch using rebase prior to submitting a pull request. Is my assessment correct?

Almost: Cleaning up by the feature branch author before or during the pull request (a force push allows updating of pull requests in this manner) or (if necessary) by the release manager when accepting the pull request. Either is fine, as long as it is cleaned up. :)

At that point the decision of whether to use fast-forward or not does not impact code quality anymore, even though it still impacts a bug researcher's ability to find bug causes by forcing them to trudge through noise.

nicolas-raoul commented 10 years ago

Hello, Issues/features should be posted to https://code.google.com/p/ankidroid/ We will now disable the tracker on Github, so that issue reporting becomes less confusing.

Interesting conversation, but could you please continue on the mailing list?

Thanks for your feedback! Nicolas Raoul