SPECFEM / specfem3d_globe

SPECFEM3D_GLOBE simulates global and regional (continental-scale) seismic wave propagation.
GNU General Public License v3.0
90 stars 95 forks source link

Git development model #181

Closed krischer closed 9 years ago

krischer commented 10 years ago

Hi all,

while I am not (yet) contributing to this project I am trying to follow the development and stay up-to-date. Right now your development model is not particularly clear to me and adds a lot of "noise" that is not necessary in my opinion.

There are three main issues:

  1. For one, what is the purpose of the master and devel branches? There seems to be no clear separation between the two.
  2. The way how (for the most part) pull requests are used right now adds a lot of "noise" as I said above. The opening and immediate closing of a PR already results in two emails for everyone following the repository. If someone follows the commits as well that is up to two more notifications (normal commit + merge commit). Worst case is then four notifications for a single commit.
  3. With your current model it is very hard to offer bugfixes for previous releases while simultaneously working on new features. This is pretty important if one attempts to build a stable platform around some SPECFEM version.

After all my complaints I also have some suggestions on how to fix it.

  1. Adopt a proper git workflow model. The most widely know is probably git-flow. This might be a bit complex especially considering the usually limited number of contributors for seismology projects but it is a nice starting ground to design a proper model. For ObsPy we adopted it a bit to suit our purposes but it works quite well: ObsPy git model
  2. Only use PRs if the feature warrants some discussion and/or code review. The real purpose of pull requests on GitHub are the discussions, otherwise normal git branching is just fine especially for devs with push rights to main repository. Of course outside contributors will have to use PRs but that is fine as you will want to review their code in any case.
  3. Try to avoid merge commits. Of course they are necessary sometimes but to a certain degree one can just perform a fast-forward merge and the merge commit does not show up. This helps a lot when trying to maintain a clean repository history and helps to reason what happened and why.

I hope this sparks some discussion.

Cheers!

luet commented 10 years ago

Hi Lion, The difference between master and devel is that master is our stable branch, the one that users, as opposed to developers, will clone by default. We only push to devel to master when we feel that the development branch is stable enough. The reason for the pull-request is that pull-requests are tested with automatic tests through buildbot. At the moment it's usefulness is limited since we are only compiling the code with three different compilers but we are working on adding some other tests (e.g. comparing the seismograms with some reference) soon. The goal is to implement some continuous testing so that we can catch problems as early as possible.

We started with git-flow but people found it overly complicated. Keep in mind that our developers are coming from subversion so to make the transition to git as easy as possible, we have tried to make git work a lot like subversion.

The merge commits are just the way people are used to working. Our developers don't want, and to some extend don't need to, learn more git than "git pull", "git push", "git pull-request". Our workflow lacks bells and whistles but we had a constraint of simplicity.

As for the notifications, I realize that it's not ideal but it's hard to go without. I would have to set up my own e-mail server for the buildbot and maintaining an e-mail list. It would make the system more complex and therefore less stable. We decided that people could either unsubscribe to the notifications or use mail filters.

Best, David

On Mon, Jul 14, 2014 at 7:22 AM, Lion Krischer notifications@github.com wrote:

Hi all,

while I am not (yet) contributing to this project I am trying to follow the development and stay up-to-date. Right now your development model is not particularly clear to me and adds a lot of "noise" that is not necessary in my opinion.

There are three main issues:

  1. For one, what is the purpose of the master and devel branches? There seems to be no clear separation between the two.
  2. The way how (for the most part) pull requests are used right now adds a lot of "noise" as I said above. The opening and immediate closing of a PR already results in two emails for everyone following the repository. If someone follows the commits as well that is up to two more notifications (normal commit + merge commit). Worst case is then four notifications for a single commit.
  3. With your current model it is very hard to offer bugfixes for previous releases while simultaneously working on new features. This is pretty important if one attempts to build a stable platform around some SPECFEM version.

After all my complaints I also have some suggestions on how to fix it.

  1. Adopt a proper git workflow model. The most widely know is probably git-flow http://nvie.com/posts/a-successful-git-branching-model/. This might be a bit complex especially considering the usually limited number of contributors for seismology projects but it is a nice starting ground to design a proper model. For ObsPy we adopted it a bit to suit our purposes but it works quite well: ObsPy git model https://github.com/obspy/obspy/wiki/ObsPy-Git-Branching-Model
  2. Only use PRs if the feature warrants some discussion and/or code review. The real purpose of pull requests on GitHub are the discussions, otherwise normal git branching is just fine especially for devs with push rights to main repository. Of course outside contributors will have to use PRs but that is fine as you will want to review their code in any case.
  3. Try to avoid merge commits. Of course they are necessary sometimes but to a certain degree one can just perform a fast-forward merge and the merge commit does not show up. This helps a lot when trying to maintain a clean repository history and helps to reason what happened and why.

I hope this sparks some discussion.

Cheers!

— Reply to this email directly or view it on GitHub https://github.com/geodynamics/specfem3d_globe/issues/181.

David Luet Linux Administrator/Software & Programming Analyst Princeton University Department of Geosciences & PICSciE PICSciE (9 am - 1 pm): 609-258-8194 Geosciences (1 - 5 pm): 609-258-7945

krischer commented 10 years ago

Hi David,

in my opinion users should never have to get the code via git - they just download the latest tagged release; you already have all of that setup here: https://github.com/geodynamics/specfem3d_globe/releases

It would also makes sense to tag a new version everytime you choose to merge devel into master so that the users have a version number they can refer to. Then you strictly speaking also don't need the devel branch anymore which further simplify things.

I agree that git-flow is too complex for most needs, especially scientists. That is why in ObsPy most development happens on the master branch. We maintain a separate releases branch so we can backport important fixes to older versions, but the average seismologist who just wants to add some new feature does not have to worry about that. I think something similar would work quite well for SPECFEM.

In regards to the testing, which I can see being super hard giving the nature of SPECFEM, you could also trigger a build on each commit instead of each PR. That is how Travis is doing things, one can also choose who gets notified in case of a failing build (e.g. a set of core devs and the committer). The github webhooks api, which I assume you are already using for the PR triggered builds, can also fire on each commit.

Is it not also simpler to directly push to the main repository instead of pushing to one's own fork, creating a PR, and merging that? Only for small changes of course but those are the ones we are talking about here. This would also solve the notification issue resulting in one notification per commit.

All these little changes would also make it more similar to svn; everything then happens on the master branch similar to a linear svn history.

Cheers!

Lion

luet commented 10 years ago

Hi Lion, One important advantage of testing the pull-request before it gets committed is that at any time the devel branch contains code that passed the buildbot tests i.e. not too badly broken. If we bypass the pull-request and commit directly to devel, if someone pushes a broken code, yes we can revert the changes after testing, but if anyone pulls the code in the meantime they are going to be working off of a broken code. We felt that this made it worth paying the price of the e-mail traffic.

It's also hard to decide when a commit is worth talking about. I have seen commits that the author thought was inconsequential but someone said: "wait a minute, this breaks my feature".

Best, David

On Mon, Jul 14, 2014 at 10:27 AM, Lion Krischer notifications@github.com wrote:

Hi David,

in my opinion users should never have to get the code via git - they just download the latest tagged release; you already have all of that setup here: https://github.com/geodynamics/specfem3d_globe/releases

It would also makes sense to tag a new version everytime you choose to merge devel into master so that the users have a version number they can refer to. Then you strictly speaking also don't need the devel branch anymore which further simplify things.

I agree that git-flow is too complex for most needs, especially scientists. That is why in ObsPy most development happens on the master branch. We maintain a separate releases branch so we can backport important fixes to older versions, but the average seismologist who just wants to add some new feature does not have to worry about that. I think something similar would work quite well for SPECFEM.

In regards to the testing, which I can see being super hard giving the nature of SPECFEM, you could also trigger a build on each commit instead of each PR. That is how Travis https://travis-ci.com/ is doing things, one can also choose who gets notified in case of a failing build (e.g. a set of core devs and the committer). The github webhooks api, which I assume you are already using for the PR triggered builds, can also fire on each commit.

Is it not also simpler to directly push to the main repository instead of pushing to one's own fork, creating a PR, and merging that? Only for small changes of course but those are the ones we are talking about here. This would also solve the notification issue resulting in one notification per commit.

All these little changes would also make it more similar to svn; everything then happens on the master branch similar to a linear svn history.

Cheers!

Lion

— Reply to this email directly or view it on GitHub https://github.com/geodynamics/specfem3d_globe/issues/181#issuecomment-48906083 .

David Luet Linux Administrator/Software & Programming Analyst Princeton University Department of Geosciences & PICSciE PICSciE (9 am - 1 pm): 609-258-8194 Geosciences (1 - 5 pm): 609-258-7945

QuLogic commented 10 years ago

in my opinion users should never have to get the code via git - they just download the latest tagged release; you already have all of that setup here: https://github.com/geodynamics/specfem3d_globe/releases

This is a historical artifact due to our use of submodules, which are not automatically included in GitHub's tarballs, so users could not actually use them. Since we no longer use submodules (except for m4), we could start recommending GitHub's tarballs instead (as long as the user never needs to run autoreconf, that is).

The difference between master and devel is that master is our stable branch, the one that users, as opposed to developers, will clone by default.

Since devel is for developers, the view here on GitHub is somewhat skewed to that end of things. This makes master seem somewhat superfluous. But also because Dimitri likes to recommend devel anyway...

krischer commented 10 years ago

Totally untested, but using a git subtree instead of a submodule might result in them being included in the tarballs. Otherwise one can also attach files to each tagged release on github so you could add an install-ready archive.

If that works you could get rid of the devel branch and just use the master branch. For big commits people will still use separate branches and PRs, small commits (that are usually not in danger of breaking anything) will be directly pushed to the master branch where your CI implementation tests them. If it breaks the build it needs to be fixed in any case. Users will just download the tarballs so they never have an unstable version. This is extremely similar to a typical svn workflow where an unstable trunk is fairly common practice.

Then you have a simpler workflow that is more similar to svn and the "noise" I was complaining about is also reduced.

komatits commented 10 years ago

Hi all,

The only reason why I recommend "devel" to advanced users if that it is the only version I have time to maintain; maintaining two would be too time consuming (keeping in mind that we maintain three codes: 2D, 3D and 3D_GLOBE, thus we would be talking about six versions).

In theory we said people would fix bugs (if any) in "master" as well, in practice I have noticed that this never happens and all bug fixes go to "devel" only.

Thanks, Dimitri.

On 14/07/2014 19:58, Elliott Sales de Andrade wrote:

Since devel is for developers, the view here on GitHub is somewhat skewed to that end of things. This makes master seem somewhat superfluous. But also because Dimitri likes to recommend devel anyway...

Dimitri Komatitsch CNRS Research Director (DR CNRS), Laboratory of Mechanics and Acoustics, UPR 7051, Marseille, France http://komatitsch.free.fr

krischer commented 10 years ago

FYI: I just tested it and the github tarballs do indeed contain the contents of any subtrees.

komatits commented 9 years ago

Solved by having BuildBot check the seismograms, not only compiling the code without running it. David Luet is working on that and should be done in a few weeks.