BVLC / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

Community issues and policy feedbacks #1623

Closed bhack closed 7 years ago

bhack commented 9 years ago

In this issue I want to collect feedback and policy proposals from core developers and community contributors (we don't have a developer group like caffe-user). Please keep it constructive and focused on what can scale. The objective, given the proliferation of deep learning toolkits and the current research momentum, is to attract more contributors to Caffe and to retain developers who have already opened good PRs.

bhack commented 9 years ago

cc: @mtamburrano

netheril96 commented 9 years ago

You need to get those in charge involved to have any real progress. caffe is run by a dictatorship (benevolent if you'd like), not a democracy.

bhack commented 9 years ago

For those who don't know the governance model that @netheril96 cited: in the open source context you can read this, and more generally on governance models. cc: @BVLC

bhack commented 9 years ago

The data lags a little, but see also some statistics: http://ghtorrent.org/pullreq-perf/BVLC-caffe/

bhack commented 9 years ago

cc: @sguada @shelhamer @longjon @sergeyk @Yangqing @jeffdonahue @kloudkl @qipeng @rbgirshick

ducha-aiki commented 9 years ago

Looks like none of the core developers cares about filter and other PRs from the community :( Even the cool OpenMP branch https://github.com/BVLC/caffe/pull/439 has been unmerged for months...

bhack commented 9 years ago

A free tool, integrated with GitHub, that could help with project management: https://www.zenhub.io

netheril96 commented 9 years ago

As I suspected, the owners don't care about this at all. The only recourse is to split off into a different project, as has been done numerous times in the world of open source. Not sure how feasible that is, however.

bhack commented 9 years ago

@shelhamer Is there any sign of life?

longjon commented 9 years ago

Some thoughts on this:

Given our current limited resources, there is necessarily a tradeoff between "high precision" (confidence in the correctness of our code) and "high recall" (merging lots of code quickly). I think it's best for everybody to keep BVLC/caffe in the "high precision" regime, as one can always get higher recall with forks, branches, PRs, and so on, and git and GitHub make it pretty easy to keep all those things in the air at once.

That said, there are certainly ways we can improve the process. Foremost I think we can improve communication by:

  1. Keeping every PR tagged with its status (something we've discussed but haven't yet implemented).
  2. Responding to every community PR promptly with prospects for merging, even if it's to say that something is low priority.

We could also take some technical steps to make it easier for code to live outside BVLC/caffe. We could:

  1. Create a contrib repo for extra layers and tools. We would provide Travis testing but otherwise leave merging decisions up to the community.
  2. Make it easier to link external layer code with Caffe, and provide a central place for layer libraries.

Comments on those ideas or other concrete suggestions are welcome.

bhack commented 9 years ago

For me, high precision is a matter of quality assurance for a distribution, if Caffe with contrib, a plugin system, or whatever else becomes a distribution. We have already experienced opencv-contrib on GitHub, with the OpenCV project adding testing infrastructure support for it, but this doesn't seem to work well. Without a governance model it will not be clear how a contributed repository could scale, how we can limit the proliferation of duplicates, and how we can bypass the bottleneck of core members' limited time to review and maintain code. I think we need to find a governance model and a clear review pipeline. Then, if we want to officially maintain external code, we need to do it in a "caffe distribution" fashion.
So we can have official maintainers for every contributed layer, loss, or other component, whose role would be to fix bugs, merge requests, and maintain release compatibility for their components. If a maintainer is missing in action, other maintainers or core members could take over the maintainer's role. If a layer, loss, or other component is no longer maintained (bugs are not fixed, the API is incompatible with new releases, PRs go unreviewed, etc.), we could remove the code from the official "caffe distribution". That way Caffe could maintain overall quality together with modularity.

But for changes that are not isolated in a "modular component" we need a clear pipeline for every PR. I also think that every PR needs a clearly assigned reviewer. Other people can support, contribute, and comment, but the PR needs to advance through a pipeline under the responsibility of the assigned core member. Often contributors are very active in accepting feedback from reviews, but then core members go missing in action or change their minds about what to do for a "more ideal" integration. We need clear feedback on what work is needed for a PR to be merged.

netheril96 commented 9 years ago

@longjon

I believe the problem is not what or how, but who. Your idea is nice, but I am afraid that it won't be implemented even if we all agree it is a good idea, simply because the maintainers are too few in number and have too little time.

caffe has grown beyond a simple project that can be managed by a small panel of original developers. To scale up, the governance must be changed.

Nerei commented 9 years ago

Corporate contribution is possible. At least such ideas are being discussed/planned now. It could be paid full time developers' work. (@garybradski)

qipeng commented 9 years ago

I personally side with the original maintainers of this project: we should keep this repository "high precision" instead of focusing on fast development. Either as a developer or as an end user, I wouldn't want to see random stuff being merged into master or dev without thorough unit tests or hands-on code review, since it might break other things or make my training/testing 10x slower. This can be crucial for a code base like Caffe, because we care a lot about correctness and efficiency now that it is being used as a benchmark in many places. Keeping master and dev slow but steady is a good idea, I think.

That being said, I do agree that an experimental (or contrib) branch / fork that's slightly more loosely managed by the community is a good idea, provided there are responsible and competent people in the "governance loop" who keep the project relatively maintainable and of good quality. Otherwise something like Caffe that demands both engineering experience and familiarity with academic materials could go bad very easily and very fast.

netheril96 commented 9 years ago

@qipeng The problem is not that third-party code is not merged fast enough, but that it is barely reviewed at all. Many pull requests just sit there and rot, without review, feedback or action.

bhack commented 9 years ago

I think that nobody here wants to lower the quality of the code. But it is clear that there is a scalability issue caused by the current ratio of contributors to available core members. My point, when I mentioned the opencv-contrib experience, is that proposing a contrib repository without a policy doesn't solve the problem unless we choose a governance model that can scale and maintain good quality assurance. Every project that has scaled has one. I generally like meritocracy and clear responsibility, by team or individual, for user-contributed modules (layers, losses, or any other modular unit you can find in Caffe) to maintain a good level of QA for the whole "caffe distribution". I suggest that everyone who wants to contribute to this discussion read the governance models link I posted. Putting user code in an easily mergeable (or simply API-compliant) "trash sandbox" is not the way to scale with quality while minimizing duplication and fragmentation. We need to extend trusted responsibility so that new members can emerge by merit, whatever community governance we choose.

qipeng commented 9 years ago

@netheril96 I agree that that's happening to some extent. But I still think @bhack's idea of opening a separate contrib repo would be great: there we can get more community developers involved in the governance and make things scale faster. And eventually, the core members might choose to talk to the community leaders and port back some high-quality code from that repo.

bhack commented 9 years ago

See also updated stats

shelhamer commented 9 years ago

> one can always get higher recall with forks, branches, PRs, and so on, and git and GitHub make it pretty easy to keep all those things in the air at once.

@longjon has this exactly right. We can keep BVLC/caffe in a high precision regime while everyone shares whatever code they like in forks and branches. It's important to keep Caffe correct and efficient since it has both research and industrial purposes.

At the same time a dedicated "contrib" repo could be a gathering place to collect code like this to later be accepted into BVLC/caffe as @bhack and @qipeng mentioned.

> Otherwise something like Caffe that demands both engineering experience and familiarity with academic materials could go bad very easily and very fast.

@qipeng this is of course what needs to be guarded against.

@bhack thanks for the graph -- too bad the format makes it hard to read the counts. Deadlines aside, it looks like there is a steady rate of merged and closed PRs... not that there couldn't be more of course.

> The problem is not that third-party code is not merged fast enough, but that it is barely reviewed at all

@netheril96 to be fair there are ~300 commits by 40+ contributors (in master alone) so community PRs are reviewed and merged. It can take time but the only projects without open PRs are dead ones. Often old PRs are likewise long and complicated and these are correlated for a reason. For sustainability it's important to keep code not only correct and efficient, but to keep as little code around as possible.

bhack commented 9 years ago

@shelhamer You can reproduce the raw data if you are interested. IMHO PR merge times are very slow for an "active project", and things have been getting slower in the last months. From the last graph I also see a high fork to "contrib back" ratio. Generally that is not a good sign. Why do people fork Caffe so much while opening so few PRs? Do they do it to host personal net experiments on GitHub? Or something else? Do they not want to invest time in contributing code back because it takes too long? You also have not commented on the governance model you would like to implement, even if we want to modularize into contrib. A contrib without governance (or with only test coverage, like opencv-contrib) will quickly become a trash sandbox. If we need to offload core members' reviews "onto contrib", whom can we trust?

bhack commented 9 years ago

Please also take a look at this paper or, more generally, at the pull request ecosystem studies in this list.

bhack commented 9 years ago

@Nerei I have opened a similar issue on opencv-contrib because it has similar problems (and I know it well because my team has mentored different OpenCV GSoC projects and is trying to maintain the tracking API): https://github.com/Itseez/opencv_contrib/issues/149

bhack commented 9 years ago

Besides zenhub.io, core members could also try waffle, which is open source.

bhack commented 9 years ago

@longjon We need to do something to move ahead. New tickets are full of support-request noise. PRs and real issues are lacking BVLC activity. Only PRs between core members are quite active and get rapid feedback and merges.

bhack commented 9 years ago

Updated stats at http://ghtorrent.org/pullreq-perf/BVLC-caffe/

shelhamer commented 7 years ago

Closing as development is diffusing more into the community through inviting contributors—whose help is very much appreciated—and delegation of community branches like OpenCL and Windows—which have been regularly improved by their leaders.