coursera-dl / edx-dl

A simple tool to download video lectures from edx.org (and other openedx sites)

GNU Lesser General Public License v3.0

Stop committing things and decide the vision of the project #71

Closed rbrito closed 9 years ago

rbrito commented 10 years ago

I would like, in much the same fashion as Guido van Rossum once asked of the Python developers, to suggest that we take a moratorium on new commits and, first, decide what we think and agree is the role of the project.

In particular, I would like to see many things addressed:

The current documentation sucks. Badly. And I'm not only referring to the English parts of it. I am not a native speaker, even though some people have sporadically told me that my written English is good enough.
The current code smells bad. In many ways:
- It, perhaps, tries too hard to handhold the users, and doing good user interfaces is hard. Exceptionally hard. Especially without a graphical toolkit that would already have solved most of the problems. And, while it does not look good, Tk is already a dependency of Python, which is satisfied by all Python installations, barring users that explicitly avoid tkinter. The non-interactive branch that I created was supposed to cut the silly, unsafe text UI which comes with the master branch, besides having more modularity (more about this in the next points).
- On a more subjective note, the code clearly looks like the people writing it are amateur programmers. Well, I should not have said this, because I am also an amateur programmer, but each programming language has its own set of idioms and the current code follows none, differently from what I tried to accomplish on my non-interactive branch.
- The current master branch, at least the last time I checked, didn't support sites other than edx.org. My branch supports edX-based sites in general and I have, personally, used it with Stanford's site (just for tests), 10gen/Mongodb.com (for real courses, where I completed 4 courses with certificates) and edx.org (just for tests, as I am mostly completing some coursera courses).
- The current master branch has a lot of technical debt, which is something which I plainly acknowledge in my branch, with clear FIXME's or XXX's. This makes it easy for other people to jump in and see that the code needs improvement in a clear way. Unfortunately, such visibility is hindered by the fact that the code is in a non-default branch and github gives almost no visibility to it. I can't stress how much I think that technical debt is something that I try to avoid, even if I am swamped with it.
- The current master branch has features that I don't have, but that's mostly because I thought that my branch would have received them after @iemejia joined the project. My original intention with the code was to use the time-tested practice of making a development branch, stop developing on the "stable" branch and, eventually, make the development branch the default branch. I guess that this was not communicated effectively by me. The rationale for this development model is made explicit in this post: http://nvie.com/posts/a-successful-git-branching-model/
- The code currently lacks real testing. This impacts us. Badly. Especially when the sites that we are scraping change in some unpredictable ways. We need to add hooks to travis-ci and to coveralls, just like I do with coursera-dl.
- I am not really sure that having an interactive way of doing things is so appealing. Just to put things in perspective, in coursera-dl, where I have tried hard to make the community inclusive, we have 1333 stars and 425 forks, which is, in some way, a measure of the success of the project; when I joined, the project had far fewer followers. With youtube-dl, there are 3296 stars and 703 forks. This edx-downloader project has 58 stars and 63 forks. Neither coursera-dl nor youtube-dl has an interactive mode. But they are successful projects.
- Let me rephrase the point above, to avoid misinterpretations: I am not saying that having a text UI is detrimental to the project. On the contrary. But being functional and flexible by far exceeds a toy that doesn't fulfill the necessities of the users (say, supporting more sites, being reliable, tested) or doesn't make it easy for developers to add/fix features. And, yes, this last point includes adding a proper interactive mode. Again, if we are serious about usability and having an easy-to-use program, we should give serious thought to using either curses or a graphical interface with tkinter. Otherwise, what we have is a joke. And the user interface may be a very good learning exercise for those that have not yet programmed such things.
- Coupled with the point above, I think that we should try hard to make the program work like a library/python module. This makes testing easier, coverage analysis easier, static analysis easier, integration with other tools easier and, in fact, many other things easier.
- After working in a project where there is more than one person involved, I have reached the conclusion that it is very important for every committer to know about every change that other people make to the code. In a regular git setting, this could be accomplished via hooks that e-mail people the diffs being made, so that everybody can be up-to-date with the project. Apparently, with github, the way to let other people know of the changes that others are working on is to send pull requests. I would propose, therefore, that we don't use direct commits to the project, unless we have a pull request. Otherwise, we may get conflicts and people not knowing where the code stands.
This is a subjective point, but some programs, when invoked with no parameters, start with an interactive mode. This is, perhaps, appealing to people used to Windows. Other programs, when invoked with no parameters, just print information on how they should be used. This is, perhaps, the Unix mindset manifesting itself. The first approach doesn't seem to allow the specification of standard parameters (unless one adopts configuration files or batch files/scripts). This is annoying to some.
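To make the library/module point and the "no parameters" point a bit more concrete, here is a minimal sketch in Python. It is not code from master or from the non-interactive branch; the names `select_sections`, `build_parser` and `main`, and the program name `edx-dl-sketch`, are hypothetical and exist only for illustration. The idea is that the logic lives in plain importable functions, and `main` is a thin wrapper that prints the usage line instead of starting to prompt when no URLs are given.

```python
import argparse
import sys


def select_sections(titles, wanted):
    """Importable logic with no I/O: pick sections by 1-based index.

    Hypothetical helper; indexes outside the valid range are ignored.
    """
    return [titles[i - 1] for i in wanted if 1 <= i <= len(titles)]


def build_parser():
    """Build the command-line parser (program name is an assumption)."""
    parser = argparse.ArgumentParser(prog="edx-dl-sketch")
    parser.add_argument("urls", nargs="*", help="course URLs to download")
    return parser


def main(argv=None):
    """Thin command-line wrapper; returns an exit status instead of exiting."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.urls:
        # Unix-style behaviour: no parameters means "show usage",
        # not "start asking questions".
        parser.print_usage(sys.stderr)
        return 2
    for url in args.urls:
        # Placeholder for the real work, which would call library functions.
        print("would download:", url)
    return 0
```

Because `main` takes its arguments as a list and returns a status code, a test can call `main([])` or `select_sections(...)` directly, without spawning a process; a script entry point would simply call `sys.exit(main())`.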

Well, I guess that I have more to say, but it is 4am here and I should really go to bed.

/cc: @rbrito

shk3 commented 10 years ago

I agree that we should commit only bug fixes to the master, not new features that could make the script unusable, so that we can keep the master stable. I am sorry that I did some commits beyond that, as I did not realize those changes would become big changes. It is true that the master actually lacks testing and has faults. I did not pay much attention to the README file, since I am not a native English speaker either.

As for supporting other websites, I agree that it is not difficult. I actually scheduled to add it into the master via command line arguments, but after I rewrote the argument parsing part, I realized that those changes were actually an important part of making non-interactive, and we have some different ideas on it, so I stopped the implementation.

I generally agree with your opinion. An interactive mode may not be really necessary. I will post further response or checkout non-interactive tomorrow, since it is also quite late in my timezone.

iemejia commented 10 years ago

Good idea, the discussion is needed. I think we agree on most of your points, and it's good to see that @shk3 is finally accepting that getting rid of the interactive functionality is a must.

Rogério, I understand your reasoning about creating the branch to achieve the non-interactive functionality, but sadly little progress has been made by anyone other than you in the last months (actually, the last commit in that branch is more than 3 months old).

The real problem is that the non-interactive branch (and all its siblings) became long-lived and partially 'unmaintained'. Long-lived branches have the problem of getting ostracized, and in our case we have the additional problem that the codebases differ quite a bit, and we don't have automated tests to verify that everything is working. That's the reason why I argue that it's better to integrate all the functionality of the branches in small steps into the master, via pull requests that don't, in principle, break the whole script's functionality. Even if we have a partially unstable master for some weeks, it promotes more participation, as we have seen in the last two weeks, which have had more commits than the previous 6 months (notice also that at this moment, even with all the new changes, nothing new is broken).

I agree with your rule about pull requests. We have to avoid direct commits, because they prevent valuable code review by the other members. However, we also need a clear rule for the case when nobody accepts or rejects a pull request, something like: if nobody complains within a week, you can merge automatically, in order to achieve progress. Also, commits that fix a reported user error must have a higher priority and be evaluated and integrated ASAP.

@shk3 Rogério's point was not about not committing to master but about not committing directly without review. Notice that the goal of doing pull requests is getting a review and validation, to effectively integrate new functionality, refactorings and bug fixes into the master. Restricting commits only to bug fixes will not help us to evolve the project as we wish. You can create branches or work separately; in the end, the important thing is to open the pull request.

I'm going to summarize the points I think we all agree here:

We need to finish the non-interactive mode and remove the interactive mode.
The code requires automated tests.
The code requires serious refactoring and a better API organization.
The project needs better development documentation (function purpose, preconditions, FIXME, debug info, etc).
We have to integrate the functionality from the branches in the project (e.g. stanford support, etc)
The project needs better end-user documentation (and some publicity too).
We need to get rid of all those unused branches (ok, this one is mine!).

My goal at the moment is to attack the first point via minimal patches that achieve the same functionality as the interactive mode via command-line args, and then to remove the interactive mode. This implies fixing some things in the API and doc, but not all. I think I can have this before the end of the weekend. Once this is ready, the next step is to work on the second point, the automated tests. I think that with the automated tests we can do the refactorings more freely, and we will also be able to integrate the rest of the missing functionality from the branches with less fear of breaking things.

rbrito commented 10 years ago

Hi there.

On Jan 02 2014, George Monkey wrote:

I agree that we should commit only bug fixes to the master, not new features that could make the script unusable, so that we can keep the master stable.

Great, thanks. That's the whole idea of branches, BTW. You have one stable branch, then you fork that to create a development branch and, only if necessary, backport fixes to the stable branch.

I am sorry that I did some commits beyond it as I did not realize those changes would become big changes.

That's one problem that could be solved partially with a good test suite.

It is true that the master actually lacks testing and has faults. I did not pay much attention to the README file, since I am not a native English speaker either.

I can do that. I actually like writing documentation (well, that seems to be something that few developers seem to like).

As for supporting other websites, I agree that it is not difficult.

Indeed, I have just done that, and I am not like a supercoder.

I actually scheduled to add it into the master via command line arguments, but after I rewrote the argument parsing part, I realized that those changes were actually an important part of making non-interactive, and we have some different ideas on it, so I stopped the implementation.

One of the best ways that I can see for downloading many courses with just one command line is to put the authentication of the various sites in a netrc file (supposing that we are downloading from different sites, where the passwords may be different) and passing the URLs as arguments to the program.
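A rough sketch of that idea in Python, assuming only the standard library (`netrc`, `argparse`, `urllib.parse`). This is an illustration, not code from master or from the non-interactive branch: the function names `credentials_for` and `parse_args` and the `--netrc-file` option are hypothetical.

```python
import argparse
import netrc
from urllib.parse import urlparse


def credentials_for(url, netrc_path=None):
    """Return (login, password) for the host of `url`, or None if not listed.

    With netrc_path=None the standard ~/.netrc file is read.
    """
    host = urlparse(url).hostname
    auth = netrc.netrc(netrc_path).authenticators(host)
    if auth is None:
        return None
    login, _account, password = auth
    return login, password


def parse_args(argv=None):
    """Parse course URLs given as positional arguments (hypothetical CLI)."""
    parser = argparse.ArgumentParser(description="sketch of a multi-site CLI")
    parser.add_argument("urls", nargs="+", help="course URLs to download")
    parser.add_argument("--netrc-file", default=None,
                        help="path to a netrc file (default: ~/.netrc)")
    return parser.parse_args(argv)
```

With a netrc entry such as `machine courses.edx.org login alice password s3cret`, each URL on the command line can be matched to its own site's credentials, so no password has to be typed or passed as an argument.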

I generally agree with your opinion. An interactive mode may not be really necessary.

It is "necessary", but only after we have something working well. And when we do something that is robust, we can learn about implementing something with tkinter, which would, at least, provide an "app" look (to the program).

We can also distribute the program via pypi once it is reasonably good, so people may get stable versions with all the dependencies via pip and be OK there.

Final users shouldn't be using fresh checkouts of a VCS repository, unless they are helping to find a bug that the developers can't reproduce and, here, git bisect helps a lot.

I will post further response or checkout non-interactive tomorrow, since it is also quite late in my timezone.

I realize that you may not be familiar enough with git to see the contents of a given branch. To help you a little bit, you may want to use some of the following commands, after having checked out the non-interactive branch:

git log -p
git log -p --reverse
git log -p --reverse --no-merges
gitk --all

These commands will help you see the patches and changes that I've made there. And, in fact, whenever I clone a git repository, that's what I do, to get up to speed with a given project's history.

You may see that I have a large number of repositories in my account and, without exception, I do just what I described to get the gist of the project.

That being said, I see that both of you are working (or will work) on reimplementing things that I already have worked on, which include:

Adding parsing of commands.
Supporting multiple sites.
Modularizing the code.
Adding a testing infrastructure.
Getting rid of code that sucks by refactoring.

Things that are on my radar include:

Writing more comprehensive documentation.
Adding a continuous integration system to guarantee that things don't break when changes are made.
Adding code coverage reports.
Explicitly declaring a license for the code (I prefer the LGPL3+, but any other will do).
Tagging versions and caring about milestones.
Including downloading of subtitles in conjunction with the videos.
Making releases from the tags, possibly publishing them to places like pypi (having a license is, I think, mandatory for publishing there and, in fact, if we don't have one, then we are not exactly a Free Software project).

As you can see, I have grand plans for the project, but I think that the way that you guys are doing it is slightly hostile (I don't believe that you had this intention, though) to the improvements and changes that I already did on the non-interactive branch and I was expecting help there.

It is not "my" branch by any means. It is just the place where I am experimenting with things (a playground, if you will). The branch simply belongs to the collective of everybody that contributes to it or uses it.

Another thing: it may not be apparent, but some people do care about having their name in the repositories, as that gets them visibility for potential employers. I first learned about this when reading one of the posts on http://felipec.wordpress.com/

OK, I will do some changes in the forked repository that I created (see https://github.com/rbrito/edx-dl/) and let me know if you like them.

There is more that I want to say, but let me keep this not so long, or it may become a book.

Regards,

Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA http://cynic.cc/blog/ : github.com/rbrito : profiles.google.com/rbrito DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

shk3 commented 10 years ago

@rbrito, the purpose of my previous commits was to ensure that the script in the master branch works. As you may have seen, there were tons of open issues. The implementation of argparse in #56 was indeed not intended to rewrite your contribution. I will try to avoid reinventing things like #56 again. As for modularizing, I believe it is unavoidable if we want to fix the bugs, since most bugs are caused by a lack of modularization. I understand such revisions will cause merge difficulties for the non-interactive branch in the future, so let's avoid structural changes in future fixes for the master branch.

As for the pull requests, I have actually switched to the pull request method to submit new revisions after your suggestions. The problem this time was caused by our communication, and I will try to make my commit messages more informative.

Let's only do necessary bug fixing for the master branch so that we can help work on the non-interactive branch. Do you agree, @rbrito and @iemejia? BTW, which one is the active non-interactive repository, non-interactive branch or @rbrito's forked repository? Where should we send pull requests to if we want to collaborate on the non-interactive?

iemejia commented 10 years ago

Doing only bug fixing in the master will stop users from contributing new features or refactorings which is something we also need, so I don't think it is a good idea. Personally I don't think that if we return or move the current work of the master to the non-interactive branch or if we create a new branch from the current master or if we contribute to @rbrito's repo we are going to win a lot for two reasons:

The two codebases (master/non-interactive) differ quite a bit at this moment, which would make the merging an additional effort (and not a trivial one, in particular because we will have to do the manual re-testing again).
Because we are almost there, if you agree on my last pull-request, and I finish the changes for the sections (week) selection we will have the complete functionality of edx-dl in a non-interactive mode. I agree that we will still be behind in some of the code and ideas of the non-interactive branch and Rogério's repository, but at least we will have edx-dl working for the end-user exactly as now (which is not the case otherwise). And the project users can contribute with the issue reports and fixes in case of problems which we won't have in any other scenario.

Those missing things can then be pushed via pull-requests in small chunks to the regular master so that everybody reviews them, and like this we avoid the risk that a situation like this happens again.

shk3 commented 10 years ago

Yes. As for users' contribution, we should accept them. I would suggest that we should try to integrate the master branch with everyone's efforts.

How about accepting your new features for now and then trying to collaborate on non-interactive to make it releasable as soon as possible? I have not tested either of the branches yet, but I think we should keep the same non-interactive behavior in the two codebases. If the non-interactive behavior or usage in your contribution and the one in non-interactive are different, we can discuss it to decide which one is better. In this way, the non-interactive branch serves both as the refactoring and as the source of multiple new features.

@rbrito, should we work on non-interactive branch to implement the basic features (the features supported by the current master branch) for now so that we can merge it to the master as soon as possible? In this way, we can focus more on refactoring.

To sum up,

Let's stop actively making new features to the master, but we do accept the features submitted by users and the features we have finished but not merged yet.
As for the non-interactive branch, let's focus more on refactoring and basic features. The new features already made should be accepted.
As for the overlapping new features, we should keep their behaviors consistent. We can create issues to discuss them.
As for the unimplemented new features, how about implementing those by pull requests? We can do them incrementally.

rbrito commented 10 years ago

Hi.

First of all, it is great that we are having this discussion, as it is a healthy way to let others know what our hidden assumptions are, especially when they don't match what the others think. I highly appreciate it.

On Jan 05 2014, Ismael Mejia wrote:

Doing only bug fixing in the master will stop users from contributing new features

That's the intention. New features should not go to a stable branch. A stable branch is meant only for bugfixes, especially when we want to refactor things. This refactoring work should be done in a different branch.

Let me guess: do you know about very successful projects like, say, the Linux kernel, where they provide a stable branch? What do you think is the reason why distributions like RHEL or Ubuntu LTS exist? They only include bugfixes, nothing else, once released (well, apart from support for new hardware, but no new functionality otherwise).

We should direct the contributors to work on a new branch. This way we:

keep the stable code stable.
don't have to do the work twice to accept it in the master branch and then port the changes to the development branch.

or refactorings which is something we also need, so I don't think it is a good idea.

The refactoring should not be done in the stable branch. The stable branch is stable. The refactoring can break things badly, especially since we don't even have one of the (almost) essential prerequisites for refactoring, which are comprehensive unit tests.

Personally I don't think that if we return or move the current work of the master to the non-interactive branch or if we create a new branch from the current master or if we contribute to @rbrito's repo we are going to win a lot for two reasons:

You are misguided here, IMVHO. No offense intended, but I would like to be proven incorrect.

The two codebases (master/non-interactive) differ quite a bit at this moment, which would make the merging an additional effort (and not a trivial one, in particular because we will have to do the manual re-testing again).

The two codebases differ because I did a lot of work on that branch. They started exactly the same, and it was only through small, (mostly) self-contained changes that I got where I got.

Because we are almost there, if you agree on my last pull-request, and I finish the changes for the sections (week) selection we will have the complete functionality of edx-dl in a non-interactive mode.

The pull request (I have not looked at it) should not be merged if it introduces new features. We should:

tag the current version as, say, v1.0
continue development on the non-interactive branch and, when we reach feature parity with the stable branch, we merge it to master and tag it v2.0.

IMVHO, we should stay conservative, bite the bullet with the current master branch not following the best practices and being limited, and speed up the work on the non-interactive branch, so that this soap opera comes to an end.

I agree that we will still be behind in some of the code and ideas of the non-interactive branch and Rogerio's repository, but at least we will have edx-dl working for the end-user exactly as now (which is not the case otherwise).

Just fix the pressing bugs and obvious bugfixes and move to a feature branch (which is what the non-interactive is). In fact, the more that I think about it, the more that I disagree with the non-interactive name that I chose.

I'm laying the ground for a more model-view-controller (MVC) way of doing things, so that the interactive part can be implemented on top of it, via, say, curses or tkinter or whatever.

And the project users can contribute with the issue reports in case of problems which we won't have in any other scenario.

Again, users that want to contribute features should be kindly requested (e.g., via a CONTRIBUTING.md file) to base their changes on the development branch.

Those missing things can then be pushed via pull-requests in small chunks to the regular master so that everybody reviews it, and like this we avoid the risk that a situation like this happens again.

This is a once-in-a-lifetime situation, because the code had to be transitioned. Once we have a more solid foundation, we definitely can go back to a more piecemeal pull-requests, review

Regards,

Rogério Brito : rbrito@{mackenzie,ime.usp}.br : GPG key 1024D/7C2CAEB8 http://www.ime.usp.br/~rbrito : http://meusite.mackenzie.com.br/rbrito Projects: algorithms.berlios.de : lame.sf.net : vrms.alioth.debian.org

rbrito commented 10 years ago

On Jan 04 2014, George Monkey wrote:

Let's only do necessary bug fixing for the master branch

Perfect!

so that we can help work on the non-interactive branch.

Perfect!

BTW, which one is the active non-interactive repository, non-interactive branch or @rbrito's forked repository? Where should we send pull requests to if we want to collaborate on the non-interactive?

Any place would be OK with me. I think that now that I created the necessary infrastructure on my account to use travis-ci for the automatic unittests (and I will soon enable code coverage reports), then it would be more appreciated if you contributed there.

Once everything is in place, we can migrate everything back to your repo (or choose to create a GitHub organization or whatever).

Regards,

iemejia commented 10 years ago

Doing only bug fixing in the master will stop users from contributing new features

That's the intention. New features should not go to a stable branch. A stable branch is meant only for bugfixes, especially when we want to refactor things. This refactoring work should be done in a different branch.

Seriously ? That's the intention, to stop users from contributing ? Do you think this is the way to make the project successful like youtube-dl (or whatever other 'successful' project) ?

Do you seriously think that you will be able to tell a casual contributor to also go and understand a totally changed codebase and do the changes with another pull-request to the non-interactive branch, oh sorry, to one or some of the following: non-interactive, non-interactive-10gen, non-interactive-stanford, refactoring/take-00, refactoring/take-01, rbrito/edx-downloader + other bug-fixes branches ? I think in this scenario at least it is the responsibility of the branch maintainer to update the changes from the master, not just to ask casual contributors to do it because in the end he will have to rebase this to master and ensure that everything works as before.

I have to tell you that I really dislike the tone of comments like 'Let me guess, do you know about very successful ...', you can't just argue like that. I have been respectful during the whole discussion. And for me it's not about I know more or less than anybody or I follow 'x' practice. My whole goal is that the project progresses.

I agree that refactoring in the master is dangerous, but the only way to tame this danger is with tests, and this can be done in two ways: with a whole refactoring in a branch + writing automated tests or in small steps to the master via small refactorings (which in fact needs more work since the master must not be broken but promotes more tests and collaboration). I was arguing for the second approach since it's been eleven months since the first approach started and nothing has been finished, and I saw more end-user results in two weeks than in those months. I was not arguing because I didn't understand it, or because I thought that the quality was better, or because it was a better solution. I'm arguing because the second scenario was working.

I think to avoid more misunderstandings, I will keep as I was until this moment, I will respect your rules, so I will only contribute from now on small bug-fixes or improvements in the master. Anyway if this whole process pushes you to finally finish the non-interactive branch that is awesome for all of us, because in the end you will have your beloved credit and I will have a better script to download the courses, isn't open-source awesome ?

rbrito commented 10 years ago

Hi there.

On Mon, Jan 6, 2014 at 6:17 AM, Ismael Mejia notifications@github.com wrote:

That's the intention. New features should not go to a stable branch. A stable branch is meant only for bugfixes, especially when we want to refactor things. This refactoring work should be done in a different branch.

Seriously ? That's the intention, to stop users from contributing ?

Yes, it is the intention to avoid new features (not bugfixes) when the branch is about to die.

Do you think this is the way to make the project successful like youtube-dl (or whatever other 'successful' project) ?

Certainly not. The difference there is that the changes are not structural, but only localized. The big change there was already performed when they split the script from one single file to multiple files and made youtube-dl more of a library, which is excellent for our purposes of not calling os.system or subprocess.call.

Furthermore, a lot of the churn that external contributors contribute to youtube-dl goes into new Information Extractors, and not in the core of the program. Adding a new IE is quite likely to not break the whole program.

This is the same thing as Linus Torvalds accepting new drivers late after the merge window has been closed.

Regarding other successful projects, yes, they mostly work this way. You can look at any other big project and they have this thing called "feature freeze". See, for instance, the Linux kernel, Debian, Qt, Python, Ubuntu... Whatever you want to name.

Do you seriously think that you will be able to tell a casual contributor to also go and understand a totally changed codebase and do the changes with another pull-request to the non-interactive branch, oh sorry, to one or some of the following: non-interactive, non-interactive-10gen, non-interactive-stanford, refactoring/take-00, refactoring/take-01, rbrito/edx-downloader + other bug-fixes branches ?

Most of these branches are superseded by the non-interactive branch, which, from those branches that you listed, is the "branch to rule them all".

I think in this scenario at least it is the responsibility of the branch maintainer to update the changes from the master, not just to ask casual contributors to do it because in the end he will have to rebase this to master and ensure that everything works as before.

In that situation, the best thing is for the project managers to hold the pull request (again, unless it is a bugfix) and to port it over to the future branch.

I have to tell you that I really dislike the tone of comments like 'Let me guess, do you know about very successful ...', you can't just argue like that.

Did I use that? That's not my intention. I apologize. Publicly.

My intention is to get help as soon as possible regarding the non-interactive branch so that it has feature parity with the interactive one, then rename the master branch to stable-1.0, tag the commit that we feel is representative of the best of the current master branch, and rename the non-interactive branch to master.

Once we are satisfied with it, tag a version 2.0, with all the goodies that that will bring (unittests, code coverage, support for multiple sites, authentication via netrc, multiple modules, docstrings galore, updated documentation with the unified way of using the script, argument parsing from the command line, use of logging to debug support for users etc.).

I have been respectful during the whole discussion.

I agree. I just don't agree with the methodology. That's all. I appreciate your approach to the discussion and, again, it is helpful to have these discussions, since they bring to the table the hidden assumptions that we have regarding the vision of the project.

And for me it's not about I know more or less than anybody or I follow 'x' practice. My whole goal is that the project progresses.

I am interested in getting the project progressing and following best practices, since I may abandon the project (not likely, but I may die or become very ill).

I agree that refactoring in the master is dangerous,

Good.

but the only way to tame this danger is with tests,

Which we already have in the branch that I started. You mentioned that you would help me there. I'd love to not have to carry the torch alone.

and this can be done in two ways: with a whole refactoring in a branch + writing automated tests or in small steps to the master via small refactorings (which in fact needs more work since the master must not be broken but promotes more tests and collaboration).

Two things that I learned with people wiser than me:

commit early, commit often, with small, self-contained commits addressing only one point.
each and every commit must, absolutely and at all times, be workable, so that we can be confident in the changes and gain the benefit of running git bisect at any time and landing on a commit that works when hunting for regressions in the code.

Of course, the second rule above may be prone to human errors or oversights. But it is a splendid guideline.
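The reason the second rule pays off can be shown with a small sketch of the search that git bisect performs. This is an illustration in Python, not git's implementation; `first_bad_commit` and `is_good` are hypothetical names, and the history is modelled as a plain list in which the first entry is known to work and the last is known to be broken.

```python
def first_bad_commit(commits, is_good):
    """Return the index of the first commit for which is_good() is False.

    Assumes commits[0] is good, commits[-1] is bad, and that a regression,
    once introduced, stays in every later commit.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid  # regression was introduced after mid
        else:
            hi = mid  # regression was introduced at or before mid
    return hi


# Eight commits, with the regression introduced in commit "f".
history = list("abcdefgh")
print(first_bad_commit(history, lambda commit: commit < "f"))  # prints 5
```

The search only needs about log2(n) checks, but each check has to give a trustworthy answer. If a commit in the middle of the history does not even run, `is_good` cannot answer for it, which is exactly why every commit should be workable.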

I was arguing for the second approach since it's been eleven months since the first approach started and nothing has been finished, and sadly I saw more end-user results in two weeks than in those months.

Let me tell you some of the reasons why the non-interactive branch was mostly dormant:

I had a person in my family that I had to care about that had very frequent respiratory problems.
I had another family member who had a stroke and who had septicemia.
I myself had some health problems.
The person with septicemia and another member of the family died at the end of this year. We had a hell of a time in real life, including bureaucracy with paperwork and funerals.
I was also studying and I had completed 12 courses (some trivial, some not) on coursera and 4 courses on 10gen/mongodb (for a total of 16 courses, counting only those that grant certificates). For the 10gen courses, my branch was good enough and the master branch didn't meet my needs.
I was also enrolled in other courses, but my lack of time and the real-life events that I listed above all sucked my time. I hope to resume them when I have some time.
I am involved also in other projects and, for, say, the mongodb courses, I performed a bunch of face-lift changes in the package in Debian, as you can see here:

http://cynic.cc/blog/posts/debian_activities/
https://github.com/rbrito/mongo-debian/commits?author=rbrito
You also told me that you would work in the non-interactive branch and I didn't feel that a moving target would be good.

But, again, the most pressing needs were those related to my family.

I was not arguing because I didn't understand it, or because I thought that the quality was better, or because it was a better solution. I'm arguing because the second scenario was working.

I see. You were driven by the short-term benefits. I also believe in gradual, organic changes. But there are some circumstances where the better thing is to be bolder.

I think to avoid more misunderstandings, I will keep as I was until this moment, I will respect your rules, so I will only contribute from now on small bug-fixes or improvements in the master.

I am not the owner of this project. You don't have to respect "my" rules. My message is just to put order in the chaos of commits. We should definitely not invest time in reinventing the wheel (like rewriting the argument parsing from the command line).

I would, for instance, love it if you sent me a patch to fix any of the FIXME annotations that I put in my code. Also, I would love it if you ported the downloading of subtitles to the non-interactive branch.

And, of course, you guys can just ignore me and continue with the project the way that you see fit. I am just writing about things that I firmly believe are good practices, and I believe that we should change our modus operandi (even if only momentarily).

Anyway if this whole process pushes you to finally finish the non-interactive branch that is awesome for all of us, because in the end you will have your beloved credit

I don't know if this will make me finish things faster. Another heavy tide of commitments is coming for me and I doubt that I will have a lot of spare time to work on the project. Help is definitely appreciated.

and I will have a better script to download the courses, isn't open-source awesome ?

Yes, it is. Especially when we can get together and agree on the best course for a project.

Let me be clear here. If you guys want, just ignore me and I will not make any interventions in your practices. I will continue implementing what I think are better approaches and I will steal good ideas from your project, adapting those to my fork (if you happen to ignore my approach of keeping the master branch stable, receiving only bugfixes).

And you are free to steal ideas from me too (provided that we have compatible licences).

Of course, it would be better to have 3 people working together in 1 project than having 2 weaker projects wasting time duplicating what each other is doing.

So, in essence, take the work that I have already done. It is semi-ready. Reimplementing what I did seems like a terrible sign of "Not Invented Here" syndrome (see the argparse reimplementation, a potential implementation for supporting multiple sites, the unittests etc.). I did a good part of that already.

We can finish that sooner if we all embrace it, and this discussion will stop once we know what the visions of the other members are.

Life is too short (and we are all too fragile) to spend time with these things, when they are almost already done and (IMVHO) in a superior way. Please, feel free to criticize my code (besides the admittedly sucky FIXMEs). I would love to improve my python-fu.

Sincerely yours,

Rogério Theodoro de Brito.

P.S.: If it is not clear, I want you to join me, @iemejia.

shk3 commented 10 years ago

It's fine. We are all doing volunteer work for this project. As for the stable branch, I would suggest that we have a separate branch (or maybe a tag) for the stable version. The master branch can be left for development based on our and users' contributions.

I have not gotten through all of your replies yet, since I am about to start my new semester and I only skim them before going to class.

iemejia commented 10 years ago

Hello,

I will not answer all your points since I agree with most of them, and that's what makes this debate partially as absurd as it is (also considering that if we had put all the effort of this conversation into writing code or at least documentation for the project, it probably would have been better). But I still have some comments:

Yes, it is the intention to avoid new features (not bugfixes) when the branch is about to die.

Well, my point here is that the branch has been about to die for eleven months. Again, this is not and never will be an attack on you. We are all to blame for this; it's not only your fault or anybody else's (mine included). We are free here and nobody is paid, but it is hard for me to accept that we establish rules that in the end keep this bad precedent alive.

The question is 'What is better?': a nice branch that has no collaboration and has been stalled for months with the good intention of producing a better version, OR broader collaboration with smaller features that can be used immediately? It depends on the perspective. Of course for the maintainer the first one is more important; the end-user doesn't care about that, he cares about the end product. And I'm always on the side of the end-user (even as a maintainer).

Which we already have in the branch that I started. You mentioned that you would help me there. I'd love to not have to carry the torch alone.

I agree, but you also didn't provide a lot of encouragement for me to do so. You can't just expect that if you give commit access to someone he's going to come and do everything. I checked the branch immediately when you told me, and I saw it was quite broken; even basic things like the command line options weren't displaying as I expected. Of course I understand that the goal was to fix many of those, but when I saw most of the code rewritten, the lack of updates, and some of it not working compared with the current master, my motivation went down immediately, in particular since the crappy script in master could do most of what I wanted as a user in the end. Sorry, I am lazy, but I'm honest too.

One thing I never understood during this conversation is why, if you knew that smaller improvements are better, you decided to mostly rewrite the whole thing instead of contributing it as small changes to the master, and why you felt so affected when we started to do it that way. You can also do it that way (even now), and it's not reinventing anything, since most of the ideas come from your code and are yours. But well, we had better forget this, since this side discussion would probably go nowhere.

I would, for instance, love it if you sent me a patch to fix any of the FIXME annotations that I put in my code. Also, I would love it if you ported the downloading of subtitles to the non-interactive branch.

Of course, it would be better to have 3 people working together in 1 project than having 2 weaker projects wasting time duplicating what each other is doing.

I agree with you about working together; that's why I defend collaboration fiercely in every scenario. I will agree to make all the changes you want and start collaborating actively in the non-interactive branch under the following conditions:

We agree that the existing 'shk3/non-interactive' branch is the main place for collaboration. No more forks or places.
We get rid of all the other branches in the shk3 repo. I know it is your repo, @shk3, but those are really confusing. We can save that code in other places, or even better migrate it to the non-interactive branch, but we must not keep them as they are.
The immediate goal is to promote the non-interactive branch to master ASAP, and that includes the current set of functionality and nothing else. All the other features (e.g. other sites, tests, additional refactorings, plus other niceties) will come afterwards in the form of pull requests.
We will promote external collaboration including new features, which means that pull requests from casual contributors are welcome, and contributors won't be obliged to push things into the non-interactive branch or any other development branch while those exist. It's the branch maintainer(s) who are responsible for keeping it rebased and updated.

If you agree with those I will start to migrate the things that are not updated immediately, including subtitles, and others, and then I will check the FIXME's.

shk3 commented 10 years ago

Shall we create a stable tag and switch the main displayed branch to it? Therefore, we can keep accepting new features on the master branch, but keep a stable version for users.

bat-ventzi commented 10 years ago

I had a problem with the script. And even though I am no Python programmer, I managed to fix it myself on my box. I wanted to help you guys, and I spent some of my time learning how to commit my fix. Even if my fix isn't a fix for everyone, it can tell you something about the problem.

I am saying that because I, as a user who has some experience in programming, will contribute to the project if and only if (and this won't always happen) I have a problem and I manage to fix it alone. Otherwise I would have to be very committed to the project to contribute something.

Yes, there must be a vision of the project, but you mustn't stop people from contributing. If people can't contribute to the master (stable) branch, most of them will not contribute at all, because the code will be different from what they have. So you have to decide whether you want more contributors and a messier master branch, or fewer contributors and tidy code with reasonable unit testing.

When I started writing this post I was thinking that @iemejia is right and more contributors is always better, but ... now I think I will take the other approach where the code is tidier, if I was the maintainer. And just to remind you that the second approach presses you, the committed ones, to do more work in less time.

shk3 commented 10 years ago

@bat-ventzi, thanks for your feedback on this project. I think the collaborators have reached a consensus that we should accept contributions from users. Personally, I believe this issue was caused by a lack of communication among us. We are working on making this project better.

The collaborators here are actually discussing our current focus, since we have a branch that has stayed unfinished for a long while, which has added more and more maintenance overhead. We are trying to finish it as soon as possible to avoid such overhead. We will definitely neither stop updating the project nor refuse contributions from users.

Thanks again for your participation in this project.

iemejia commented 10 years ago

Thanks @bat-ventzi for your opinion, and for adding a bit of new-contributor rationality to the discussion. Your story about how you learned things just to contribute a fix is probably the same as mine. I'm not an everyday Python programmer, but I have learned quite a bit thanks to my little contributions to small projects like this one. But I would never have continued to contribute if I had found many early barriers to doing so in the beginning. Think about the logic: I'm investing a good amount of my time for free, and they put a lot of barriers in my way. Does that make sense, in particular for a casual contributor?

I agree with your thinking, which means that I agree with Rogério and with everybody about how important good software practices are from the perspective of maintenance. And you will notice that I don't protest those. The argument I have been trying to make clear is that we must not lose rationality and pragmatism in the discussion. We must go in the direction that prevents us from falling into these errors:

Extra-Perfectionism: Thinking that an unfinished software solution with more features or more polished engineering beats one that is crappy but at least works for people. Of course in the end the best product would probably beat the other, but my point is that functionality matters. Most people care about downloading the courses in a nice way more than about having a perfect script that covers all possible MOOCs, or about one that has beautiful unit tests. We must not forget this.
Irrational expectations of contributors: I agree that every project should have a clear set of rules that must be respected by contributors. But this project at this moment lacks contributors, and the fewer barriers we have, the better. If I think as a contributor that this project is kinda crappy, but I work for free to fix it, and they put lots of barriers or conditions in my way, and additionally I don't see a lot of collaboration, does it make sense to contribute?

When I talked about setting some conditions (some messages ago) I didn't do it arbitrarily, or to win any argument war or something like that. I did it because, apart from stating many points that weren't clear and that we mostly agree about now, we have not reached consensus about how to avoid these errors in the future, and I think these errors were the deep reasons for the lack of progress in the non-interactive branch (independently of personal and motivational ones).

Notice that two of the conditions I mentioned address exactly these points 'trying to work in the core functionality first, and to push the branch so people can use it (1), and promoting contribution (2)'. The missing two conditions are about organizational issues 'having a common working point, and ordering the branches mess'. I proposed the existing shk3/non-interactive branch to avoid forking, but I don't care if Rogério wants to fork and appropriate the project in which case we will work there (in the end he has the right since he has put probably more effort than all of us into building this branch), but I tried to prevent this to avoid losing the visibility the project has.

Hey Rogério, can we just agree, smoke the peace pipe and start working ?

rbrito commented 10 years ago

Hi.

On Wed, Jan 8, 2014 at 9:15 PM, Ismael Mejia notifications@github.com wrote:

I agree with your thinking, which means that I agree with Rogério and with everybody about how important good software practices are from the perspective of maintenance. (...)

Hey Rogério, can we just agree, smoke the peace pipe and start working?

Please, go ahead. But we need to have a clear vision of what the project is and what we want for it in the future.

The problem with development branches still remains. What should we do? You mentioned that the changes that I made were (perhaps not in these words, but something which I understood to be) "disruptive" in the sense of too many changes, but I started hacking on a branch based on the master branch. I see that as an evolutionary, rather than a revolutionary, change.

Anyway, I have a sizeable amount of changes. Can you (@iemejia? @shk3? others?) help me with my fork, so that we can improve it, and merge it back?

Regards, Rogério.

P.S.: For many unfortunate reasons, I had to drastically reduce my amount of GitHub activity. Please help me keep the contributions from suffering bitrot.

iemejia commented 9 years ago

I am closing this issue. A new discussion on the refactorings and the future of the project is now part of issue #162