We should refactor the code and have tests #162

rbrito commented 9 years ago

Hi.

I have been learning about tests lately. I already implemented some tests on my own fork of edx-dl, but I guess that it is better to have just one codebase (even though this project doesn't work the way that I like it to, I guess that we can reach some common ground).

Back on the subject of tests, my other project, https://github.com/coursera-dl/coursera, has just switched from nose to py.test with the help of @meejah and @FedericoCeratto (read: "they basically did all the work"). I have learned a good amount just by reading their code (and @meejah's code is very elegant, BTW) and this will surely stick with me.

I still have a lot to learn about py.test (and I have been swamped with real-life events), but I guess that this project could use a lot of help by incorporating things learned from that transition.

@shk3, @iemejia, any comments?

Regards,

Rogério Brito.

iemejia commented 9 years ago

Hi,

First, sorry for answering so late. I agree. The current state of the project is not the best; in particular, the lack of tests makes the script too fragile. I would really like us to work on automated tests, and to learn a bit from your experience with the tests of coursera-dl.

I really want this script to be better, but to be honest I lost the will to contribute bigger changes (like a decent refactoring) after the last time I tried. I know both of you had the best intentions for the project when you told me to stop contributing my changes and follow your proposed approach. At that moment I had the impression that your intentions were good but that the execution was not going to work, and today, almost a year later, nothing has improved and my intuition was right. Plans and good intentions are good, but working code in the project is better. So I ask again whether it would be better to make the changes in a progressive and less 'formal' way (no branches, more work in master, just break it). This can be less stable for a while, but in the end it can be better for the project, because that way we will have more eyes seeing what is happening, and more open collaboration and testing (user-based, but still better than nothing).

Concretely about the tests, there is another issue: if we want to write unit tests, we need an API to test. So this is another question: how are we going to agree on it? Or can you just begin to move your changes into master without breaking it? In that case I agree to collaborate actively on any needed task.

One last thing that you may disagree with: I really like the fact that this script doesn't have any extra dependencies (apart from youtube-dl and bs), and I wouldn't like to add any other that is not worth it (with probably the sole exception of a test framework). I say this because I saw that your fork includes some of them, but I wouldn't like to add more complexity, at least until we have a semi-decent stable version of the project with tests.

I really hope that we can collaborate this time, so just tell me what I/we can do.

Best, -Ismael

iemejia commented 9 years ago

Oh, and I forgot to ask: what about you, @shk3? Are you still active in this repository? It has been some time since I last heard from you. Are you willing to fix things or collaborate too?

iemejia commented 9 years ago

I have been closing some of the invalid issues so we can have a more accurate issue tracking, as part of this.

rbrito commented 9 years ago

Hi.

On May 05 2015, Ismael Mejia wrote:

> First, sorry for answering so late. I agree. The current state of the project is not the best; in particular, the lack of tests makes the script too fragile.

Great.

> I would really like us to work on automated tests, and to learn a bit from your experience with the tests of coursera-dl.

Keep in mind that I don't know that much about automated testing. I'm still learning and I still have a lot to learn. In fact, in a sense, I am using all these projects as testbeds for learning some practices that I never had the opportunity to experience. :)

For instance, one thing that I don't know is how to test things that need network connections without making the people at edX or Coursera mad.

This is needed because of the recent changes in HTTPS (and recent Python versions changing how they handle it), which are giving me a headache in coursera-dl.
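One common approach to this is to replace the network call with a stand-in during tests, so that no request ever reaches the real servers. A minimal sketch, assuming Python 3 and the standard library's unittest.mock; `fetch_page`, `make_fake_opener` and the URL are hypothetical names for illustration, not functions from edx-dl or coursera-dl:

```python
from unittest import mock
from urllib.request import urlopen


def fetch_page(url, opener=urlopen):
    """Hypothetical helper: download a page and return it as text.

    The opener is a parameter so that tests can pass in a fake one.
    """
    response = opener(url)
    return response.read().decode("utf-8")


def make_fake_opener(body):
    """Build a stand-in for urlopen that returns canned bytes."""
    fake_response = mock.Mock()
    fake_response.read.return_value = body
    return mock.Mock(return_value=fake_response)
```

In a test, `fetch_page(url, opener=make_fake_opener(b"..."))` exercises the decoding logic without opening a connection, and the fake records how it was called, so the requested URL can be checked as well.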

> I really want this script to be better, but to be honest I lost the will to contribute bigger changes (like a decent refactoring) after the last time I tried. I know both of you had the best intentions for the project when you told me to stop contributing my changes and follow your proposed approach. At that moment I had the impression that your intentions were good but that the execution was not going to work, and today, almost a year later, nothing has improved and my intuition was right.

I agree. And I also value your work a lot. I guess that revisiting the situation, as we are doing right now, is the way to go.

> Plans and good intentions are good, but working code in the project is better.

Definitely.

> So I ask again whether it would be better to make the changes in a progressive and less 'formal' way (no branches, more work in master, just break it). This can be less stable for a while, but in the end it can be better for the project, because that way we will have more eyes seeing what is happening, and more open collaboration and testing (user-based, but still better than nothing).

I guess that we can go with your changes. From my side, I have a few requests:

- If you pick any commit from any other project of mine, use git cherry-pick, or git format-patch and git am, so that you preserve git's notion of authorship (it treats the Author: and Committer: fields separately). This is important to get due credit and to have my work recognized in statistics that employers may want to see (read: "funding").
- Please create small commits, for the sake of cherry-pick'ability and bisecting. Small, granular changes are key to using git bisect and make large changes easier to understand.
- Please, once you have a feature implemented, extract it into a separate function. This makes unit testing easier to implement.
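The third request can be pictured with a small sketch. The function name, the regular expression and the HTML attribute below are hypothetical and not taken from edx-dl; the point is only that logic living in its own function, taking plain text and returning plain data, can be tested without logging in or downloading anything:

```python
import re

# Assumed pattern, for illustration only: ids appear as data-video-id="...".
VIDEO_ID_RE = re.compile(r'data-video-id="([A-Za-z0-9_-]+)"')


def extract_video_ids(page_html):
    """Return the video ids found in a page, in order, without duplicates."""
    seen = set()
    ids = []
    for video_id in VIDEO_ID_RE.findall(page_html):
        if video_id not in seen:
            seen.add(video_id)
            ids.append(video_id)
    return ids
```

A test then only needs a short HTML string as input, and a failure points directly at the parsing step rather than at the whole download loop.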

> Concretely about the tests, there is another issue: if we want to write unit tests, we need an API to test. So this is another question: how are we going to agree on it? Or can you just begin to move your changes into master without breaking it? In that case I agree to collaborate actively on any needed task.

Ideally, I would love to share as much of the same implementation as possible among the different projects. For instance, we may want to steal the filename "normalizations" (for coping with filenames that have non-English characters and/or with filename length limits). I already implemented this in a very bad manner in coursera-dl and it would be good to have edx-dl use this.

The best implementation of this filename normalization that I saw comes from the Picard project from musicbrainz. I hope to grab that and use it for coursera-dl.
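To make the idea concrete, here is a rough sketch of what such a normalization could look like. It is neither the coursera-dl code nor the Picard implementation; the function name, the set of replaced characters and the default length limit are assumptions chosen for illustration:

```python
import re
import unicodedata


def normalize_filename(name, max_length=100):
    """Return an ASCII-only, length-limited version of a filename."""
    # Decompose accented characters (e.g. "é" into "e" plus a combining
    # accent), then drop everything that has no ASCII representation.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_name = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace characters that are unsafe on common filesystems.
    cleaned = re.sub(r'[\\/:*?"<>|]+', "_", ascii_name)
    # Collapse whitespace runs and trim the ends.
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    # Truncate the stem so that the extension survives.
    stem, dot, extension = cleaned.rpartition(".")
    if not dot:
        return cleaned[:max_length]
    keep = max(max_length - len(extension) - 1, 1)
    return stem[:keep] + "." + extension
```

Dropping non-ASCII characters is the bluntest option and loses information for titles in non-Latin scripts; a real implementation would probably want to keep Unicode where the filesystem allows it.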

I may be exposing my ignorance here (since there must really be a better way), but I wish that Python dealt more easily with this. On a similar note, our hacky overriding of print is sad and should be retired.

Regarding testing, I would say, go ahead and just use Python's built-in unittest module. Converting to other frameworks is not that difficult after all. And, at least, we would have something to hook into a continuous integration service (e.g., Travis CI) so that future changes are tested automatically at every push.
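As a starting point, a unittest-based test file can be as small as the sketch below. The function under test, `format_srt_time`, is a hypothetical example (a SubRip-style timestamp formatter) and not an existing edx-dl function:

```python
import unittest


def format_srt_time(milliseconds):
    """Hypothetical function under test: format a time as HH:MM:SS,mmm."""
    hours, rest = divmod(milliseconds, 3600000)
    minutes, rest = divmod(rest, 60000)
    seconds, millis = divmod(rest, 1000)
    return "%02d:%02d:%02d,%03d" % (hours, minutes, seconds, millis)


class FormatSrtTimeTest(unittest.TestCase):
    def test_zero(self):
        self.assertEqual(format_srt_time(0), "00:00:00,000")

    def test_mixed_units(self):
        # 1 hour + 2 minutes + 3 seconds + 4 milliseconds
        self.assertEqual(format_srt_time(3723004), "01:02:03,004")
```

Saved in a file whose name starts with test_, this runs with `python -m unittest discover`, which is also a command that a continuous integration job can call at every push.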

> One last thing that you may disagree with: I really like the fact that this script doesn't have any extra dependencies (apart from youtube-dl), and I wouldn't like to add any other that is not worth it (with probably the sole exception of a test framework).

No, I'm not opposed to keeping the program as self-contained as possible. And if we ever split the program into modules, we can always resort to the trick of creating an executable zip file.
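For reference, the executable zip trick can be sketched with the standard library's zipapp module (available from Python 3.5; on older versions the same kind of archive can be built by hand with zipfile). The function name and the interpreter line below are assumptions for illustration:

```python
import zipapp


def build_executable_zip(source_dir, target):
    """Bundle a directory that contains __main__.py into one runnable file."""
    zipapp.create_archive(
        source_dir,
        target,
        # Shebang line written at the start of the archive (an assumption).
        interpreter="/usr/bin/env python3",
    )
    return target
```

The source directory needs a top-level __main__.py as its entry point; the resulting file can then be run by passing it to the Python interpreter, so users still download a single file even if the code is split into modules.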

> I say this because I saw that your fork includes some of them, but I wouldn't like to add more complexity, at least until we have a semi-decent stable version of the project with tests.

I don't quite remember. I haven't looked at the code of my fork for quite some time. It has not been affected by several of the bugs that have been plaguing this project.

> I really hope that we can collaborate this time, so just tell me what I/we can do.

As far as I am concerned, go ahead and let's get this thing in a better shape. :)

Regards,

Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA http://cynic.cc/blog/ : github.com/rbrito : profiles.google.com/rbrito DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

rbrito commented 9 years ago

On May 06 2015, Ismael Mejia wrote:

> Oh, and I forgot to ask: what about you, @shk3? Are you still active in this repository? It has been some time since I last heard from you. Are you willing to fix things or collaborate too?

Good question. I guess that we can, perhaps, move the code to an organization, so that the contributors come and go, but the project lives on.

Regards,

Rogério Brito.

rbrito commented 9 years ago

On May 06 2015, Ismael Mejia wrote:

> I have been closing some of the invalid issues so we can have a more accurate issue tracking, as part of this.

Excellent work, BTW.

iemejia commented 9 years ago

Hi, OK, it is nice to see that we are finally getting aligned. I will try to cherry-pick from your code where possible, but in some cases it may be easier if you help me with the contribution, because of the differences between your fork and the current version. Some things may be easier if I just copy and paste, but in that case it would probably be better that we discuss the idea and you port it from there to here. I will follow your other two recommendations.

I agree with Rogério. I think that a good way to give credit to people (and have less friction among contributors) is to move the project to an organization. How can we do this? Do you know?

One last thing, I assume that this discussion is the continuation of issue #71 so I am going to close that issue too. I am trying to keep the issue count low.

iemejia commented 9 years ago

One last thing, I am going to try to pick the best things I can from the pull requests, and to be more active with the active contributors.

FedericoCeratto commented 9 years ago

+1 on creating a GH organization: look for "new organization" in the pull down menu "+"

rbrito commented 9 years ago

On May 08 2015, Ismael Mejia wrote:

> Hi, OK, it is nice to see that we are finally getting aligned.

Sure.

> I will try to cherry-pick from your code where possible, but in some cases it may be easier if you help me with the contribution, because of the differences between your fork and the current version.

OK. I don't mind adapting the changes to this project.

> Some things may be easier if I just copy and paste, but in that case it would probably be better that we discuss the idea and you port it from there to here. I will follow your other two recommendations.

Thanks. Just let me know which changes you think are appropriate for this project and I will try to create a branch with pull requests for review.

> I agree with Rogério. I think that a good way to give credit to people (and have less friction among contributors) is to move the project to an organization. How can we do this? Do you know?

Yes, I have done that in the past with coursera-dl. I have an organization (coursera-dl). Perhaps we could rename that to mooc-dl or something else and have this project under that umbrella? I think that we may break some links, which is not that good.

On the other hand, in my experience, as long as the repositories are kept, all the stars, followers, issues on the issue tracker (this is important, as it is a very important knowledge base) and other things will be kept.

Then, it is just a matter of transferring the repository to some other person/organization. But this can only be done by the owner of the repository.

> One last thing, I assume that this discussion is the continuation of issue #71 so I am going to close that issue too. I am trying to keep the issue count low.

Yes, I also think that this is the "moral continuation" of that issue. Go ahead and close it. We have too many bugs already.

rbrito commented 9 years ago

On May 08 2015, Ismael Mejia wrote:

> One last thing, I am going to try to pick the best things I can from the pull requests, and to be more active with the active contributors.

Excellent. Go ahead.

rbrito commented 9 years ago

On May 08 2015, Federico Ceratto wrote:

> +1 on creating a GH organization: look for "new organization" in the pull down menu "+"

As I mentioned in my previous message, perhaps we could have "one organization to rule them all". :)

Thanks,

Rogério Brito.

iemejia commented 9 years ago

I really agree: putting the project under a mooc-dl umbrella is the best way to go. I also prefer the edx-dl name, in particular because it will help edx-dl get some of the visibility of coursera-dl (and contributors, I hope). @shk3, do you agree to give control to @rbrito so he can move it into the right organization?
