coursera-dl / edx-dl

A simple tool to download video lectures from edx.org (and other openedx sites)
GNU Lesser General Public License v3.0

You can access 0 courses #192

Closed Gianfranco-Campana closed 9 years ago

Gianfranco-Campana commented 9 years ago

Login works fine, but no courses are found, while I have two courses accessible.

The BeautifulSoup output does not contain any 'article' rows.

iemejia commented 9 years ago

Are you using the latest version? Which platform? edX?

Gianfranco-Campana commented 9 years ago

Thanks for your answer. Yes, it is the edX platform.

I'm using the latest versions of beautifulsoup4, edx-downloader (from https://github.com/shk3/edx-downloader ), and youtube-dl.

I redirected the soup object to a file, and no rows with 'article' are present.

iemejia commented 9 years ago

Can you please tell me the names of the courses you have registered for, or post a screenshot of your dashboard page?

Gianfranco-Campana commented 9 years ago

I urgently need to download the material for MITx 15.071x, The Analytics Edge.

Thank you!

[image: Inline image 1]

iemejia commented 9 years ago

The course is not open yet; it starts on June 2nd. https://www.edx.org/course/analytics-edge-mitx-15-071x-0 Your image did not come through.

Gianfranco-Campana commented 9 years ago

The previous run is not yet complete. I am taking the final exam right now, and I can enter the course, as you can see below:

iemejia commented 9 years ago

Does it have a different access URL? The one I sent you is not open yet.

Gianfranco-Campana commented 9 years ago

This is the URL, copied from the "View Course" button:

https://courses.edx.org/courses/MITx/15.071x_2/1T2015/info

iemejia commented 9 years ago

I have seen this error on Windows. Are you by any chance using Windows? If not, can you confirm which version of BeautifulSoup you are using?

iemejia commented 9 years ago

@Gianfranco-Campana you found a really interesting bug. It seems that BeautifulSoup can use any of three different parsers: the native Python one, lxml, or html5lib. http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser

If you don't have lxml or html5lib installed, the script breaks because BeautifulSoup does not recognize the 'article' tag, which is a new HTML5 tag. To solve your problem you have to install one of the two, html5lib or lxml: just type 'pip install html5lib', then run the script again and it will be fixed.
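
As a rough illustration of the parser issue described above, the sketch below checks which parser modules are importable and names the parser explicitly instead of letting BeautifulSoup pick one on its own. The names `PARSER_MODULES`, `pick_parser` and `count_articles` are hypothetical and do not come from edx-downloader; the preference order (html5lib, then lxml, then the built-in parser) is an assumption based on this thread, not the library's default ranking.

```python
import importlib.util

# BeautifulSoup parser names paired with the module each one requires.
# "html.parser" ships with Python, so it serves as the last-resort fallback.
PARSER_MODULES = [("html5lib", "html5lib"), ("lxml", "lxml")]


def pick_parser(is_installed=None):
    """Return the first available parser name, preferring html5lib, then lxml."""
    if is_installed is None:
        # Default check: a module counts as installed if Python can locate it.
        def is_installed(module):
            return importlib.util.find_spec(module) is not None
    for parser_name, module in PARSER_MODULES:
        if is_installed(module):
            return parser_name
    return "html.parser"


def count_articles(html):
    """Parse html with an explicitly chosen parser and count <article> tags."""
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4
    soup = BeautifulSoup(html, pick_parser())
    return len(soup.find_all("article"))
```

Printing `pick_parser()` before parsing the dashboard page would show at a glance whether a run is falling back to the built-in parser, which is the situation reported in this issue.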

iemejia commented 9 years ago

@rbrito How can we solve this in the dependencies? I tried both lxml and html5lib and both work. html5lib seems more robust, but it introduces one extra unwanted requirement (six). What do you think? I am starting to think that maybe we should adopt html5lib, rewrite the 'compat' layer on top of six, and add both dependencies now.

rbrito commented 9 years ago

Hi there.

Let me chime in here. I had this problem with coursera-dl about 2 years ago. You can see it here:

https://github.com/coursera-dl/coursera/issues/143

In the end, I decided that it was simplest to just use html5lib and be done with it. See:

https://github.com/coursera-dl/coursera/blob/master/coursera/coursera_dl.py#L63-L69

BTW, now that I'm talking about this, I will remove the legacy code that we have in coursera-dl to import beautifulsoup 3 (and beautifulsoup 4 + lxml).

Regards,

rbrito commented 9 years ago

On May 21 2015, Ismael Mejia wrote:

> @rbrito How can we solve this in the dependencies? I tried both lxml and html5lib and both work, however html5lib seems more robust, but it introduces one extra unwanted requirement (six). What do you think?

This is easy: just add the lines for the dependencies to the requirements.txt file and to the tox.ini file. Then force bs4 to use html5lib by passing it explicitly.
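
A minimal sketch of what forcing html5lib could look like, assuming the `beautifulsoup4` and `html5lib` packages are the ones listed in requirements.txt. The helper name `make_soup` and the error messages are hypothetical; this is not the code used in coursera-dl or edx-downloader.

```python
def make_soup(html):
    """Build a BeautifulSoup tree, always requesting the html5lib parser."""
    try:
        from bs4 import BeautifulSoup, FeatureNotFound
    except ImportError as exc:
        raise RuntimeError(
            "beautifulsoup4 is missing; run: pip install beautifulsoup4"
        ) from exc
    try:
        # Naming the parser avoids silently using whichever one is installed.
        return BeautifulSoup(html, "html5lib")
    except FeatureNotFound as exc:
        # bs4 raises FeatureNotFound when the requested parser is unavailable.
        raise RuntimeError("html5lib is missing; run: pip install html5lib") from exc
```

Routing every parse through one helper like this means a missing dependency fails loudly with an install hint, rather than producing an empty course list.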

> I am starting to think that maybe we must take html5lib and rewrite the 'compat' layer into six, and add both dependencies now.

I think that it is better to simply use something that is well tested. Anything that can be done with six can be done manually (of course, since they do it), but reinventing the wheel may lead to something fragile.

BTW, the code in coursera-dl contains some uses of six, but some things still had to be done manually (see the imports for the modules), but six has evolved since I started to use it.

I think that it (hopefully) has all the abstractions that we need there, and replacing those home-grown solutions with things that six may provide now (I hope) will make the code clearer.

Hope this helps,

Gianfranco-Campana commented 9 years ago

Ismael, I installed html5lib and everything seems to work fine: I see the course list, I select the course, and then I select a unit (or all units), and all videos seem to be processed correctly.

But after the "[info] Output directory: Downloaded", I cannot find the "Downloaded" folder.

Where is this folder created?

[image: Inline image 1]

Gianfranco Campana

Gianfranco-Campana commented 9 years ago

The version from Varun Batra works perfectly (Windows 8.1 Pro 64-bit, Python 2.7).

( https://github.com/VarunBatraIT/organized-edx-download/blob/master/oedx-dl.py )

Thank you very much!

Gianfranco Campana

iemejia commented 9 years ago

OK, his version is actually an outdated fork of ours (even if he does not say so). I submitted a pull request with support for separate folders per section, in case you want to check it. It should be integrated soon, since that is the only difference I see from his project.

Gianfranco-Campana commented 9 years ago

Well, one last thing: no subtitles are downloaded with this version (a major problem for me).

Gianfranco Campana

rbrito commented 9 years ago

I also have this "Downloaded" directory problem that @Gianfranco-Campana has. More details in a subsequent bug.

rbrito commented 9 years ago

Ouch! That's FREAKING maddening! That person (Varun Batra) is stealing other people's work without giving credit (if it were a declared fork, that would be OK, as it would be traceable to other people's (read: "our") work) and attaching a license when we have not yet decided on one ourselves!

I would have freakin' uploaded this thing to PyPI already if we had @shk3's response on whether he agrees with our move to LGPLv3+ (see issue #173). But since I have not heard back from him, I have not yet asserted that our license is indeed the LGPLv3+.

OTOH, this person not only grabbed our code, but published it claiming that it is his code (https://github.com/VarunBatraIT/organized-edx-download/commit/eab289f83908c72da2ae4fe803e8894c4f8dee26) and also put a license of his choice on it (https://github.com/VarunBatraIT/organized-edx-download/blob/master/LICENSE).

BTW, I like the MPL, but I really prefer a stronger copyleft license (in fact, if it were not for considering other people's abilities to use edx_dl as a module/library, I would have chosen the GPLv3+).

Frankly, the way that we currently are here, we are not yet a Free Software project (we need a Free Software License to really qualify as a Free Software project). But that person's work is based on our work that didn't even have a specified license and, thus, is not allowed to relicense our software.

I have no problem writing software that is not Free Software (see, e.g., my contributions to youtube-dl, which is in the public domain and can be used even in proprietary software), but I have to agree to that first.

As you can see, I really, really care about this licensing thing (and you may have noticed that I write Free Software with capitals instead of open source software). The fact that I agree with the DFSG is only a way to show what my motivations are.

I think that we should decide on our license and then:

  • ask that person to state publicly that the code was taken from our repository with his name slapped onto the program without too much thought (he even lists that in his CV: http://resume.varunbatra.com/ and it is quite likely that recruiters would believe him when he says that he wrote that thing).
  • ask that he contribute back all the changes to this project (which would have been the decent thing to do), with the proviso that his changes are under the license that he chooses and that may not be compatible with ours.
  • ask people from GitHub to remove that project (which will matter very little, since he can simply host "his project" anywhere else).

I am frankly too annoyed to continue writing right now.

I would really, really love it if @shk3 could tell us which license he agrees to use with his share of code.

Regards,

Rogério Brito.

Gianfranco-Campana commented 9 years ago

@Rogério I give full credit and honor to your work, and I understand your frustration.

By the way, keep working hard on this excellent job.

Good luck ;)

Gianfranco

Gianfranco-Campana commented 9 years ago

After some tests I realized I had made a mistake: the script from shk3 works fine, even with subtitles.

Therefore, please consider this report closed.

Thank you for the excellent work!

Gianfranco Campana, ICT Office - Gruppomegamark https://www.linkedin.com/profile/view?id=272029401&trk=nav_responsive_tab_profile_pic
