coursera-dl / edx-dl

A simple tool to download video lectures from edx.org (and other openedx sites)
GNU Lesser General Public License v3.0

edx-dl bombs when downloading MIT's "The Analytics Edge" #220

Closed rbrito closed 9 years ago

rbrito commented 9 years ago

The course URL is: https://courses.edx.org/courses/MITx/15.071x/1T2014/info

I took a brief look (sorry, it's way past bedtime here; it's after 1 AM), and it seems that the HTML of the dashboard is all messed up (I don't know why). Even so, we should not bomb with an unfriendly stack trace.

This is what I get, BTW:

Traceback (most recent call last):
  File "edx-dl.py", line 6, in <module>
    edx_dl.main()
  File "/home/rbrito/Desktop/cursos/edx-downloader/edx_dl/edx_dl.py", line 630, in main
    for selected_course in selected_courses}
  File "/home/rbrito/Desktop/cursos/edx-downloader/edx_dl/edx_dl.py", line 630, in <dictcomp>
    for selected_course in selected_courses}
  File "/home/rbrito/Desktop/cursos/edx-downloader/edx_dl/edx_dl.py", line 198, in get_available_sections
    for i, section_soup in enumerate(sections_soup, 1)]
  File "/home/rbrito/Desktop/cursos/edx-downloader/edx_dl/edx_dl.py", line 176, in _make_url
    return BASE_URL + section_soup.ul.find('a')['href']
TypeError: 'NoneType' object has no attribute '__getitem__'

Thanks,

Rogério Brito.
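The report above asks that edx-dl not bomb with an unfriendly stack trace. Below is a minimal sketch of one way to do that. It assumes the entry point simply calls `edx_dl.main()`, as the traceback shows. `run_with_friendly_errors` and its `debug` parameter are hypothetical names, not existing edx-dl functions or options.

```python
import sys
import traceback
from typing import Callable, Optional


def run_with_friendly_errors(main: Callable[[], Optional[int]],
                             debug: bool = False) -> Optional[int]:
    """Call main(); report unexpected exceptions briefly instead of crashing.

    Returns main()'s result on success, 130 on Ctrl-C, and 1 on any other
    exception. The full traceback is printed only when debug is True.
    """
    try:
        return main()
    except KeyboardInterrupt:
        print("Interrupted by user.", file=sys.stderr)
        return 130
    except Exception as exc:  # SystemExit is a BaseException, so it still passes through
        if debug:
            traceback.print_exc()
        print(f"error: {type(exc).__name__}: {exc}", file=sys.stderr)
        print("The course page may have an unexpected layout; "
              "please report it together with the full traceback.",
              file=sys.stderr)
        return 1
```

With this helper, `edx-dl.py` would call `sys.exit(run_with_friendly_errors(edx_dl.main))` instead of calling `edx_dl.main()` directly. This only softens the crash. The parsing problem itself still needs fixing.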

iemejia commented 9 years ago

In my dashboard this course has not started (so it is not possible to download it). Does it appear in your list when you run --course-list?

iemejia commented 9 years ago

I agree, however, that we could handle the exception better when the URL is not a valid course.
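Below is a minimal sketch of a more defensive version of the failing `_make_url` step. In the original code, `section_soup.ul.find('a')` returns `None` when the list holds no link, and subscripting `None` raises the `TypeError` shown in both tracebacks. The sketch checks each lookup and skips the section with a warning. It assumes the section objects behave like BeautifulSoup `Tag`s (`.ul`, `.find()`, `.get()`). `make_section_url` and `extract_section_urls` are hypothetical names, not edx-dl's actual API.

```python
import sys
from typing import Any, Iterable, List, Optional, Tuple


def make_section_url(base_url: str, section_soup: Any) -> Optional[str]:
    """Return base_url + href of the first <a> inside the section's <ul>, or None.

    Assumes a BeautifulSoup-like Tag: `.ul` is the first <ul> descendant
    (or None), `.find("a")` is the first <a> (or None), and `.get("href")`
    returns the attribute value (or None).
    """
    ul = getattr(section_soup, "ul", None)
    if ul is None:
        return None  # no subsection list under this section
    link = ul.find("a")
    if link is None:
        return None  # the case that raised TypeError in this issue
    href = link.get("href")
    if not href:
        return None  # link present but without a usable href
    return base_url + href


def extract_section_urls(base_url: str,
                         sections_soup: Iterable[Any]) -> List[Tuple[int, str]]:
    """Collect (position, url) pairs, skipping sections with unexpected markup."""
    result = []
    for i, section_soup in enumerate(sections_soup, 1):
        url = make_section_url(base_url, section_soup)
        if url is None:
            print(f"Warning: skipping section {i}: no link found in its markup",
                  file=sys.stderr)
            continue
        result.append((i, url))
    return result
```

Skipping, rather than aborting, keeps the remaining sections downloadable. Whether edx-dl should skip such sections or stop with a clear error message is a design choice that this thread leaves open.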

rbrito commented 9 years ago

On May 31 2015, Ismael Mejia wrote:

In my dashboard this course has not started (so it is not possible to download it). Does it appear in your list when you run --course-list?

Sure, it is an old course that has already ended (I would not file a bug otherwise). ;) I will post a screenshot after I send this e-mail.

Thanks,

Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA http://cynic.cc/blog/ : github.com/rbrito : profiles.google.com/rbrito DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

rbrito commented 9 years ago

Here's the screenshot:

[Screenshot: 15.071x courseware page on edX, viewed in Iceweasel]

iemejia commented 9 years ago

Yes, the thing is that when I try to access it, it doesn't let me in, since I was not registered in the previous version of the course, so I can't check for any error. However, I wouldn't be surprised if that crazy

in the subsection title is breaking the parser.

rbrito commented 9 years ago

Hi there.

On Jun 02 2015, Ismael Mejia wrote:

Yes, the thing is that when I try to access it, it doesn't let me in, since I was not registered in the previous version of the course, so I can't check for any error.

Hummm, it seems that edX has started closing courses like this. I wanted to enroll in one course myself and I also saw what you described. :(

Regarding the course being re-offered, I didn't know that. Thanks for the hint. :) I will enroll in this next session.

However, I wouldn't be surprised if that crazy

in the subsection title is breaking the parser.

That's what I suspect too. I may or may not get to deal with that, depending on whether I find the time. Let's consider this a lower-priority bug (apart from the fact that we should not bomb).

Thanks for the hint about the course being offered again,


bish4u commented 9 years ago

Getting this error (the course shows up in my --list):

Traceback (most recent call last):
  File "edx-dl.py", line 6, in <module>
    edx_dl.main()
  File "D:\software\edx-downloader-master\edx_dl\edx_dl.py", line 710, in main
    for selected_course in selected_courses}
  File "D:\software\edx-downloader-master\edx_dl\edx_dl.py", line 710, in <dictcomp>
    for selected_course in selected_courses}
  File "D:\software\edx-downloader-master\edx_dl\edx_dl.py", line 152, in get_available_sections
    sections = extract_sections_from_html(page, BASE_URL)
  File "D:\software\edx-downloader-master\edx_dl\parsing.py", line 263, in extract_sections_from_html
    for i, section_soup in enumerate(sections_soup, 1)]
  File "D:\software\edx-downloader-master\edx_dl\parsing.py", line 263, in <listcomp>
    for i, section_soup in enumerate(sections_soup, 1)]
  File "D:\software\edx-downloader-master\edx_dl\parsing.py", line 241, in _make_url
    return BASE_URL + section_soup.ul.find('a')['href']
TypeError: 'NoneType' object is not subscriptable

rbrito commented 9 years ago

I will try to check this later, but I think that @balta2ar's fix (https://github.com/coursera-dl/edx-dl/commit/9b1a504df622aea5113ea1da255ae562b1ee1fc3) may have fixed the issue that I saw.

@bish4u, can you confirm if this issue is fixed for you, please?

rbrito commented 9 years ago

Since we have not heard back from others, I am closing this bug.