coursera-dl / edx-dl

A simple tool to download video lectures from edx.org (and other openedx sites)
GNU Lesser General Public License v3.0

TypeError: 'NoneType' object is not subscriptable #595

Open MissGorgeousTech opened 4 years ago

MissGorgeousTech commented 4 years ago

Subject of the issue

When trying to download the videos for one specific course (listed below), I get the error TypeError: 'NoneType' object is not subscriptable. Other courses I tried download fine without errors.

Traceback (most recent call last):
  File "/usr/local/bin/edx-dl", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/edx_dl/edx_dl.py", line 1023, in main
    for selected_course in selected_courses}
  File "/usr/local/lib/python3.6/dist-packages/edx_dl/edx_dl.py", line 1023, in <dictcomp>
    for selected_course in selected_courses}
  File "/usr/local/lib/python3.6/dist-packages/edx_dl/edx_dl.py", line 186, in get_available_sections
    sections = page_extractor.extract_sections_from_html(page, BASE_URL)
  File "/usr/local/lib/python3.6/dist-packages/edx_dl/parsing.py", line 403, in extract_sections_from_html
    for i, section_soup in enumerate(sections_soup, 1)]
  File "/usr/local/lib/python3.6/dist-packages/edx_dl/parsing.py", line 403, in <listcomp>
    for i, section_soup in enumerate(sections_soup, 1)]
  File "/usr/local/lib/python3.6/dist-packages/edx_dl/parsing.py", line 372, in _make_url
    return section_soup.a['href']
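For readers wondering why the last frame fails: a minimal sketch of the failure mode, assuming the section element for this course contains no `<a>` child. With BeautifulSoup, looking up a missing child tag as an attribute gives `None`, and subscripting `None` raises the error in the title. The `SimpleNamespace` object below is only a stand-in for the parsed section, not the real parser output, and `make_url` is an illustrative copy of the expression on line 372.

```python
from types import SimpleNamespace

# Stand-in for a parsed section that has no <a> child. With BeautifulSoup,
# accessing a missing child tag as an attribute yields None.
section_without_link = SimpleNamespace(a=None)


def make_url(section_soup):
    # Same expression as line 372 of parsing.py in the traceback.
    return section_soup.a['href']


try:
    make_url(section_without_link)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

This suggests the affected course pages use a different outline markup than the parser expects, although the thread does not confirm the exact HTML.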

environment

Steps to reproduce

https://courses.edx.org/courses/course-v1:IBM+PY0101EN+1T2020/course/

manutiedra commented 4 years ago

This also happens with https://courses.edx.org/courses/course-v1:GTx+HI2018xII+1T2019/course/

numlockkey commented 4 years ago

and with this: https://courses.edx.org/courses/course-v1:MITx+6.004.1x_3+3T2016/course/

tigerjoy commented 4 years ago

and with this: https://courses.edx.org/courses/course-v1:StanfordOnline+CSX0005+1T2020/course/

aprilchew commented 4 years ago

I fixed this by changing line 372 of parsing.py from `return section_soup.a['href']` to `return section_soup.ol`.

numlockkey commented 4 years ago

aprilchew: Tried it, doesn't work.

aprilchew commented 4 years ago

> aprilchew: Tried it, doesn't work.

Try section_soup.ol, remove the ['href']
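As an alternative to swapping the returned element, here is a hedged sketch of a guarded lookup that returns `None` instead of raising when a section has no link. This is not the change proposed in this thread and not the project's code: `make_url_safe` is a hypothetical name, and the `SimpleNamespace` objects with a plain dict stand in for BeautifulSoup tags (which also offer `.get` for attributes).

```python
from types import SimpleNamespace


def make_url_safe(section_soup):
    """Hypothetical guarded variant of _make_url: None when no link exists."""
    anchor = section_soup.a
    if anchor is None:
        # No <a> child at all: the case that triggers the TypeError.
        return None
    # .get avoids a KeyError when the anchor has no href attribute.
    return anchor.get('href')


# Stand-ins for parsed sections (assumptions, not real parser output).
with_link = SimpleNamespace(a={'href': '/courses/example/section-1'})
without_link = SimpleNamespace(a=None)

print(make_url_safe(with_link))
print(make_url_safe(without_link))
```

Callers would then need to skip sections whose URL is `None`; whether that matches what the downloader expects is an assumption to check against the real code.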

tigerjoy commented 4 years ago

@aprilchew Thank you very much. It does indeed fix the issue.

For those who are still having trouble, here are the steps that you can follow.

  1. Clone or download as .zip https://github.com/coursera-dl/edx-dl
  2. Extract the .zip using "Extract Here" option.
  3. Navigate to the following folder edx-dl-master/edx_dl
  4. Open parsing.py with your favorite text editor that displays line numbers.
  5. Scroll down to line 372, and change return section_soup.a['href'] to return section_soup.ol

Here is the before and after image for reference. The commented line 372 shows the before, and line 373 is the change.

  6. Go up a directory, inside edx-dl-master
  7. To download courses now, you must use the following: python edx-dl.py -u user@user.com COURSE_URL

NOTE: If you installed edx-dl using pip, the steps above won't work as written. Instead, navigate to the site-packages or dist-packages folder, find the edx_dl folder, open parsing.py and make the same change as above.
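For pip installs, one way to find the file to edit is to ask Python where the package lives. A small sketch, assuming the installed package is importable as `edx_dl` (the name that appears in the traceback paths); `locate_package_file` is a hypothetical helper, not part of edx-dl.

```python
import importlib.util
from pathlib import Path


def locate_package_file(package_name, file_name):
    """Return the path of file_name inside an installed package, or None."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        # Package is not installed in this interpreter's environment.
        return None
    candidate = Path(spec.origin).parent / file_name
    return candidate if candidate.exists() else None


# Prints the full path to parsing.py if edx_dl is installed, otherwise None.
print(locate_package_file("edx_dl", "parsing.py"))
```

Run it with the same Python interpreter that pip installed edx-dl into, otherwise it will report `None` or point at a different copy.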

EDIT: I've downloaded a few other courses as well, and this change has not yet broken any other downloads so far.

floviolleau commented 4 years ago

Hi,

A PR would be appreciated :)

Kind regards

dr-jeffrey commented 4 years ago

tigerjoy's solution worked for me. However, be careful not to add another line; I just replaced the original code.

ichit commented 4 years ago

Hello smart guys. Is there no one on GitHub who can fix the problems with downloading tutorials successfully from the edX website? I have been trying since 2019 to use this script to download my tutorials from edX, and it stops right after displaying my course contents. It is really a pain for me because I desperately need offline copies of courses that have since expired, and I am still learning to code and not experienced enough to help solve the download problems. Thanks

Ankk98 commented 4 years ago

I can work on it and will send a PR soon. @ichit this is not the way you should be asking people to contribute. Be kind and respectful.

rbrito commented 4 years ago

@Ankk98, a pull request that closes this would be welcome. Again: the simpler (and cleaner) the code, the better (since it will ease maintenance in the future when things break again--and they will).

ichit commented 4 years ago

@Ankk98 I did not mean to disrespect anyone or speak rudely. I fully understand that no one gets paid for their work on this platform. I mistakenly showed my frustration at being unable to download a course I desperately need for my thesis. I apologize to anyone who felt offended. I thank everyone who helps make life easier for others.

floviolleau commented 4 years ago

Hi,

I ran into this issue on a course today, so I decided to open a PR...

Here is the PR on the table.

Does anyone know who the owner(s) of this project are? I see lots of PRs pending merge to master.

If anyone can help here, that would be nice :) Maybe @rbrito?

Thanks

Bucky0789 commented 3 years ago

@aprilchew Thank you very much. It does indeed fix the issue.

For those who are still having trouble, here are the steps that you can follow.

1. Clone or download as .zip **https://github.com/coursera-dl/edx-dl**

2. Extract the .zip using the **"Extract Here"** option.

3. Navigate to the following folder **edx-dl-master/edx_dl**

4. Open **parsing.py** with your favorite text editor that displays line numbers.

5. Scroll down to line 372, and change **return section_soup.a['href']** to **return section_soup.ol**

Here is the before and after image for reference. The commented line 372 shows the before, and line 373 is the change.

6. Go up a directory, inside **edx-dl-master**

7. To download courses now, you must use the following:
   `python edx-dl.py -u user@user.com COURSE_URL`

NOTE: If you installed edx-dl using pip, the steps above won't work as written. Instead, navigate to the site-packages or dist-packages folder, find the edx_dl folder, open parsing.py and make the same change as above.

EDIT: I've downloaded a few other courses as well, and this change has not yet broken any other downloads so far.

Hi, I followed the whole procedure as you described, but only an empty folder was created. What should I do next? Please advise.
