coursera-dl / coursera-dl

Script for downloading Coursera.org videos and naming them.
GNU Lesser General Public License v3.0

Incomplete List of Files for startup-001 #137

Closed elimence closed 11 years ago

elimence commented 11 years ago

This is the command I ran: python coursera-dl -n --path=/media/elimence/New\ Sector\ 7/ONLINE\ CLASSES/Coursera/Classes startup-001

This is what is available:

cour

But I only got up to the second video in lecture 2.

Here is the terminal output: coursera-dl

jonasdt commented 11 years ago

Works for me. Can you please install the dependencies by running

pip install -r requirements.txt

from the top level directory of the repo (if pip is not installed, here are some instructions). Let us know if this fixes the problem.

elimence commented 11 years ago

I already did that, but I ran this command

sudo pip install --upgrade -r requirements.txt

and had this output

Requirement already up-to-date: argparse==1.2.1 in /usr/lib/python2.7 (from -r requirements.txt (line 1))
Requirement already up-to-date: beautifulsoup4==4.1.3 in /usr/local/lib/python2.7/dist-packages (from -r requirements.txt (line 2))
Requirement already up-to-date: nose==1.3.0 in /usr/local/lib/python2.7/dist-packages (from -r requirements.txt (line 3))
Requirement already up-to-date: requests==1.2.3 in /usr/local/lib/python2.7/dist-packages (from -r requirements.txt (line 4))
Cleaning up...

However, I still have the same problem. If I delete all the already downloaded course material and run the command again, it still stops at the same point.

jonasdt commented 11 years ago

OK, we are going to do the same as in #134:

Try virtualenv to see if it is a dependency problem:

If this doesn't work, then in a new terminal window run pip list and paste the output in this thread. Next, try the following: replace lines 57-66 in coursera/coursera_dl.py, i.e.

try:
    from BeautifulSoup import BeautifulSoup
except ImportError:
    from bs4 import BeautifulSoup as BeautifulSoup_
    # Use html5lib for parsing if available
    try:
        import html5lib
        BeautifulSoup = lambda page: BeautifulSoup_(page, 'html5lib')
    except ImportError:
        BeautifulSoup = BeautifulSoup_

with

from bs4 import BeautifulSoup

Try again. If it fails, replace line 495, i.e.

soup = BeautifulSoup(page)

with

soup = BeautifulSoup(page, 'html.parser')

If anything fails, paste the output below.
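
The parser choice in the steps above can also be made explicit instead of being left to Beautiful Soup's defaults. When no parser is named, bs4 picks the highest-ranked parser that is installed, and it ranks lxml first, so an installed lxml can silently change how a page is parsed. Below is a minimal sketch, assuming Python 3; choose_parser and its is_installed parameter are hypothetical names for illustration, not part of coursera-dl or bs4.

```python
import importlib.util


def choose_parser(is_installed=None):
    """Return a parser name to pass explicitly to BeautifulSoup.

    Hypothetical helper: prefers html5lib when it is importable and
    otherwise falls back to the standard library's 'html.parser'.
    lxml is never chosen implicitly.
    """
    if is_installed is None:
        # Default check: can the module be found by the import system?
        def is_installed(name):
            return importlib.util.find_spec(name) is not None

    return "html5lib" if is_installed("html5lib") else "html.parser"


parser = choose_parser()
# With bs4 installed, the name would then be passed along explicitly:
#     soup = BeautifulSoup(page, parser)
```

Naming the parser on every call keeps the result independent of whatever else happens to be installed globally.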

elimence commented 11 years ago

Hey! thanks, it worked like magic. I'm now able to download all files for the courses.

Thanks again!

jonasdt commented 11 years ago

Great! Can you let me know which step fixed the problem? Can you also check whether you have lxml installed? It should show up in pip list (make sure the virtualenv is not activated; I want the global packages).
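
As a cross-check alongside pip list, the interpreter itself can report whether the parser libraries are importable. This is a rough sketch, assuming Python 3 (importlib.util.find_spec is not available on the Python 2.7 setup shown in this thread); parser_report is a hypothetical helper name. It should be run with the global interpreter, not the one inside the virtualenv.

```python
import importlib.util


def parser_report(names=("lxml", "html5lib")):
    """Map each top-level module name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Prints a dict with one True/False entry per checked module.
print(parser_report())
```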

rbrito commented 11 years ago

Hi, @elimence.

On Wed, Jun 26, 2013 at 2:36 PM, elimence notifications@github.com wrote:

Hey! thanks, it worked like magic. I'm now able to download all files for the courses.

You seem to be using a Debian-derived distribution, right? (Ubuntu?)

Repeating what I said at

https://github.com/jplehmann/coursera/issues/134#issuecomment-20076923

Does having html5lib installed work unconditionally? If yes, then I will put that right there in the requirements.txt file and be done with this.

Also, as @jonasdt asked, can you summarize which specific step worked for you? Was it the pip installation procedure, or was it editing the script? This is important to know, so that we can fix the script for good.

Thanks for your reports,

Rogério Brito

rbrito commented 11 years ago

@elimence,

Please, can you send us (say, via http://pastebin.com/) the HTML file of startup-001 so that we can include it in our tests and guarantee that future changes to the script won't break this course?

jonasdt commented 11 years ago

@rbrito @elimence I will add a test and the HTML file tonight :)

elimence commented 11 years ago

@jonasdt @rbrito First off, I followed the exact steps specified by jonasdt, i.e. creating the virtual env and installing the requirements with pip. I didn't edit the scripts since the pip thing worked.

@rbrito you can see the exact steps above in the 4th post (by jonasdt), and yes, I'm using Ubuntu 13.04.

@jonasdt Here's the output of pip list (executed outside of the virtual environment)

adium-theme-ubuntu (0.3.3)
apt-xapian-index (0.45)
argparse (1.2.1)
beautifulsoup4 (4.1.3)
CDApplet (1.0)
CDBashApplet (1.0)
chardet (2.0.1)
compizconfig-python (0.9.9.0)
configglue (1.0)
configobj (4.7.2)
coursera-dl (1.4.8)
coverage (3.6)
debtagshw (0.1)
decorator (3.3.3)
defer (1.0.6)
dirspec (4.2.0)
distribute (0.6.34)
Django (1.4.5)
django-nose (1.1)
docutils (0.10)
duplicity (0.6.21)
gunicorn (0.17.4)
hiredis (0.1.1)
httplib2 (0.7.7)
ipython (0.13.2)
Jinja2 (2.7)
lockfile (0.8)
logilab-astng (0.24.3)
logilab-common (0.59.1)
lxml (3.1.0)
Mako (0.7.3)
MarkupSafe (0.15)
mechanize (0.2.5)
mock (1.0.1)
mysql-connector-python (0.3.2-devel)
MySQL-python (1.2.3)
mysql-utilities (1.0.3)
nose (1.3.0)
oauthlib (0.3.7)
oneconf (0.3.3)
PAM (0.4.2)
paramiko (1.7.7.1)
Paste (1.7.5.1)
pep8 (1.4.5)
pexpect (2.4)
Pillow (2.0.0)
piston-mini-client (0.7.5)
protobuf (2.4.1)
pyalsaaudio (0.5)
pycrypto (2.6)
pycups (1.9.62)
pycurl (7.19.0)
Pygments (1.6)
pygobject (3.8.0)
pygpgme (0.3)
pyinotify (0.9.3)
pylint (0.28.0)
pyOpenSSL (0.13)
pyserial (2.6)
pysmbc (1.0.13)
pysqlite (2.6.3)
python-apt (0.8.8ubuntu6)
python-daemon (1.5.5)
python-debian (0.1.21-nmu2ubuntu1)
python-gcm (0.1.4)
python-termstyle (0.1.10)
pyxdg (0.25)
redis (2.7.5)
rednose (0.3.3)
reportlab (2.6)
requests (1.2.3)
rhythmbox-ubuntuone (4.2.0)
selenium (2.33.0)
sessioninstaller (0.0.0)
simplegeneric (0.8.1)
simplejson (3.3.0)
six (1.2.0)
software-center-aptd-plugins (0.0.0)
Sphinx (1.2b1)
system-service (0.1.6)
thumbs-xblock (0.1)
Twisted-Core (12.3.0)
Twisted-Names (12.3.0)
Twisted-Web (12.3.0)
ubuntu-tweak (0.8.5)
ubuntuone-storage-protocol (4.2.0)
unity-lens-photos (0.9)
vboxapi (1.0)
virtualenv (1.9.1)
WebOb (1.2.3)
WSGIProxy (0.2.2)
wsgiref (0.1.2)
XBlock (0.1, /home/elimence/SANDBOX/EDX/XBlock)
XBlockWorkbench (0.1)
xdiagnose (3.5.1)
zenmap (6.00)
zope.interface (4.0.5)

Looks like I have it: lxml (3.1.0)

elimence commented 11 years ago

Here's the source on pastebin in case you still don't have it

http://pastebin.com/tkxwemdz

elimence commented 11 years ago

Let me know if I have left out anything.

jonasdt commented 11 years ago

Thanks. So again it seems that lxml is breaking the script. If you pip install html5lib outside the virtual env, the script should also work. However, I recommend that you use the virtual env. Here are some steps to make it less cumbersome:

You can execute the script with the virtualenv bin as follows:

/path/to/coursera_env/bin/python /path/to/coursera_env/coursera/coursera-dl

You can also make an alias: add

alias coursera="/path/to/coursera_env/bin/python /path/to/coursera_env/coursera/coursera-dl"

to ~/.bashrc (don't forget to source ~/.bashrc). Now you can simply run coursera <class_name> -u ....

elimence commented 11 years ago

Thanks for the alias, it makes things easier. I now see the value of virtual envs. I have previously encountered conflicts with other setups and had no idea this was the cause. I guess I'll stick with using the coursera-env. Thanks again.

rbrito commented 11 years ago

@elimence, thanks for the feedback and glad that it works fine for you now.

I guess that I learned a lot about incompatibilities between python module versions with issues #134, #137, #143 and discussions with you guys.

I'm closing this issue now that the problem seems settled.