Could you provide more information about the issue? I cannot reproduce it.
Are the errors shown before you choose the weeks?
You can access 3 courses on edX
1 - CS188.1x Artificial Intelligence -> Started
2 - CS191x Quantum Mechanics and Quantum Computation -> Started
3 - 6.00x Introduction to Computer Science and Programming -> Started
Enter Course Number: 3
6.00x Introduction to Computer Science and Programming has 12 weeks so far
1 - Download Overview videos
2 - Download Week 1 videos
3 - Download Week 2 videos
4 - Download Week 3 videos
5 - Download Week 4 videos
6 - Download Week 5 videos
7 - Download Midterm Exam 1 videos
8 - Download Week 6 videos
9 - Download Week 7 videos
10 - Download Week 8 videos
11 - Download Week 9 videos
12 - Download Peer Grading Panel videos
13 - Download them all
Sure. Please feel free to contact me if there's anything I can do to help.
This error appears just after I select the course.
Here is some additional information that might be useful:
bs4.__version__
'4.1.3'
youtube_dl.__version__
'2013.04.28'
Are you using Python 2 or Python 3? And what language is your operating system set to?
I first used python 2, now I've tried python 3, too. Similar error occurs.
It seems related to beautifulsoup4, and I'm trying to figure out why.
Thanks. Please let me know, if you figure it out.
Hi, @feilong.
Depending on how you installed BeautifulSoup 4, it can use a number of parsers:
http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser
In the original case, @feilong's system is using lxml, which is probably the fastest, but the fact that lxml encountered an invalid byte in the homepage is, indeed, a problem.
Perhaps the problem happens when lxml is trying to parse your name? I suspect that, given the original poster's name sounds Chinese, you may have Chinese characters.
In this case, you can try to use another parser. For instance, whenever we have a call to BeautifulSoup(foo) in our code, try to enforce a different parser by passing a second argument, as described in the document listed above.
Please report back the results.
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA http://rb.doesntexist.org/blog : Projects : https://github.com/rbrito/ DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br
Ouch. The markup of the above (sent by e-mail) is atrocious. I'm rewriting the message below:
Depending on how you installed BeautifulSoup 4, it can use a number of parsers:
http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser
In the original case, the poster is using lxml, which is probably the fastest, but the fact that lxml encountered an invalid byte in the homepage is, indeed, a problem.
Perhaps the problem happens when lxml is trying to parse your name? I suspect that, given the original poster's name sounds Chinese, you may have Chinese characters.
In this case, you can try to use another parser. For instance, whenever we have a call to BeautifulSoup(foo) in our code, try to enforce a different parser by passing a second argument, as described in the document listed above.
Please report back the results.
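A minimal sketch of the suggestion above, for anyone who wants to try it. The helper names pick_parser and make_soup are hypothetical and do not come from edx-downloader; the only Beautiful Soup feature relied on is the second positional argument, which names the parser ("html5lib", "lxml", or the built-in "html.parser"). The default preference order here is an assumption chosen to avoid lxml first, not a recommendation from the project.

```python
import importlib.util


def pick_parser(preferred=("html5lib", "lxml")):
    """Return the first parser in `preferred` whose module is installed.

    Falls back to "html.parser", which ships with Python itself.
    """
    for name in preferred:
        if importlib.util.find_spec(name) is not None:
            return name
    return "html.parser"


def make_soup(markup, preferred=("html5lib", "lxml")):
    """Build a BeautifulSoup object with an explicitly chosen parser."""
    # Imported lazily so that pick_parser() can be used without bs4 installed.
    from bs4 import BeautifulSoup

    # The second argument forces the parser instead of letting bs4 guess.
    return BeautifulSoup(markup, pick_parser(preferred))
```

With bs4 installed, make_soup(page_html) would then stand in for a bare BeautifulSoup(page_html) call, where page_html is whatever markup the script has already fetched.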
@rbrito, I think @feilong must be using a Chinese-language system, as his last reply contains Chinese characters.
I am confused: Chinese characters can be encoded in UTF-8, so why does lxml say that 'utf8' cannot decode the byte? Do you mean lxml is trying to parse @feilong's username on edX?
@rbrito, @shk3, thank you both for your help!
I've tried using html5lib instead of lxml and it is working pretty well. So I believe something went wrong while parsing the HTML with lxml. I'm not sure whether it's related to Chinese characters; my edX username should only contain English characters.
Earlier today I tried saving the contents of the courseware variable to a file so I can test repeatedly without connecting to edX. Here is the short code I used to test.
#!/usr/bin/env python
from bs4 import BeautifulSoup

# Load the saved courseware page so the test can run offline.
with open('courseware.txt', 'r') as f:
    cw = f.read()

BeautifulSoup(cw, fromEncoding='UTF-8')
Interestingly, when I test it, it throws UnicodeDecodeError with different messages, such as
UnicodeDecodeError: 'utf8' codec can't decode byte 0xbc in position 0: invalid start byte
or
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb7 in position 1: invalid start byte
And in some runs it finishes without an error. I'm really puzzled: since I'm using the same file and the same code, why would the results differ from run to run?
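For context on the two messages above, here is a small standard-library sketch of what "invalid start byte" means. It does not explain why the results vary between runs. It only shows that 0xbc and 0xb7 have the bit pattern 10xxxxxx, which UTF-8 reserves for continuation bytes, so neither can begin a character. The function names are made up for illustration.

```python
def is_continuation_byte(value):
    """True if `value` has the form 10xxxxxx, i.e. a UTF-8 continuation byte."""
    return (value & 0b11000000) == 0b10000000


def describe_utf8_failure(data):
    """Return None if `data` is valid UTF-8, else (offset, byte, reason)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that could not be decoded.
        return exc.start, data[exc.start], exc.reason
    return None


# Both bytes from the reported errors are continuation bytes.
print(is_continuation_byte(0xBC), is_continuation_byte(0xB7))

# A continuation byte at the very start of the data cannot be decoded.
print(describe_utf8_failure(b"\xbcabc"))

# Properly encoded Chinese text decodes without any problem.
print(describe_utf8_failure("中文".encode("utf-8")))
```

Running describe_utf8_failure on the raw bytes of the saved file (opened with mode 'rb') would show whether the file itself contains such a byte, or whether the bad bytes only appear later inside the parser.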
I got an encoding error before downloading starts. The course link is https://www.edx.org/courses/MITx/6.00x/2013_Spring/ and the error message is as follows: