google-code-export / get-flash-videos

Automatically exported from code.google.com/p/get-flash-videos

Can't parse HTML for Comedy Central video (EntityRef error) #239

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. get_flash_videos http://comedians.jokes.com/paul-f--tompkins/videos/paul-f--tompkins---phone-lover
2. Error!

What is the expected output? What do you see instead?

It should download the video. Instead, it throws an EntityRef error while 
parsing the HTML: http://pastebin.com/zQnv897d

What version of the product are you using? On what operating system?

get-flash-videos 1.24, Ubuntu 10.04, latest CPAN modules for XML::SAX, 
XML::LibXML, XML::Simple.

Please provide any additional information below.

Original issue reported on code.google.com by seegahan@gmail.com on 17 Feb 2011 at 10:16

GoogleCodeExporter commented 9 years ago
It looks like the HTML is being parsed with an overly strict XML parser. From 
another issue:

  "the XML::Parser module does not accept certain characters in the text:
   & (ampersand, must be encoded as &amp;) 
   < (left angle bracket, must be encoded as &lt;) 
   > (right angle bracket, must be encoded as &gt;)"

Original comment by seegahan@gmail.com on 17 Feb 2011 at 10:17
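
To illustrate the behaviour described in the comment above, here is a minimal sketch in Python (not the Perl code that get-flash-videos actually uses). It assumes a made-up markup snippet containing a bare ampersand, since the real page is not reproduced in this thread. The helper names `parses_as_xml`, `TextCollector` and `html_text` are hypothetical and exist only for this example. It shows a strict XML parser rejecting the snippet, a lenient HTML parser accepting it, and the snippet becoming valid XML once the text is entity-encoded.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from xml.sax.saxutils import escape

# Hypothetical markup standing in for the page; note the unescaped "&".
SNIPPET = "<p>Phone Love & other bits</p>"


def parses_as_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TextCollector(HTMLParser):
    """Lenient HTML parser that only collects text nodes."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_text(markup):
    """Return the text content of the markup using the lenient parser."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


if __name__ == "__main__":
    # The strict XML parser refuses the bare ampersand.
    print("strict XML accepts snippet:", parses_as_xml(SNIPPET))
    # The HTML parser tolerates it and returns the text.
    text = html_text(SNIPPET)
    print("HTML parser text:", text)
    # Encoding the text makes the same content acceptable as XML.
    fixed = "<p>" + escape(text) + "</p>"
    print("strict XML accepts escaped snippet:", parses_as_xml(fixed))
```

If the failure in this issue has the same cause, the two usual remedies would be to parse the page with an HTML-tolerant parser or to entity-encode stray characters before handing the text to an XML parser.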

GoogleCodeExporter commented 9 years ago
This must have been fixed at some point, as it works fine for me.

Please tell us if you still have problems, but the fix could be as simple as 
updating to the latest git commit.

Original comment by mjbauer95 on 17 May 2011 at 3:10