cbanack / comic-vine-scraper

An add-on script for ComicRack that lets you copy details from Comic Vine into your comic books.

Script crashes on " Booster Gold Vol. 1" #51

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago

DESCRIBE THE PROBLEM:

The script crashes when reading this series.

WHAT VERSION OF COMICVINESCRAPER ARE YOU USING?
v1.0.13

Original issue reported on code.google.com by usali...@gmail.com on 2 Mar 2010 at 5:20

Attachments:

log-BGv1.txt

GoogleCodeExporter commented 9 years ago

Hmm, looks like a bug with ComicVine; they're returning invalid results.

I raised a bug report on their website:

http://www.comicvine.com/forums/bug-reporting/2/api-bug-invalid-xml/532559/

I will look into this when I have a chance, and see if I can find a way to make the Comic Vine Scraper work around the problem, because given the ComicVine guys' track record for fixing their API bugs, it may be a long time before they do anything. :(

Thanks for the bug report.

Original comment by cban...@gmail.com on 2 Mar 2010 at 7:19

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

I think it has to do with an illegal character in the summary field (the description at CV). It's a weird little cross symbol.

Here is the error log:

ERROR: cannot parse results from comicvine:
http://api.comicvine.com/issue/26451/?api_key=4192f8503ea33364a23035827f40d415d5dc5d18&format=xml
Caught SystemError: '', hexadecimal value 0x10, is an invalid character. Line 2, position 6801.

And here is the text from CV:

"I expected to play myself!" Booster flashes an arrogant smirk. "Throw in  of merchandising and points!"

Just have the scraper parse out that illegal character and it should be fine. Or have the mods at CV fix the text... which may take longer.

Original comment by revqu...@gmail.com on 2 Mar 2010 at 7:50
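The error above is the parser enforcing the XML 1.0 `Char` production, which allows only tab, line feed, carriage return, and code points from U+0020 upward (excluding the surrogate block and U+FFFE/U+FFFF). Code point 0x10 falls outside that set. The following is a minimal CPython 3 sketch, not the scraper's own code (which runs inside ComicRack), that reports each offending character with a line and column. The names `is_valid_xml_char` and `find_invalid_xml_chars` are hypothetical, the sample string is made up, and the column counting may not match the .NET parser's "position" exactly.

```python
# Minimal sketch (CPython 3), not the scraper's code: list every character
# that the XML 1.0 "Char" production forbids, with 1-based line/column.

def is_valid_xml_char(cp):
    """Return True if code point cp is allowed by XML 1.0's Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_invalid_xml_chars(text):
    """Return (line, column, code_point) for each forbidden character in text."""
    problems = []
    line, column = 1, 0
    for ch in text:
        column += 1
        if not is_valid_xml_char(ord(ch)):
            problems.append((line, column, ord(ch)))
        if ch == "\n":  # the next character starts a new line
            line, column = line + 1, 0
    return problems


# Hypothetical response fragment containing the 0x10 character.
sample = "<issue>\n<description>Throw in \x10 of merchandising</description>\n</issue>"
for line, column, cp in find_invalid_xml_chars(sample):
    print("line %d, column %d: invalid character 0x%02X" % (line, column, cp))
# prints: line 2, column 23: invalid character 0x10
```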

GoogleCodeExporter commented 9 years ago

Yeah, I think you're right about what's going on.

If possible I'd like to try to find a description of all of the valid XML characters, and write a solution that strips out ALL the illegal characters, maybe replacing them with question marks or something. That way, we don't end up seeing this bug again in the future with a different illegal character.

Original comment by cban...@gmail.com on 2 Mar 2010 at 8:58
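The "description of all of the valid XML characters" asked for here is the XML 1.0 specification's `Char` production. A minimal sketch of the replace-with-question-marks idea, assuming CPython 3's `re` module; `replace_invalid_xml_chars` and `INVALID_XML_CHARS` are hypothetical names, not part of the scraper:

```python
import re

# Characters NOT matched by XML 1.0's Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def replace_invalid_xml_chars(text, replacement="?"):
    """Replace each forbidden character in text with `replacement`."""
    # A function replacement keeps any backslashes in `replacement` literal.
    return INVALID_XML_CHARS.sub(lambda match: replacement, text)


print(replace_invalid_xml_chars("Throw in \x10 of merchandising"))
# prints: Throw in ? of merchandising
```

Matching the complement of the allowed ranges, rather than listing known-bad characters, is what covers "a different illegal character" in the future: anything outside the spec's set is caught, including lone surrogates.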

GoogleCodeExporter commented 9 years ago

For reference, I found this too, and it was only on one of the early issues (1 or 2). You should be able to scrape the rest and manually enter CVDB tags for the first 2 until this issue is fixed.

Original comment by bmen...@gmail.com on 3 Mar 2010 at 12:26

GoogleCodeExporter commented 9 years ago

Fixed in 1.0.14.

The fix that I implemented "knows" about ALL of the possibly valid XML characters, and will automatically strip out ANYTHING that isn't valid. (There shouldn't be very many comics on Comic Vine containing invalid XML characters in their user-entered fields, but any of them that do exist should now be parsed properly by the scraper.)

Original comment by cban...@gmail.com on 5 Mar 2010 at 5:27
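Stripping before parsing might look roughly like the sketch below. It assumes CPython 3 and `xml.etree.ElementTree`; it is not the 1.0.14 code, which this thread does not show, and `parse_response` is a hypothetical helper. The sample response is made up. The regex must run on decoded text, before the parser sees it.

```python
import re
import xml.etree.ElementTree as ET

# Characters NOT matched by XML 1.0's Char production.
INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def parse_response(raw_xml):
    """Strip forbidden characters from decoded XML text, then parse it."""
    return ET.fromstring(INVALID_XML_CHARS.sub("", raw_xml))


# Hypothetical response fragment; the real Comic Vine XML is not shown here.
raw = ("<response><description>Throw in \x10 of merchandising"
       "</description></response>")

try:
    ET.fromstring(raw)  # parsing the unsanitized text fails
except ET.ParseError as err:
    print("unsanitized:", err)

print("sanitized:", parse_response(raw).find("description").text)
```

This only removes raw characters. A numeric character reference such as `&#x10;` is also illegal in XML 1.0, and a strict parser would still reject it, so it would need separate handling.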

Changed state: Fixed