elpendor / ES-scraper

A scraper for EmulationStation

Error while scraping via scraper.py -w 275 #16

Closed petrockblog closed 11 years ago

petrockblog commented 11 years ago

I get the following error with this rom:

Trying to identify Boxing Legends of the Ring # SMD.SMD..
Traceback (most recent call last):
  File "RetroPie/supplementary/ES-scraper/scraper.py", line 301, in <module>
    scanFiles(ES_systems[i])
  File "RetroPie/supplementary/ES-scraper/scraper.py", line 206, in scanFiles
    nodes=getGameInfo(files, platformID).getroot()
  File "RetroPie/supplementary/ES-scraper/scraper.py", line 100, in getGameInfo
    return ET.parse(urllib.urlopen(URL))
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1183, in parse
    tree.parse(source, parser)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 656, in parse
    parser.feed(data)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1643, in feed
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1507, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 861, column 718

elpendor commented 11 years ago

Filename, system and scraping mode?

That looks really weird, as in the XML source isn't right for some reason.

stickystyle commented 11 years ago

When you query thegamesdb.net with that title, they return invalid XML at the position mentioned in the traceback: an unescaped "<>". http://thegamesdb.net/api/GetGame.php?name=Boxing%20Legends%20of%20the%20Ring
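
For anyone wanting to reproduce the failure without hitting the API, here is a minimal sketch. The XML fragments and the `is_well_formed` helper are made up for illustration (they are not the actual response from thegamesdb.net), and it is written for Python 3 rather than the Python 2.7 shown in the traceback. It only shows that a literal "<>" inside element text makes ElementTree raise ParseError, while the escaped form parses fine.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical fragments for illustration only, not the real API response.
description = "Shaq <> the legends"
bad = "<Game><Overview>" + description + "</Overview></Game>"
good = "<Game><Overview>" + escape(description) + "</Overview></Game>"


def is_well_formed(xml_text):
    """Return True if xml_text parses, False if ElementTree rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print("unescaped:", is_well_formed(bad))
    print("escaped:  ", is_well_formed(good))
```

The escaped version round-trips: the parser hands back the original text with the angle brackets intact.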

elpendor commented 11 years ago

Yep, it's exactly that. There's some malformed XML in the 3rd result (unescaped element tags (<>) in the description).

I just added a quick check to skip the game if the source data is malformed, but it needs to be fixed on TheGamesDB (afaik, it's an open database, it shouldn't be hard to do). I'll commit the changes in a few minutes.

I'll also check the API and see if I can limit the results (since I'm just using the first result anyway). The game that's causing the problem is "Slam - Shaq Vs. The Legends" so a simple "malformed xml, please check the database" won't be of any help.

elpendor commented 11 years ago

Changes committed.

Apparently there's no way to limit the results via the API, so for now I'm just showing an error saying there's malformed XML and giving the URL where the error is.
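
A rough sketch of that kind of check, for reference. This is not the code that was committed to scraper.py: the function name `parse_game_info`, the message wording, and the element names in the usage lines are assumptions made for illustration, and it is written for Python 3. The idea is simply to catch ParseError, report where the parser stopped along with the URL that was queried, and return None so the caller can skip the game.

```python
import sys
import xml.etree.ElementTree as ET


def parse_game_info(xml_text, url):
    """Parse an API response; return the root element, or None if malformed.

    Hypothetical helper: xml_text is the response body, url is the query
    that produced it (only used in the error message).
    """
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple.
        line, column = err.position
        sys.stderr.write(
            "Malformed XML at line %d, column %d; skipping this game. "
            "Check the source data at: %s\n" % (line, column, url)
        )
        return None


if __name__ == "__main__":
    # Made-up responses and a placeholder URL, for illustration only.
    url = "http://example.invalid/GetGame.php?name=Some%20Game"
    ok = parse_game_info("<Data><Game><GameTitle>A</GameTitle></Game></Data>", url)
    broken = parse_game_info("<Data><Game><Overview><></Overview></Game></Data>", url)
    print(ok is not None, broken is None)
```

A caller along the lines of scanFiles would then test the result for None before calling anything on it and move on to the next file.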

I also fixed it from the source, I just removed the line that was causing the trouble.

Someone should report this to TheGamesDB though. I honestly don't know why they're not escaping those characters. Hopefully that's the only game.

Closed.

petrockblog commented 11 years ago

Great, thanks a lot!