PennyDreadfulMTG / Penny-Dreadful-Tools

A suite of tools for the Penny Dreadful MTGO community
https://pennydreadfulmagic.com
MIT License

scrapers fail with "UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 899: ordinal not in range(128)" #694

Closed bakert closed 7 years ago

bakert commented 7 years ago

discord@opensuse:~/decksite> python3 run.py scraper all
Fetching https://mtgjson.com/json/version.json (Last Modified=Fri, 28 Apr 2017 22:23:19 GMT)
HACK: Using local legal_cards override.
Traceback (most recent call last):
  File "run.py", line 54, in <module>
    run()
  File "run.py", line 28, in run
    from decksite.main import APP
  File "/home/discord/decksite/decksite/main.py", line 10, in <module>
    from decksite import league as lg
  File "/home/discord/decksite/decksite/league.py", line 7, in <module>
    from magic import legality, rotation
  File "/home/discord/decksite/magic/legality.py", line 2, in <module>
    from magic import oracle, multiverse
  File "/home/discord/decksite/magic/oracle.py", line 135, in <module>
    multiverse.init()
  File "/home/discord/decksite/magic/multiverse.py", line 16, in init
    set_legal_cards()
  File "/home/discord/decksite/magic/multiverse.py", line 204, in set_legal_cards
    new_list = fetcher.legal_cards(force, season)
  File "/home/discord/decksite/magic/fetcher.py", line 18, in legal_cards
    legal = h.readlines()
  File "/usr/lib64/python3.5/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 899: ordinal not in range(128)

bakert commented 7 years ago

Simple repro on prod:

discord@opensuse:~/decksite> python3
Python 3.5.1 (default, Dec 09 2015, 07:29:36)
[GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import magic.multiverse
>>> magic.multiverse.init()
Fetching https://mtgjson.com/json/version.json (Last Modified=Fri, 28 Apr 2017 22:23:19 GMT)
HACK: Using local legal_cards override.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/discord/decksite/magic/multiverse.py", line 16, in init
    set_legal_cards()
  File "/home/discord/decksite/magic/multiverse.py", line 204, in set_legal_cards
    new_list = fetcher.legal_cards(force, season)
  File "/home/discord/decksite/magic/fetcher.py", line 18, in legal_cards
    legal = h.readlines()
  File "/usr/lib64/python3.5/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 899: ordinal not in range(128)
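
The traceback ends in encodings/ascii.py, which suggests the file handle was opened in text mode without an explicit encoding, so Python fell back to an ASCII locale default. Byte 0xc3 is the lead byte of a two-byte UTF-8 sequence (for example, "û" encodes as 0xc3 0xbb). Below is a minimal sketch of the failure and of one possible fix, passing the encoding explicitly. The helper `read_legal_cards` and the sample file are hypothetical and are not the actual code in fetcher.py.

```python
import os
import tempfile


def read_legal_cards(path, encoding="utf-8"):
    """Hypothetical helper: read one card name per line using an explicit encoding."""
    with open(path, encoding=encoding) as h:
        return [line.rstrip("\n") for line in h]


# Write a small UTF-8 sample file containing names with diacritics.
sample_path = os.path.join(tempfile.mkdtemp(), "legal_cards.txt")
with open(sample_path, "w", encoding="utf-8") as f:
    f.write("Lim-Dûl's Cohort\nSéance\n")

# Decoding the same bytes as ASCII fails on the first 0xc3 byte.
try:
    read_legal_cards(sample_path, encoding="ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Decoding as UTF-8 succeeds regardless of the process locale.
cards = read_legal_cards(sample_path)
```

Passing `encoding` to `open()` makes the result independent of the environment's locale settings, which may differ between a developer machine and prod.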

bakert commented 7 years ago

I temporarily fixed this by ascii-fying the legal_cards.txt file currently being used by decksite and the scrapers. These are the affected names after conversion:

{"Lim-Dul's Cohort", 'Khabal Ghoul', 'Jotun Grunt', "Lim-Dul's High Guard", 'Ghazban Ogre', 'Dandan', 'Lim-Dul the Necromancer', 'Junun Efreet', 'Seance', 'Jotun Owl Keeper'}

It's pretty gross. I'm not sure if there'll still be an issue when pdmtgo.com/legal_cards.txt is serving the right thing or not.
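
For reference, an ascii-fying workaround of the kind described above could be sketched as follows. This is only an illustration under the assumption that the original names differ from the ASCII ones by combining diacritics; the function name `asciify` is hypothetical, and the thread does not say how the file was actually converted.

```python
import unicodedata


def asciify(name):
    """Hypothetical helper: strip diacritics by decomposing and dropping non-ASCII bytes."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    # Encoding with errors="ignore" drops the combining marks, keeping the base letters.
    return decomposed.encode("ascii", "ignore").decode("ascii")


names = ["Lim-Dûl's Cohort", "Jötun Grunt", "Séance", "Dandân"]
converted = [asciify(n) for n in names]
```

The conversion is lossy, so the converted names no longer match the official card names exactly; that is presumably part of why the workaround was only temporary.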

silasary commented 7 years ago

I've fixed this upstream. Hopefully this will be fine now.