PennyDreadfulMTG / Penny-Dreadful-Tools

A suite of tools for the Penny Dreadful MTGO community
https://pennydreadfulmagic.com
MIT License

Goldfish scraper choking on one particular deck #3962

Closed: bakert closed this issue 6 years ago

bakert commented 6 years ago
Fetching https://www.mtggoldfish.com/deck/794284#online (cache ok)
Traceback (most recent call last):
  File "run.py", line 77, in <module>
    run()
  File "run.py", line 32, in run
    task(sys.argv)
  File "run.py", line 70, in task
    s.scrape()
  File "/home/discord/decksite/decksite/scrapers/mtggoldfish.py", line 31, in scrape
    d.created_date = scrape_created_date(d)
  File "/home/discord/decksite/decksite/scrapers/mtggoldfish.py", line 50, in scrape_created_date
    description = soup.select_one('div.deck-view-description').renderContents().decode('utf-8')
AttributeError: 'NoneType' object has no attribute 'renderContents'

bakert commented 6 years ago

Probably something to do with it only having 24 cards in it. The class it's looking for is there.
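
Whatever the cause, the crash itself comes from calling `renderContents()` on the `None` that `select_one` returns when the selector matches nothing. Below is a minimal sketch of a guard, assuming `soup` is the BeautifulSoup object used in `scrape_created_date`. The names `ScrapeError` and `description_html` are hypothetical and are not taken from the repository.

```python
class ScrapeError(Exception):
    """Hypothetical error type for pages that lack the expected markup."""


def description_html(soup, url: str) -> str:
    """Return the deck description HTML, or raise a descriptive error.

    `soup` is assumed to behave like a BeautifulSoup object: `select_one`
    returns a tag, or None when nothing matches the selector.
    """
    node = soup.select_one('div.deck-view-description')
    if node is None:
        # Fail with the URL in the message instead of an AttributeError.
        raise ScrapeError(f'No deck description found at {url}')
    return node.renderContents().decode('utf-8')
```

With a guard like this, the failing run would name the deck URL rather than stopping on `'NoneType' object has no attribute 'renderContents'`.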

bakert commented 6 years ago

This actually seems to have been a throttling issue. I've added a lot more sleeping (1s per deck instead of 0.1s per page of decks) and completed a full run locally. That run was failing both before I added the extra sleep and with 0.1s per deck, so I think we're good here.

c8275907.
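
For anyone hitting the same thing: the fix amounts to pausing between individual deck fetches rather than between pages of decks. Below is a rough sketch of that idea. It is not the code from the commit, and `fetch_throttled`, `fetch`, and the injectable `sleep` parameter are hypothetical names used only for illustration.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar('T')


def fetch_throttled(
    urls: Iterable[str],
    fetch: Callable[[str], T],
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[T]:
    """Call `fetch` on each URL, pausing `delay` seconds between calls."""
    results: List[T] = []
    for i, url in enumerate(urls):
        if i > 0:
            # Pause before every request except the first one.
            sleep(delay)
        results.append(fetch(url))
    return results
```

The 1.0 second default mirrors the "1s per deck" figure above. Passing `sleep` as a parameter is only there so the pacing can be checked without actually waiting.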