Closed saulpw closed 8 years ago
[localhost][development]~/Workplace/Freelance/web_scraping/Saul_Pwanson/xd
>>>python main.py --download-xd -s theglobeandmail_canadian -o a.zip -f 2016-02-06 -t 2016-02-16
Processing Crossword for date - 2016-02-06
ERR: No Crossword for date
Processing Crossword for date - 2016-02-07
ERR: No Crossword for date
Processing Crossword for date - 2016-02-08
Processing Crossword for date - 2016-02-09
ERR: No Crossword for date
Processing Crossword for date - 2016-02-10
ERR: No Crossword for date
Processing Crossword for date - 2016-02-11
ERR: No Crossword for date
Processing Crossword for date - 2016-02-12
ERR: No Crossword for date
Processing Crossword for date - 2016-02-13
ERR: No Crossword for date
Processing Crossword for date - 2016-02-14
ERR: No Crossword for date
Processing Crossword for date - 2016-02-15
Processing Crossword for date - 2016-02-16
ERR: No Crossword for date
[localhost][development]~/Workplace/Freelance/web_scraping/Saul_Pwanson/xd
>>>unzip a.zip
Archive: a.zip
inflating: crosswords-theglobeandmail_canadian/2016/theglobeandmail_canadian-2016-02-08.xd
inflating: crosswords-theglobeandmail_canadian/2016/theglobeandmail_canadian-2016-02-15.xd
[localhost][development]~/Workplace/Freelance/web_scraping/Saul_Pwanson/xd
>>>
As you can see, the scraper fails silently for days on which there is no puzzle and moves on to the next date. In the example above, theglobeandmail_canadian has valid puzzles only on 8 Feb 2016 and 15 Feb 2016, and those are exactly the files that end up in the output zip.
Changing the generic code to special-case such websites is not a good idea: with the code kept generic as it is now, scenarios like the above are handled with ease. It's okay to let the scraper query the website for a puzzle and fail when one isn't available; we're fine as long as the failure doesn't abruptly stop execution.
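The skip-and-continue behavior described above can be sketched roughly as follows. This is a minimal illustration, not the actual xd code; `fetch_puzzle` and `save_puzzle` are hypothetical callables standing in for the real downloader and archive writer.

```python
from datetime import date, timedelta

def scrape_range(start, end, fetch_puzzle, save_puzzle):
    """Try every date in [start, end]; log and skip dates with no puzzle.

    A failed fetch is caught per-date, so one missing puzzle never
    aborts the rest of the run (matching the log output above).
    """
    d = start
    while d <= end:
        print("Processing Crossword for date - %s" % d.isoformat())
        try:
            puzzle = fetch_puzzle(d)     # may raise if no puzzle exists
            save_puzzle(d, puzzle)
        except Exception:
            print("ERR: No Crossword for date")  # log and move on
        d += timedelta(days=1)
```

With a source that only publishes on two of the eleven days, the loop still completes and saves exactly those two puzzles, as in the zip listing above.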
The other 6 days will always fail, so don't bother issuing those requests.
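Since 8 Feb and 15 Feb 2016 are both Mondays, the suggestion amounts to requesting only one day per week. A hedged sketch of how the date range could be restricted, assuming a weekly Monday schedule for this source (the helper name is made up for illustration):

```python
from datetime import date, timedelta

def mondays_between(start, end):
    """Yield every Monday in [start, end], skipping the other 6 days."""
    # advance to the first Monday on or after start (Monday == weekday 0)
    d = start + timedelta(days=(7 - start.weekday()) % 7)
    while d <= end:
        yield d
        d += timedelta(days=7)
```

For the range in the log (2016-02-06 to 2016-02-16) this yields only 2016-02-08 and 2016-02-15, exactly the two dates that produced puzzles.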
Also theglobeandmail_canadian.