biglocalnews / covid-world-scraper

scrapers for the pitch map
ISC License

South Africa scraper breaks in production sporadically #76

Open zstumgoren opened 4 years ago

zstumgoren commented 4 years ago

The South Africa (ZAF) scraper is breaking sporadically in production. Note that I'm unable to reproduce the breakage locally.

ubuntu@data-etl:~$ covid-world-scraper --cache-dir /home/ubuntu/data/covid-world-scraper/ --log-file /home/ubuntu/logs/covid-world-scraper.log zaf
covid_world_scraper.country_scraper - START SCRAPE - Zaf
covid_world_scraper.runner -   File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/site-packages/covid_world_scraper/runner.py", line 44, in run
    scraper.run()
  File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/site-packages/covid_world_scraper/country_scraper.py", line 58, in run
    raw_data_path = self.fetch()
  File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/site-packages/covid_world_scraper/zaf.py", line 29, in fetch
    most_recent_link = data_links[0][1]

zstumgoren commented 4 years ago

The response.status_code is 522 (a connection timeout error from Cloudflare). It seems to happen only intermittently in production. This appears to be a server-side issue, so the only "fix" on our end would be to try the scraper again at a later time.
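
Separately from retrying, the scraper could fail with a clearer message when this happens. A minimal sketch, assuming the 522 response is what leaves `data_links` without a usable first entry (the traceback above is cut off before the exception line, so that is a guess). `UpstreamUnavailable` and `pick_most_recent_link` are hypothetical names, not existing code in covid-world-scraper, and the `(label, url)` shape of each entry is inferred only from the `data_links[0][1]` lookup.

```python
class UpstreamUnavailable(Exception):
    """Hypothetical error: the source site did not return usable data."""


def pick_most_recent_link(status_code, data_links):
    """Return the URL of the first entry, or raise a descriptive error.

    Assumes each entry is a (label, url) pair, matching data_links[0][1].
    """
    # Surface the HTTP status instead of failing later on the list lookup.
    if status_code != 200:
        raise UpstreamUnavailable(f"source returned HTTP {status_code}")
    # A 200 response with no links is also treated as "nothing to scrape".
    if not data_links:
        raise UpstreamUnavailable("no data links found in response")
    return data_links[0][1]
```

With something like this in place, the log would show the status code directly rather than a traceback ending at the list lookup.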

zstumgoren commented 4 years ago

The ideal long-term solution would be to schedule additional retries using Airflow, once it's deployed. Short-term, we could simply try running the scraper for this one country more frequently on cron.
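
Until Airflow is deployed, an in-process retry is a third option. A rough sketch, assuming the scrape can be wrapped in a zero-argument callable; `run_with_retries`, its parameters, and the delay values are all hypothetical and not part of covid-world-scraper.

```python
import time


def run_with_retries(scrape, attempts=3, base_delay=60, sleep=time.sleep):
    """Call scrape() up to `attempts` times, doubling the wait after each failure.

    `scrape` is any zero-argument callable; `sleep` is injectable so the
    waiting can be stubbed out in tests.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return scrape()
        except Exception as exc:  # broad on purpose for this sketch
            last_error = exc
            if attempt < attempts:
                # Waits base_delay, then 2x, then 4x, ... seconds.
                sleep(base_delay * 2 ** (attempt - 1))
    # Every attempt failed: re-raise the most recent error.
    raise last_error
```

The delays here are placeholders. Since the 522 looks like a server-side timeout, a longer wait between attempts may be more useful than a larger number of attempts.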