d4bl / COVID19_tracker_data_extraction

Data on Black communities is often not collected when it is needed the most. We have compiled a list of the states that have shared data on COVID-19 infections and deaths by race and those that have not. This project extracts that data from state websites to track disparities in COVID-19 deaths and cases for Black people.

Scraper doesn't run due to Census data unavailability #157

Open sydeaka opened 3 years ago

sydeaka commented 3 years ago

@nkrishnaswami

When I tried to run the scraper this evening, I got an error indicating that the 2018 US Census Excel file linked below no longer exists. The file also fails to load when I paste the URL directly into a web browser.

https://www2.census.gov/programs-surveys/popest/geographies/2018/all-geocodes-v2018.xlsx

Unfortunately this means we must halt daily scraper runs until this is resolved.

Do we have a local copy saved? Or, alternatively, could we modify the scraper so that it continues to pull the data while ignoring the unavailable Census file?

The error message is provided below.

2020-10-14 21:35:36,120 INFO covid19_scrapers.web_cache:  Connecting web cache to DB: work/web_cache.db
Traceback (most recent call last):
  File "run_scrapers.py", line 189, in <module>
    main()
  File "run_scrapers.py", line 165, in main
    registry_args=dict(enable_beta_scrapers=opts.enable_beta_scrapers),
  File "/Users/poisson/Documents/GitHub/COVID19_tracker_data_extraction/workflow/python/covid19_scrapers/__init__.py", line 61, in make_scraper_registry
    census_api = CensusApi(census_api_key)
  File "/Users/poisson/Documents/GitHub/COVID19_tracker_data_extraction/workflow/python/covid19_scrapers/census/census_api.py", line 31, in __init__
    self.fips = FipsLookup()
  File "/Users/poisson/Documents/GitHub/COVID19_tracker_data_extraction/workflow/python/covid19_scrapers/census/fips_lookup.py", line 22, in __init__
    df = pd.read_excel(get_content_as_file(self.CODES_URL), skiprows=4)
  File "/Users/poisson/Documents/GitHub/COVID19_tracker_data_extraction/workflow/python/covid19_scrapers/utils/http.py", line 100, in get_content_as_file
    return BytesIO(get_content(url, **kwargs))
  File "/Users/poisson/Documents/GitHub/COVID19_tracker_data_extraction/workflow/python/covid19_scrapers/utils/http.py", line 94, in get_content
    r = get_cached_url(url, **kwargs)
  File "/Users/poisson/Documents/GitHub/COVID19_tracker_data_extraction/workflow/python/covid19_scrapers/utils/http.py", line 59, in get_cached_url
    return UTILS_WEB_CACHE.fetch(url, **kwargs)
  File "/Users/poisson/Documents/GitHub/COVID19_tracker_data_extraction/workflow/python/covid19_scrapers/web_cache.py", line 263, in fetch
    response.raise_for_status()
  File "/Users/poisson/Documents/GitHub/COVID19_tracker_data_extraction/covid19_data_test_003/lib/python3.7/site-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://www2.census.gov/programs-surveys/popest/geographies/2018/all-geocodes-v2018.xlsx
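
On the question of continuing despite the unavailable file, here is a minimal sketch (not the repo's actual code) of a loader that prefers the live Census URL but falls back to a checked-in snapshot when the fetch fails. The `load_geocodes` name and `LOCAL_CODES_PATH` location are hypothetical; the URL and `skiprows=4` come from the traceback above.

```python
# Sketch: fall back to a local snapshot of the geocodes table when the
# Census server errors out. LOCAL_CODES_PATH is a hypothetical location.
import io
import logging
from pathlib import Path

import pandas as pd
import requests

CODES_URL = ('https://www2.census.gov/programs-surveys/popest/'
             'geographies/2018/all-geocodes-v2018.xlsx')
LOCAL_CODES_PATH = Path('data/all-geocodes-v2018.xlsx')  # hypothetical


def load_geocodes() -> pd.DataFrame:
    """Load the 2018 geocodes table, preferring the live URL and
    falling back to the local snapshot when the fetch fails."""
    try:
        resp = requests.get(CODES_URL, timeout=30)
        resp.raise_for_status()
        # Refresh the local snapshot so future outages are covered too.
        LOCAL_CODES_PATH.parent.mkdir(parents=True, exist_ok=True)
        LOCAL_CODES_PATH.write_bytes(resp.content)
        source = io.BytesIO(resp.content)
    except requests.RequestException as exc:
        logging.warning('Census fetch failed (%s); using local snapshot', exc)
        if not LOCAL_CODES_PATH.exists():
            raise  # no snapshot available; surface the original error
        source = LOCAL_CODES_PATH
    return pd.read_excel(source, skiprows=4)
```

If `FipsLookup.__init__` called a loader along these lines instead of fetching unconditionally, a Census outage would degrade to the last good snapshot rather than aborting the entire run.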
sydeaka commented 3 years ago

Update: A few moments after I created the issue, subsequent refreshes of the website revealed a message saying that the system was down for maintenance, which likely explains why the file was unavailable. Shortly afterward, the file came back online and the scraper run resumed without incident.

I will leave this issue open so that we can work toward a solution that caches the 2018 data table and stores it in the repo for later reference.
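
For the snapshot itself, a one-off script along these lines would do; the output path under workflow/python/ is an assumption, not an existing directory in the repo.

```python
# Hypothetical one-off script to snapshot the 2018 geocodes table into
# the repo; the output path is illustrative.
from pathlib import Path

import requests

URL = ('https://www2.census.gov/programs-surveys/popest/'
       'geographies/2018/all-geocodes-v2018.xlsx')
OUT = Path('workflow/python/data/all-geocodes-v2018.xlsx')

resp = requests.get(URL, timeout=30)
resp.raise_for_status()  # fail loudly rather than commit an error page
OUT.parent.mkdir(parents=True, exist_ok=True)
OUT.write_bytes(resp.content)
print(f'Saved {len(resp.content)} bytes to {OUT}')
```

Committing the resulting .xlsx would also let a fallback loader like the sketch above work on a fresh clone, even during a Census outage.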