ThreeSixtyGiving / datagetter

Scripts to download data from http://registry.threesixtygiving.org
MIT License

HTTP 401 error handling #20

Closed michaelwood closed 3 years ago

michaelwood commented 3 years ago

When we hit a 401 error, the datagetter carries on anyway, and flattentool then tries to process the error message as if it were the downloaded file.

/home/datastore/datastore/.ve/lib/python3.6/site-packages/flattentool/input.py:382: DataErrorWarning: Duplicate heading "Identifier" found, ignoring the data in column A (sheet: "Sheet1").
  DataErrorWarning,
Traceback (most recent call last):
  File "/home/datastore/datastore/.ve/src/datagetter/getter/get.py", line 116, in fetch_and_convert
    r.raise_for_status()
  File "/home/datastore/datastore/.ve/lib/python3.6/site-packages/requests/models.py", line 943, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://bishopradfordtrust.org.uk/wp-content/uploads/2020/04/360-giving-data-at-2019_2020-Final.xlsx
Traceback (most recent call last):
  File "/home/datastore/datastore/.ve/lib/python3.6/site-packages/flattentool/input.py", line 655, in read_sheets
    self.workbook = openpyxl.load_workbook(self.input_name, data_only=True)
  File "/home/datastore/datastore/.ve/lib/python3.6/site-packages/openpyxl/reader/excel.py", line 318, in load_workbook
    data_only, keep_links)
  File "/home/datastore/datastore/.ve/lib/python3.6/site-packages/openpyxl/reader/excel.py", line 126, in __init__
    self.archive = _validate_archive(fn)
  File "/home/datastore/datastore/.ve/lib/python3.6/site-packages/openpyxl/reader/excel.py", line 98, in _validate_archive
    archive = ZipFile(filename, 'r')
  File "/usr/lib/python3.6/zipfile.py", line 1131, in __init__
    self._RealGetContents()
  File "/usr/lib/python3.6/zipfile.py", line 1198, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/datastore/datastore/.ve/src/datagetter/getter/get.py", line 187, in fetch_and_convert
    file_type)
  File "/home/datastore/datastore/.ve/src/datagetter/getter/get.py", line 77, in convert_spreadsheet
    metatab_vertical_orientation=True,
  File "/home/datastore/datastore/.ve/lib/python3.6/site-packages/flattentool/__init__.py", line 257, in unflatten
    spreadsheet_input.read_sheets()
  File "/home/datastore/datastore/.ve/lib/python3.6/site-packages/flattentool/input.py", line 659, in read_sheets
    _("The supplied file has extension .xlsx but isn't an XLSX file.")
flattentool.input.BadXLSXZipFile: The supplied file has extension .xlsx but isn't an XLSX file.
Traceback (most recent call last):
  File "/home/datastore/datastore/.ve/src/datagetter/getter/get.py", line 116, in fetch_and_convert
    r.raise_for_status()
  File "/home/datastore/datastore/.ve/lib/python3.6/site-packages/requests/models.py", line 943, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://www.thebromleytrust.org.uk/wp-content/uploads/2020/08/Bromley-Trust-Data-1st-April-2017-to-31st-July-2020.xlsx
Traceback (most recent call last):
  File "/home/datastore/datastore/.ve/lib/python3.6/site-packages/flattentool/input.py", line 655, in read_sheets
    self.workbook = openpyxl.load_workbook(self.input_name, data_only=True)
  File "/home/datastore/datastore/.ve/lib/python3.6/site-packages/openpyxl/reader/excel.py", line 318, in load_workbook
    data_only, keep_links)
  File "/home/datastore/datastore/.ve/lib/python3.6/site-packages/openpyxl/reader/excel.py", line 126, in __init__
    self.archive = _validate_archive(fn)
  File "/home/datastore/datastore/.ve/lib/python3.6/site-packages/openpyxl/reader/excel.py", line 98, in _validate_archive
    archive = ZipFile(filename, 'r')
  File "/usr/lib/python3.6/zipfile.py", line 1131, in __init__
    self._RealGetContents()
  File "/usr/lib/python3.6/zipfile.py", line 1198, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
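
One possible way to avoid this would be to skip the conversion step whenever the download did not succeed. The following is only a minimal sketch under stated assumptions, not the datagetter's actual code: `FetchResult` and `fetch_and_convert_guarded` are hypothetical names, `convert` stands in for whatever calls flattentool, and `response` is assumed to expose `status_code` and `content` the way a `requests.Response` does.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class FetchResult:
    """Outcome of one download attempt (hypothetical helper type)."""
    url: str
    status_code: int
    converted: bool
    error: Optional[str] = None


def fetch_and_convert_guarded(
    url: str,
    response: Any,
    convert: Callable[[bytes], None],
) -> FetchResult:
    """Run `convert` only if the HTTP response was successful.

    `response` is assumed to look like a requests.Response, i.e. it has
    `status_code` and `content` attributes. `convert` is a placeholder
    for the step that hands the file to flattentool.
    """
    if response.status_code >= 400:
        # A 4xx/5xx body is an error page, not a spreadsheet, so record
        # the failure and return without attempting any conversion.
        return FetchResult(
            url=url,
            status_code=response.status_code,
            converted=False,
            error=f"HTTP {response.status_code}",
        )

    convert(response.content)
    return FetchResult(url=url, status_code=response.status_code, converted=True)
```

With `requests`, this might be called as `fetch_and_convert_guarded(url, requests.get(url), convert)`. Returning a result object rather than raising lets the caller log the failed URL and move on to the next file, so a 401 never reaches flattentool.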