datasets / publicbodies

A database of public bodies such as government departments, ministries etc.
http://publicbodies.org
MIT License

`import_br.py` works locally, but fails in GitHub Actions #176

Closed augusto-herrmann closed 1 year ago

augusto-herrmann commented 1 year ago

Job "Update data from sources (br)" fails with the following error:

Traceback (most recent call last):
  File "scripts/import/br/import_br.py", line 384, in <module>
    import_br_data(URL, args.output)
  File "scripts/import/br/import_br.py", line 251, in import_br_data
    municipios_response = session.get(URL_MUNICIPIOS)
  File "/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/site-packages/requests/sessions.py", line 542, in get
    return self.request('GET', url, **kwargs)
  File "/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/site-packages/requests/sessions.py", line 529, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/site-packages/requests/sessions.py", line 645, in send
    r = adapter.send(request, **kwargs)
  File "/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/site-packages/requests/adapters.py", line 517, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='servicodados.ibge.gov.br', port=443): Max retries exceeded with url: /api/v1/localidades/municipios (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1131)')))

The script works locally, though. Running the following code locally also downloads the data from ibge.gov.br normally, without raising an exception:

import requests

USER_AGENT = 'PublicBodiesBot (https://github.com/okfn/publicbodies)'
URL_MUNICIPIOS = 'https://servicodados.ibge.gov.br/api/v1/localidades/municipios'

session = requests.Session()
session.headers.update({'User-Agent': USER_AGENT})

municipios_response = session.get(URL_MUNICIPIOS)
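One possible explanation for the difference, which is an assumption and not confirmed anywhere in this thread, is that the two environments link Python against different OpenSSL versions: OpenSSL 3.0 and later refuse unsafe legacy renegotiation by default. A minimal sketch to compare the local machine with the CI runner (the helper name `describe_ssl_environment` is made up for illustration):

```python
import ssl
import sys


def describe_ssl_environment():
    """Collect the interpreter and OpenSSL details relevant to this error."""
    return {
        "python": sys.version.split()[0],
        # The OpenSSL build that the ssl module is linked against.
        "openssl": ssl.OPENSSL_VERSION,
        # OpenSSL >= 3.0 disables unsafe legacy renegotiation by default.
        "openssl_3_or_newer": ssl.OPENSSL_VERSION_INFO >= (3, 0, 0),
    }


if __name__ == "__main__":
    for key, value in describe_ssl_environment().items():
        print(f"{key}: {value}")
```

Running this both locally and as a step in the workflow would show whether the OpenSSL versions actually differ.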

SSL errors seem to be common when connecting to government websites from the GitHub Actions environment, as several Stack Overflow questions show.

From there, we could:

  1. implement the CustomHttpAdapter workaround from Stack Overflow; or
  2. use wget, as one commenter suggests; or
  3. stop updating the municipality list and remove this call from the script.

For now, I'm leaning towards option 3, since the list of municipalities in Brazil does not change that often.
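For reference, option 1 could look roughly like the sketch below. It is a reconstruction of the commonly cited workaround, not code taken from this repository, and the function name `get_legacy_session` is an assumption. Note that it re-enables legacy server connections, which weakens TLS security, so it would be best limited to the one host that needs it.

```python
import ssl

import requests
from requests.adapters import HTTPAdapter
from urllib3.poolmanager import PoolManager

# ssl.OP_LEGACY_SERVER_CONNECT only exists as a named constant on newer
# Python versions; 0x4 is the underlying OpenSSL flag value.
OP_LEGACY_SERVER_CONNECT = getattr(ssl, "OP_LEGACY_SERVER_CONNECT", 0x4)


class CustomHttpAdapter(HTTPAdapter):
    """Transport adapter that uses a caller-supplied SSL context."""

    def __init__(self, ssl_context=None, **kwargs):
        # Must be set before super().__init__(), which calls init_poolmanager().
        self.ssl_context = ssl_context
        super().__init__(**kwargs)

    def init_poolmanager(self, connections, maxsize, block=False, **pool_kwargs):
        self.poolmanager = PoolManager(
            num_pools=connections,
            maxsize=maxsize,
            block=block,
            ssl_context=self.ssl_context,
            **pool_kwargs,
        )


def get_legacy_session():
    """Return a requests session that tolerates legacy TLS renegotiation."""
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    context.options |= OP_LEGACY_SERVER_CONNECT
    session = requests.Session()
    session.mount("https://", CustomHttpAdapter(ssl_context=context))
    return session


# Usage, replacing requests.Session() in the snippet from the issue:
#   session = get_legacy_session()
#   session.headers.update({'User-Agent': USER_AGENT})
#   municipios_response = session.get(URL_MUNICIPIOS)
```

Whether this actually resolves the failure in the CI job has not been verified here.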

augusto-herrmann commented 1 year ago

The IBGE data is needed because the SIORG API, which returns Brazilian Federal Government public bodies information, provides an address but uses only municipality codes, not municipality names.

So if we go with option 3, we need to cache a mapping of municipality codes to names somehow, perhaps as a small CSV file in this repository.
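A minimal sketch of what reading such a cached mapping might look like. The column names `code` and `name` and the function name are assumptions made for illustration, and the inline sample stands in for the real file, whose name and layout have not been decided:

```python
import csv
import io


def load_municipality_names(csv_file):
    """Return a dict mapping municipality code to name from an open CSV file.

    Assumes a header row with the (hypothetical) columns "code" and "name".
    Codes are kept as strings so they compare equal to codes parsed from text.
    """
    reader = csv.DictReader(csv_file)
    return {row["code"].strip(): row["name"].strip() for row in reader}


# Stand-in for the cached file, with two illustrative rows.
SAMPLE_CSV = "code,name\n3550308,São Paulo\n5300108,Brasília\n"

municipality_names = load_municipality_names(io.StringIO(SAMPLE_CSV))

# Fall back to an empty string when a code is missing from the cache.
city = municipality_names.get("3550308", "")
```

With a real file, the `io.StringIO` stand-in would be replaced by `open(path, newline="", encoding="utf-8")`.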

augusto-herrmann commented 1 year ago

Perhaps this CSV could be fetched from GitHub and used instead.