Metro-Records / la-metro-councilmatic

:metro: An instance of councilmatic for LA Metro
MIT License
6 stars 2 forks source link

Handle external request failures more gracefully #699

Open hancush opened 3 years ago

hancush commented 3 years ago

Infrequently, the current meetings request to Legistar is reset or otherwise fails, causing the homepage to return a server error.

ConnectionResetError: [Errno 104] Connection reset by peer
  File "urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "urllib3/connectionpool.py", line 384, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from

  File "urllib3/connectionpool.py", line 380, in _make_request
    httplib_response = conn.getresponse()
  File "http/client.py", line 1373, in getresponse
    response.begin()
  File "http/client.py", line 311, in begin
    version, status, reason = self._read_status()
  File "http/client.py", line 272, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "socket.py", line 586, in readinto
    return self._sock.recv_into(b)
ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
  File "requests/adapters.py", line 449, in send
    timeout=timeout
  File "urllib3/connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "urllib3/util/retry.py", line 368, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "urllib3/packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "urllib3/connectionpool.py", line 384, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from

  File "urllib3/connectionpool.py", line 380, in _make_request
    httplib_response = conn.getresponse()
  File "http/client.py", line 1373, in getresponse
    response.begin()
  File "http/client.py", line 311, in begin
    version, status, reason = self._read_status()
  File "http/client.py", line 272, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "socket.py", line 586, in readinto
    return self._sock.recv_into(b)
ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
  File "django/core/handlers/exception.py", line 34, in inner
    response = get_response(request)
  File "django/core/handlers/base.py", line 115, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "django/core/handlers/base.py", line 113, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "django/views/decorators/cache.py", line 44, in _wrapped_view_func
    response = view_func(request, *args, **kwargs)
  File "django/views/generic/base.py", line 71, in view
    return self.dispatch(request, *args, **kwargs)
  File "django/views/generic/base.py", line 97, in dispatch
    return handler(request, *args, **kwargs)
  File "django/views/generic/base.py", line 158, in get
    context = self.get_context_data(**kwargs)
  File "councilmatic_core/views.py", line 142, in get_context_data
    context = super(IndexView, self).get_context_data(**kwargs)
  File "django/views/generic/base.py", line 26, in get_context_data
    kwargs.update(self.extra_context)
  File "lametro/views.py", line 75, in extra_context
    extra['current_meeting'] = self.event_model.current_meeting()
  File "lametro/models.py", line 561, in current_meeting
    streaming_meeting = cls._streaming_meeting()
  File "lametro/models.py", line 525, in _streaming_meeting
    running_events = requests.get('http://metro.granicus.com/running_events.php')
  File "requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)

This is not the first issue we've had with external requests impacting the app (#543, #382). It would be a good idea to validate responses and provide fallbacks on request failures across the site. The external integrations I can think of off the top of my head are:

sentry-io[bot] commented 3 years ago

Seems like the running events endpoint sometimes throws a 500. Guessing but not certain this happens when there are not running events. In any case, we should handle this eventuality.

Sentry issue: LA-METRO-COUNCILMATIC-53

hancush commented 2 years ago

@fatima3558 We make external requests to the sites listed in the issue description in various parts of the app. If they aren’t available, the page can’t load. That’s a pretty broad surface area for failure!

Let's start with our request to the running events endpoint in Legistar.

Can you propose a better pattern for making this request that A. sets a reasonable timeout and B. handles non-200 status codes and timeouts gracefully? I'd recommend seeing what functionality comes out of the box with the Python requests library before delving into custom code!

fgomez828 commented 2 years ago

@hancush here's what I'm thinking about this issue so far:

A. sets a reasonable timeout -- it's quite simple to add a timeout to a get request; it's a matter of passing in a timeout keyword (example from docs below). I'm not sure how long a reasonable timeout should be for this situation. I'm thinking about 5 seconds?

requests.get('https://github.com/', timeout=0.001)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
requests.exceptions.Timeout: HTTPConnectionPool(host='github.com', port=80): Request timed out. (timeout=0.001)

B. handles non-200 status codes and timeouts gracefully The current logic returns an empty LAMetroEvent queryset if the response json from granicus is empty or if none of the guids from the response json correspond to an event.

In the case of a timeout or non-200 status code, we should return the same empty queryset, but we should also add some sort of additional note about granicus being temporarily unavailable. This way, the user knows that they should try again later because there might be a meeting stream to view.

hancush commented 2 years ago

@fatima3558 In general, this makes sense to me! Yes to a 5-second timeout and great job thinking through what the method should return if the request is not successful. I agree an empty queryset is best.

we should also add some sort of additional note about granicus being temporarily unavailable. This way, the user knows that they should try again later because there might be a meeting stream to view.

These errors tend to be transient (i.e., they resolve themselves quickly), so let's table the message for now.

Can you draft a PR that adds the timeout and handling for non-200 status codes to the _streaming_meeting property?

hancush commented 2 years ago

I added a 5-second timeout to SmartLogic requests in #771 (currently a draft).