department-of-veterans-affairs / va.gov-team

Public resources for building on and in support of VA.gov. Visit complete Knowledge Hub:
https://depo-platform-documentation.scrollhelp.site/index.html
283 stars 205 forks source link

Exception handling and error coercion refactor #1816

Open kfrz opened 5 years ago

kfrz commented 5 years ago

User Story or Problem Statement

As a engineer working with vets-api I need a maintainable and clearly understood approach for handling errors returned from external (upstream) services, so that future developers can easily augment and extend custom error handling logic when necessary.

Background

🖱️Click me to expand background details Currently, we approach exception coercion and error reporting in a smattering of various classes used as mixins and middlewares. E.g.,: - `lib/common/exceptions/external/backend_service_exception.rb` This file implements a generic error class which is customized with a message and a status code that is manually defined. This status code is ultimately used when logging the error into `Rails.logger` (or Sentry, etc). The status code is mapped from a given value from the external request, which is then keyed against a list of custom error messages found in: `config/locales/exceptions.en.yml` Let's take a look at an example: - `lib/evss/service.rb` ```rb # frozen_string_literal: true require 'common/client/base' require 'evss/auth_headers' module EVSS class Service < Common::Client::Base # ... private def with_monitoring_and_error_handling with_monitoring(2) do yield end rescue => e handle_error(e) end # ... def handle_error(error) Raven.extra_context( message: error.message, url: config.base_path ) case error when Faraday::ParsingError raise_backend_exception('EVSS502', self.class) when Common::Client::Errors::ClientError Raven.extra_context(body: error.body) raise Common::Exceptions::Forbidden if error.status == 403 raise_backend_exception('EVSS400', self.class, error) if error.status == 400 raise_backend_exception('EVSS502', self.class, error) else raise error end end def raise_backend_exception(key, source, error = nil) raise Common::Exceptions::BackendServiceException.new( key, { source: "EVSS::#{source}" }, error&.status, error&.body ) end end end ``` The private methods in the above class are *mostly* concerned with formatting the particulars of the exception that is to be raised. This is not necessarily the concern of the service class, which should be focused on making requests to the service. The error logic can be abstracted and put into a common middleware which can be extended to handle specifics from a given service, but can handle common cases. Here's the appeals service which looks pretty nice: ```rb # frozen_string_literal: true require 'common/client/concerns/monitoring' module Appeals ## # Proxy Service for Appeals Caseflow. # # @example Create a service and fetching appeals for a user # appeals_response = Appeals::Service.new.get_appeals(user) # class Service < Common::Client::Base include SentryLogging include Common::Client::Monitoring configuration Appeals::Configuration STATSD_KEY_PREFIX = 'api.appeals' ## # Returns appeals data for a user by their SSN. # # @param user [User] The user object, usually the `@current_user` from a controller. # @param additional_headers [Hash] Any additional HTTP headers you want to include in the request. # @return [Appeals::Responses::Appeals] Response object that includes the body. # def get_appeals(user, additional_headers = {}) with_monitoring do response = perform(:get, '/api/v2/appeals', {}, request_headers(user, additional_headers)) Appeals::Responses::Appeals.new(response.body, response.status) end end ## # Pings the Appeals Status health check endpoint. # # @return [Faraday::Response] Faraday response instance. # def healthcheck with_monitoring do perform(:get, '/health-check', nil) end end private def request_headers(user, additional_headers) { 'ssn' => user.ssn, 'Authorization' => "Token token=#{Settings.appeals.app_token}" }.merge(additional_headers) end end end ``` In this service class [configuration](https://github.com/department-of-veterans-affairs/vets-api/blob/master/lib/appeals/configuration.rb), errors are first passed to the `appeals_errors` middleware, then through the `raise_error` middleware. --- #### Exception lifecycle When a request throws an exception back up the stack to `ApplicationController`, a `rescue` block is called which (at time of writing) does the following: - If exception type is not declared in `SKIP_SENTRY_EXCEPTION_TYPES` constant, add additional context to Sentry payload about the original exception response - Coerces the exception into a `vets-api` custom `Common::Exceptions::*` class based on original exception class - Logs exception to sentry - Sets some headers if it's an Unauthorized exception - Renders a JSON response containing error messages and status code for the client #### Pain Points - Each service is on its own to implement error handling as they see fit, which could be normalized - External errors are not all the same when it comes to how they use status codes, response bodies vary, etc. - Quality of API documentation varies from org to org - `exceptions.en.yml` is long, hard to parse as a human, and one will have to remember to add to it - Potential & likely duplication of various status codes. - An example is `APPEALSSTATUS50x`, where the status code we provide in the body to the FE is not accurately mapped: ```yml APPEALSSTATUS500: <<: *external_defaults title: 'Bad Gateway' code: 'APPEALSSTATUS500' detail: 'Received a 500 response from the upstream server' status: 502 APPEALSSTATUS502: <<: *external_defaults title: 'Bad Gateway' code: 'APPEALSSTATUS502' detail: 'Received an invalid response from the upstream server' status: 502 ``` - :point_up: The historical context here is that the FE doesn't necessarily care about the distinction in terms of user behavior, so this may be a full-stack discussion about how we are using statuses on the Frontend.

Goal(s)

Error handling is consistent, navigable, and reliable throughout the API.

Objectives or Key Results this is meant to further

Acceptance Criteria


Globally scoped work

Service scoped work

The following services are ordered by priority, based on an audit of complexity, size, and impact to VSP products.

We will approach these one-by-one as each will have different requirements.

🧼 = SOAP API

In general, services should all fulfill the following acceptance criteria:

LindseySaari commented 5 years ago

MVI may be another service similar to EVSS, where we will want to extract some of the error handling logic out

omgitsbillryan commented 5 years ago

Some random points:

kfrz commented 5 years ago
  • The only way to make vets-api return an error response code (anything other than 200) is to raise an exception and let it bubble up to application_controller.rb. I find this problematic because there are cases where an upstream service returns a failure that isn't "exceptional", and it would be nice if the controller in question could just render json: error_json_stuff, status: :not_found. This is reason for SKIP_SENTRY_EXCEPTIONS existing nauseated_face.

Good point. Do you have suggestions on how this could be a more approachable interface? Just gathering ideas at this point.

  • The commonality between errors from upstream services is limited. Even 2 different services that are both from EVSS, may have very different error responses. As a result some of our middlewares have become these "if-elsif-elsif-else" messes because they're trying to do too much. Not saying that middlewares aren't useful here, but the current code assumes everything is "common" when in reality each is specific to a given service.

Agreed. Investigating which of the classes are most offensive and will suss out some more solutions in future comments.

  • This could become a big bottomless black hole of work. We probably need some reasonable way to break it up so that so as not to fall into it forever fearful

Agreed. We intend to audit and prioritize the services based on complexity/legacy-ness of code.

LindseySaari commented 5 years ago

This is a WIP branch for PPMS refactor (for Sprint 10 tracking purposes): https://github.com/department-of-veterans-affairs/vets-api/tree/1816-ppms-service-refactor