department-of-veterans-affairs / va.gov-team

Public resources for building on and in support of VA.gov. Visit complete Knowledge Hub:
https://depo-platform-documentation.scrollhelp.site/index.html

Sentry platform-api-production reporting drops off #270

Closed (kfrz closed this issue 5 years ago)

kfrz commented 5 years ago

User Story

As a triage team member, I need constant monitoring of vets-api in production with Sentry so I can have clear insight into platform errors.

Goal

Sentry reporting for platform-api-production does not drop off intermittently

Background

From vets.gov-team #18898:

This has been noticed lately on Sentry: there are windows during which no events are reported (screenshot: image.png).

Other services during the same interval are continuing to send events.

Found this in a vets-api-worker log file this morning:

{"host":"ip-10-247-33-72","application":"vets-api-worker","timestamp":"2019-06-07T09:21:01.560594Z","level":"error","level_index":4,"pid":2445,"thread":"87701660","file":"/srv/vets-api/src/vendor/bundle/ruby/2.4/gems/sentry-raven-2.7.4/lib/raven/client.rb","line":97,"name":"Rails","message":"Unable to record event with remote Sentry server (Common::Exceptions::BackendServiceException - BackendServiceException: {:status=>504, :detail=>nil, :code=>\"VA900\", :source=>nil}):\n/srv/vets-api/src/lib/common/client/middleware/response/raise_error.rb:29:in `raise_error!'\n/srv/vets-api/src/lib/common/client/middleware/response/raise_error.rb:22:in `on_complete'\n/srv/vets-api/src/vendor/bundle/ruby/2.4/gems/faraday-0.9.2/lib/faraday/response.rb:9:in `block in call'\n/srv/vets-api/src/vendor/bundle/ruby/2.4/gems/faraday-0.9.2/lib/faraday/response.rb:57:in `on_complete'\n/srv/vets-api/src/vendor/bundle/ruby/2.4/gems/faraday-0.9.2/lib/faraday/response.rb:8:in `call'\n/srv/vets-api/src/vendor/bundle/ruby/2.4/gems/faraday-0.9.2/lib/faraday/rack_builder.rb:139:in `build_response'\n/srv/vets-api/src/vendor/bundle/ruby/2.4/gems/faraday-0.9.2/lib/faraday/connection.rb:377:in `run_request'\n/srv/vets-api/src/vendor/bundle/ruby/2.4/gems/faraday-0.9.2/lib/faraday/connection.rb:177:in `post'\n/srv/vets-api/src/vendor/bundle/ruby/2.4/gems/sentry-raven-2.7.4/lib/raven/transports/http.rb:24:in `send_event'\n/srv/vets-api/src/vendor/bundle/ruby/2.4/gems/sentry-raven-2.7.4/lib/raven/client.rb:37:in `send_event'\n/srv/vets-api/src/vendor/bundle/ruby/2.4/gems/sentry-raven-2.7.4/lib/raven/instance.rb:81:in `send_event'"}
{"host":"ip-10-247-33-72","application":"vets-api-worker","timestamp":"2019-06-07T09:21:01.560666Z","level":"error","level_index":4,"pid":2445,"thread":"87701660","file":"/srv/vets-api/src/vendor/bundle/ruby/2.4/gems/sentry-raven-2.7.4/lib/raven/client.rb","line":101,"name":"Rails","message":"Failed to submit event: <no message value>"}

I was able to 'resolve' the issue on that particular server by restarting the worker process.

Restart worker command: sudo initctl restart vets-api-worker
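To find which hosts are affected without reading each log by hand, the worker logs could be scanned for the two error messages quoted above. The following is a minimal sketch, assuming the logs are newline-delimited JSON with `host` and `message` fields as in the sample entries. The function name `count_sentry_failures` is hypothetical and not part of vets-api or sentry-raven.

```python
import json
from collections import Counter

# Substrings copied from the two log entries quoted in this issue.
FAILURE_MARKERS = (
    "Unable to record event with remote Sentry server",
    "Failed to submit event",
)


def count_sentry_failures(lines):
    """Count Sentry delivery failures per host in JSON-formatted log lines.

    `lines` is any iterable of strings, for example an open log file.
    Lines that are blank or not valid JSON objects are skipped.
    """
    failures = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON log line
        if not isinstance(entry, dict):
            continue
        message = str(entry.get("message", ""))
        if any(marker in message for marker in FAILURE_MARKERS):
            failures[entry.get("host", "unknown")] += 1
    return failures
```

Passing an open log file to `count_sentry_failures` returns a `Counter` keyed by host. A host with a non-zero count is a candidate for the worker restart described above.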

Acceptance Criteria

Definition of Done

annaswims commented 5 years ago

likely related to https://github.com/department-of-veterans-affairs/vets.gov-team/issues/17565

kfrz commented 5 years ago

Alarm created; tracking issue created: #304

From here, I think it's best to monitor until the end of the sprint to see whether reducing overall event volume from the frontend helps alleviate this problem, and to close this issue if we don't see it happen again. We can reopen if the alarms sound.
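For the monitoring period, the "windows of missing events" could be checked for offline from a list of event timestamps. This is a sketch only, and it is not the alarm tracked in #304: the function name `find_reporting_gaps` and the 15-minute default threshold are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def find_reporting_gaps(timestamps, max_gap=timedelta(minutes=15)):
    """Return (start, end) pairs where consecutive events are more than
    `max_gap` apart.

    `timestamps` is an iterable of datetime objects in any order.
    The 15-minute default is an assumed threshold, not a project setting.
    """
    ordered = sorted(timestamps)
    gaps = []
    # Compare each event time with the one that follows it.
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > max_gap:
            gaps.append((earlier, later))
    return gaps
```

An empty result over the sprint would support closing the issue. Any returned pair marks an interval worth comparing against worker logs from the same period.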