Closed RudyOnRails closed 2 years ago
Office hours with ops meeting 12/17
@drorva Thanks for the update! Is there a GH issue for the DD alert creation?
@mchelen-gov Actually there is not, good question. I am still perfecting the monitor, but I will create an issue now.
Background
This Epic's purpose is to document what happened, develop questions to ask or unknowns to know, and store any research and/or evidence to support decisions going forward.
TL;DR
The Ch33 direct deposit endpoint and the payment information endpoint have both been down/mostly down for a similar period of time (since Wednesday, Dec 8) and are both supported by the same backend system -- BGS.
Ch33
December 1, Samara noticed an issue on prod and asked if someone could check the Profile in prod. The issue was very sporadic and could not be reproduced in lower environments or locally. Occasionally the Profile would fail to load all profile information and an alert would be displayed.
December 8, Lihan contacted BGS via email to inquire about DDEFT latency. We received no answers that day, but they did loop in other teams (Team 1, WebLogic Admin).
December 9, Cory Easley looped in the Tuxedo team around 7:30 AM ET. We got a response back from Vehkaiah Kolla at around 10:15 AM ET saying the Tuxedo and Bull admins did not notice any slowdowns. A joint call was set up at 12:30 PM ET, and most of the afternoon was spent with Lihan Li, Joe Niquette, Lance Sanchez, and Boris Ning comparing logs and request/response times with BGS. During some of the calls, the forward proxy would occasionally fail to connect. We looped in the Ops team (Bill Ryan) for additional support.
Dec 10 - 13, Lance Sanchez, Boris Ning, and Mike Chelen continued to troubleshoot. Demian Ginther concluded that VA Network may need to be looped in, since Ops has no visibility once traffic leaves the forward proxy.
Dec 14
Dec 15
Dec 16
View Payments
On Wed, Dec 15, Jason was informed that view-payments had been lagging/down for the past few days.
Context
Affected Features (in order of appearance)
@service.people.find_person_by_ptcpnt_id(@user.participant_id, @user.ssn)
service.claims.send(:request, :find_ch33_dd_eft, fileNumber: @user.ssn)
service.ddeft.find_bank_name_by_routng_trnsit_nbr(routing_number)
ch33_bank_accounts
person = BGS::PeopleService.new(current_user).find_person_by_participant_id
response = BGS::PaymentService.new(current_user).payment_history(person)
info cannot be retrieved
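Since much of the troubleshooting involved comparing request/response times with BGS, a small timing wrapper around the calls listed above could make slow requests easier to spot in our own logs. This is only a minimal sketch: `timed_call`, its `slow_threshold_ms` parameter, and the log line format are hypothetical names made up for illustration, not anything that exists in the codebase or the BGS client.

```ruby
require "benchmark"

# Hypothetical helper (not part of any existing service class): runs the given
# block, measures wall-clock time, writes one line to `out`, and returns the
# block's result together with the elapsed time in milliseconds.
def timed_call(label, slow_threshold_ms: 1_000, out: $stderr)
  result = nil
  elapsed_ms = Benchmark.realtime { result = yield } * 1000.0

  # Flag calls that exceed the (assumed) threshold so they stand out in logs.
  status = elapsed_ms > slow_threshold_ms ? "SLOW" : "ok"
  out.puts format("%s %s %.1fms", status, label, elapsed_ms)

  [result, elapsed_ms]
end

# Example shape of a call site, assuming the service objects shown above:
#   person, ms = timed_call("people.find_person_by_participant_id") do
#     BGS::PeopleService.new(current_user).find_person_by_participant_id
#   end
```

The helper does not rescue exceptions, so a failed or timed-out call propagates as before and is simply not timed; whether failures should also be recorded is a separate decision.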
Findings
Hypotheses
SemanticLogger is showing signs of logs being queued up (at times hundreds of thousands).
Next Steps / Tasks
Results
-
Acceptance Criteria