department-of-veterans-affairs / vets-api

API powering VA.gov
https://api.va.gov
Other
204 stars 61 forks source link

Facilities locator should use timeouts/retries for external service integrations #508

Closed jkassemi closed 7 years ago

jkassemi commented 7 years ago

We experienced an increased error rate from the facility locator API that triggered a PagerDuty alert. This issue has since resolved itself. I tracked this down to timeouts communicating with the ArcGIS endpoints from requests made with the Facilities client (https://github.com/department-of-veterans-affairs/vets-api/blob/eb53033d96bb45ebd6e751c032ac6a1800015195/lib/facilities/async_client.rb).

It seems like this may be using no timeout value, which will cause a request to terminate and respond with a 504 after 60 seconds from the ELB that this application sits behind. No errors are then reported to Sentry.

This should have a reasonable timeout value, and should throw an exception that's reported to Sentry. Breakers integration may be useful here as well.

patrickvinograd commented 7 years ago

I see a couple instances of GIS queries with total_time=~120s, return_code=recv_error with subsequent queries piling up.

D, [2016-11-13T16:57:06.123890 #20939] DEBUG -- : ETHON:         performed EASY effective_url=https://services3.arcgis.com/aqgBd3l68G8hEFFE/ArcGIS/rest/services/VHA_Facilities/FeatureServer/0/query?geometry=-90.645564%2C29.476021%2C-89.645564%2C30.476021&geometryType=esriGeometryEnvelope&where=&inSR=4326&outSR=4326&returnGeometry=true&returnCountOnly=false&outFields=%2A&returnDistinctValues=false&orderByFields=StationNumber&f=json response_code=200 return_code=recv_error total_time=119.012348
D, [2016-11-13T16:57:06.125139 #20939] DEBUG -- : ETHON: performed MULTI
D, [2016-11-13T16:57:06.125576 #20939] DEBUG -- : ETHON: performed MULTI
D, [2016-11-13T16:57:06.141205 #20939] DEBUG -- : ETHON: performed MULTI
D, [2016-11-13T16:57:06.141234 #20939] DEBUG -- : ETHON: performed MULTI
D, [2016-11-13T16:57:06.141251 #20939] DEBUG -- : ETHON: performed MULTI
D, [2016-11-13T16:57:06.141265 #20939] DEBUG -- : ETHON: performed MULTI
D, [2016-11-13T16:57:06.141279 #20939] DEBUG -- : ETHON: performed MULTI
D, [2016-11-13T16:57:06.141293 #20939] DEBUG -- : ETHON: performed MULTI
D, [2016-11-13T16:57:06.141311 #20939] DEBUG -- : ETHON: performed MULTI
D, [2016-11-13T16:57:06.141378 #20939] DEBUG -- : ETHON: performed MULTI
D, [2016-11-13T16:57:06.144778 #20939] DEBUG -- : ETHON: performed MULTI
D, [2016-11-13T16:57:06.141326 #20939] DEBUG -- : ETHON: performed MULTI
patrickvinograd commented 7 years ago

TImeouts fell by the wayside when I switched to Typheous for parallel queries, but it looks like I can use Typheous as a Faraday adapter and integrate breakers that way.

jcmeloni-zz commented 7 years ago

@patrickvinograd Is this closeable since the PR referenced above is merged?