Update Search.gov implementation to use different API endpoint

jilladams commented 1 month ago

Status

[2024-08-15] [Fran] Updated ticket to include splitting apart monitoring for Search and Search Typeahead to enable applying different tolerances. Slack convo here.
[2024-08-13] [Fran] Michelle M has approved this work, although it shouldn't bump critical or project work that's moving toward deadlines.

User Story or Problem Statement

API Endpoint Problem Statement: The Search.gov implementation has been running into monitoring issues, with large numbers of 50x errors from the Search typeahead and main search endpoints. Search.gov has asked us to use a different endpoint for our search queries: https://api.gsa.gov/technology/searchgov/v2/results/i14y
Search and Search Typeahead Monitoring Problem Statement: Search typeahead and search currently share one monitoring tolerance. That is a problem because the issues we're seeing with search typeahead don't constitute an actual search outage, yet they add noise to the existing Search monitor and cause engineers/Jill/watch officers to triage something that isn't an outage.

Description or Additional Context

Endpoint Description:

Existing endpoint: https://search.usa.gov/api/v2/search/i14y
Suggested new endpoint: https://api.gsa.gov/technology/searchgov/v2/results/i14y

According to Search.gov, we only need to change the endpoint. Under the hood, the two endpoints are the same: they accept the same parameters and return the same responses. The new endpoint will let the Search.gov team increase monitoring of VA usage and understand the nature of our API calls and responses, so they can better support the VA.gov workflow.
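
To illustrate the "only the endpoint changes" claim, here is a minimal sketch that builds the same query against both endpoints. The old path (`/search/i14y`) and the parameter names are taken from the VCR cassette URIs quoted later in this thread; the `search_url` helper is hypothetical and not part of vets-api.

```ruby
require 'uri'

# Endpoints as they appear in this ticket and in the existing cassettes.
OLD_BASE = 'https://search.usa.gov/api/v2/search/i14y'
NEW_BASE = 'https://api.gsa.gov/technology/searchgov/v2/results/i14y'

# Hypothetical helper: joins a base endpoint and a hash of query parameters.
def search_url(base, params)
  "#{base}?#{URI.encode_www_form(params)}"
end

params = { access_key: 'TESTKEY', affiliate: 'va', limit: 10, offset: 0, query: 'benefits' }

old_url = search_url(OLD_BASE, params)
new_url = search_url(NEW_BASE, params)

# Only the part before "?" differs; the query string is identical.
puts old_url
puts new_url
```

If Search.gov's statement holds, swapping the base is the whole change on the request side.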

Search Monitor Description

Create an additional monitor for Search Typeahead only, and remove Search Typeahead from the existing Search monitor. The new monitor's tolerance should be very lenient; the exact level is an engineering/Jill decision.

Engineering Info

Existing search engineering documentation: search docs.

Three Ruby modules are involved with Search: search, search typeahead, and search click tracking. We are only changing the 'search' module.

Implementation Tasks

Endpoint Tasks

[ ] Add a new feature toggle named 'search_use_v2_gsa'
[ ] Add a gsa_url value to the existing Search config. See below.
[ ] Override the Search::Configuration.base_path based on flipper value. See below.

Monitoring Tasks

[ ] Create a new monitor for Search Typeahead
[ ] Move Search Typeahead out of the existing Search monitor and into the new Search Typeahead monitor
[ ] Set the tolerance for the new monitor (currently TBD)

Updated settings:

# Settings for search using api.gsa.gov
search:
  access_key: SEARCH_GOV_ACCESS_KEY
  affiliate: va
  mock_search: false
  gsa_url: https://api.gsa.gov/technology/searchgov/v2
  url: https://search.usa.gov/api/v2

Override Search base_path based on flipper.

def base_path
  if Flipper.enabled?(:search_use_v2_gsa, current_user)
    "#{Settings.search.gsa_url}/results/i14y"
  else
    "#{Settings.search.url}/search/i14y"
  end
end

Acceptance Criteria

[ ] Search.gov implementation on va.gov uses new endpoint, only when the feature toggle is turned on.
[ ] Tests have been updated with new URLs
[ ] Update documentation for on-site search as necessary
[ ] Review search monitoring
- [ ] Update monitors to ensure they will still accurately report Search issues from our new endpoint
- [ ] Update search monitoring docs as necessary

Mightcouldprobablyshould be a new ticket:

[ ] If possible to do at the same time: break apart search monitoring so that V0::SearchController is monitored separately from V0::SearchTypeaheadController

randimays commented 1 month ago

There is an entry for the search endpoint in this vets-api config file.

dsasser commented 1 month ago

Engineering Pre-finement notes:

Search.gov base URLs are defined in settings for the search, search_typeahead, and search_click_tracking endpoints.

# Settings for search
search:
  access_key: SEARCH_GOV_ACCESS_KEY
  affiliate: va
  mock_search: false
  url: https://search.usa.gov/api/v2

Already using api.gsa.gov:

# Settings for search-typeahead
search_typeahead:
  api_key: API_GOV_ACCESS_KEY
  name: va
  url: https://api.gsa.gov/technology/searchgov/v1

# Settings for search-click-tracking
search_click_tracking:
  access_key: SEARCH_GOV_ACCESS_KEY
  affiliate: va
  mock: false
  url: https://api.gsa.gov/technology/searchgov/v2

Endpoint definition:

lib/search/configuration.rb:
  def base_path
    "#{Settings.search.url}/search/i14y"
  end

Tests/Betamocks


spec/support/vcr_cassettes/search/503.yml:
  method: get
  uri: https://search.usa.gov/api/v2/search/i14y?access_key=TESTKEY&affiliate=va&limit=10&offset=0&query=benefits
  body:

spec/support/vcr_cassettes/search/504.yml:
  method: get
  uri: https://search.usa.gov/api/v2/search/i14y?access_key=TESTKEY&affiliate=va&limit=10&offset=0&query=benefits
  body:

spec/support/vcr_cassettes/search/empty_query.yml:
  method: get
  uri: https://search.usa.gov/api/v2/search/i14y?access_key=TESTKEY&affiliate=va&limit=10&offset=0&query=
  body:

spec/support/vcr_cassettes/search/exceeds_rate_limit.yml:
  method: get
  uri: https://search.usa.gov/api/v2/search/i14y?access_key=TESTKEY&affiliate=va&limit=10&offset=0&query=benefits
  body:

spec/support/vcr_cassettes/search/invalid_access_key.yml:
  method: get
  uri: https://search.usa.gov/api/v2/search/i14y?access_key=INVALIDKEY&affiliate=va&limit=10&offset=0&query=benefits
  body:

spec/support/vcr_cassettes/search/invalid_affiliate.yml:
  method: get
  uri: https://search.usa.gov/api/v2/search/i14y?access_key=TESTKEY&affiliate=INVALID&limit=10&offset=0&query=benefits
  body:

spec/support/vcr_cassettes/search/last_page.yml:
  method: get
  uri: https://search.usa.gov/api/v2/search/i14y?access_key=TESTKEY&affiliate=va&limit=10&offset=80&query=benefits
  body:

spec/support/vcr_cassettes/search/page_1.yml:
  method: get
  uri: https://search.usa.gov/api/v2/search/i14y?access_key=TESTKEY&affiliate=va&limit=10&offset=0&query=benefits
  body:

spec/support/vcr_cassettes/search/page_2.yml:
  method: get
  uri: https://search.usa.gov/api/v2/search/i14y?access_key=TESTKEY&affiliate=va&limit=10&offset=10&query=benefits
  body:

spec/support/vcr_cassettes/search/success_utf8.yml:
  method: get
  uri: https://search.usa.gov/api/v2/search/i14y?access_key=TESTKEY&affiliate=va&limit=10&offset=0&query=benefits
  body:

spec/support/vcr_cassettes/search/success.yml:
  method: get
  uri: https://search.usa.gov/api/v2/search/i14y?access_key=TESTKEY&affiliate=va&limit=10&offset=0&query=benefits
  body:

spec/support/vcr_cassettes/search_click_tracking/missing_parameter.yml:
  method: post
  uri: https://api.gsa.gov/technology/searchgov/v2/clicks/?access_key=TESTKEY&affiliate=va&module_code=I14Y&position=0&query=&url=https://www.testurl.com&user_agent=testUserAgent
  body:

spec/support/vcr_cassettes/search_click_tracking/success.yml:
  method: post
  uri: https://api.gsa.gov/technology/searchgov/v2/clicks/?access_key=TESTKEY&affiliate=va&module_code=I14Y&position=0&query=testQuery&url=https://www.testurl.com&user_agent=testUserAgent
  body:
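
If the search cassettes end up being rewritten rather than re-recorded, the URI change is a plain string substitution. The sketch below is one possible approach, not an existing vets-api script; `migrate_cassette` and the constant names are hypothetical, and the two endpoints are the ones listed in this ticket.

```ruby
# Old and new search endpoints, as given in this ticket.
OLD_SEARCH_ENDPOINT = 'https://search.usa.gov/api/v2/search/i14y'
NEW_SEARCH_ENDPOINT = 'https://api.gsa.gov/technology/searchgov/v2/results/i14y'

# Hypothetical helper: returns cassette text with every old search URI
# pointed at the new endpoint. Query strings are left untouched, and URIs
# that do not contain the old endpoint (for example click tracking) are unchanged.
def migrate_cassette(yaml_text)
  yaml_text.gsub(OLD_SEARCH_ENDPOINT, NEW_SEARCH_ENDPOINT)
end

sample = <<~YAML
  method: get
  uri: https://search.usa.gov/api/v2/search/i14y?access_key=TESTKEY&affiliate=va&limit=10&offset=0&query=benefits
  body:
YAML

puts migrate_cassette(sample)
```

Because the ticket keeps the old endpoint available behind the feature toggle, the specs would likely need cassettes for both URLs while the toggle exists, so copying the files before rewriting them may be safer than editing in place.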

❓ Should we also change the typeahead endpoint to use the https://api.gsa.gov/technology/searchgov/v2 url?

jilladams commented 1 month ago

Great Q, I'll ask Jim.

jilladams commented 1 month ago

From Search.gov:

The typeahead endpoint is already using the regular base URL, along with a number of other features of the api.data.gov system that the other endpoints don't use. The typeahead endpoint is its own special beast that has not been prioritized for a while, and is definitely on our list for future performance improvements.

With respect to the 500s, we are also seeing that spike, and have put in some mitigations, but are still dealing with something hammering the API servers. We are also in the middle of an infrastructure migration, which, when complete, should give us much better monitoring tools for dealing with the errors.

jilladams commented 1 month ago

Settings changes are in vets-api. For testing: we don't know of a way to make this conditional in vets-api.

We cannot test this in a Tugboat. vets-api uses Review Instances, and we think we could test this in Review Instances.

We want to verify the 5 estimate once we can talk to @eselkin about how we might test in vets-api lower envs.

acrollet commented 1 month ago

> For testing: we don't know of a way to make this conditional in vets-api.

Unsolicited suggestion: could you make the base_path return value dependent on a feature toggle?

dsasser commented 1 month ago

> > For testing: we don't know of a way to make this conditional in vets-api.
>
> Unsolicited suggestion: could you make the base_path return value dependent on a feature toggle?

Yeah, good point Adrian. We did discuss putting this behind a feature toggle, but weren't sure whether we could or should do that in config settings. As you pointed out, we can do it in the configuration class instead. I'll pursue this, since I think it is vital to be able to roll back easily and quickly should things not work as expected.
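
As a rough illustration of the rollback behavior, the sketch below evaluates the toggle inside `base_path` on every call. `FakeToggles`, `SETTINGS`, and `SearchConfiguration` are hypothetical stand-ins so the example runs without Rails, Flipper, or vets-api's Settings; only the toggle name, the two URLs, and the two paths come from this ticket.

```ruby
# Stand-in for Settings.search (hypothetical; values from the ticket).
SETTINGS = {
  url: 'https://search.usa.gov/api/v2',
  gsa_url: 'https://api.gsa.gov/technology/searchgov/v2'
}.freeze

# Stand-in for a feature-toggle service (hypothetical, not the Flipper API).
class FakeToggles
  def initialize
    @enabled = {}
  end

  def enable(name)
    @enabled[name] = true
  end

  def disable(name)
    @enabled[name] = false
  end

  def enabled?(name)
    @enabled.fetch(name, false)
  end
end

# Stand-in for the search configuration class (hypothetical).
class SearchConfiguration
  def initialize(toggles)
    @toggles = toggles
  end

  # The toggle is checked at call time, so flipping it changes the
  # result of the next call without touching settings.
  def base_path
    if @toggles.enabled?(:search_use_v2_gsa)
      "#{SETTINGS[:gsa_url]}/results/i14y"
    else
      "#{SETTINGS[:url]}/search/i14y"
    end
  end
end

toggles = FakeToggles.new
config = SearchConfiguration.new(toggles)

puts config.base_path # old endpoint while the toggle is off
toggles.enable(:search_use_v2_gsa)
puts config.base_path # new endpoint
toggles.disable(:search_use_v2_gsa)
puts config.base_path # rolled back to the old endpoint
```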

jilladams commented 3 weeks ago

A pontification on monitoring

Pertinent to this ticket

We need to make sure that, after the changes in this ticket, the API endpoint we're calling is monitored properly, and that our existing Search monitors that involve this endpoint still make sense. That's a blocker to calling the update done, because accurate monitoring can't lag too long behind the prod changes, since this is a P1 service. So I added that as an AC here. If we opt to break it out into a new ticket, it needs to happen in the same sprint or immediately after shipping the update. Sorry I missed this in refinement.

Bigger picture on Search

Our monitors are set up to monitor Search APM sort of globally. However, Search APM includes calls to 3 different endpoints:

V0::SearchController#index: When calls to this endpoint fail, a site user cannot search. Failures to this endpoint matter a lot. Historically a quieter / more stable endpoint.
V0::SearchClickTrackingController#create: When calls to this endpoint fail, clicks don't get tracked by Search.gov. We don't care that much. Historically very noisy with errors.
V0::SearchTypeaheadController#index: When calls to this endpoint fail, a user doesn't receive back typeahead responses from Search.gov, but search can still be completed. It's a UX nicety, but it's not mission critical. Historically very noisy with errors.

Based on this conversation in the monitoring channel, if it's possible, we want to break up our monitors to be more clear about when SearchController proper is failing, vs. Typeahead noise.
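
To make the idea of per-endpoint tolerances concrete, here is a small sketch of the thresholding logic only. It is not our monitor configuration: the `alert?` helper is hypothetical, and the tolerance values are made-up placeholders, since the real typeahead tolerance is still TBD in this ticket.

```ruby
# Placeholder tolerances (fraction of failed requests). These numbers are
# assumptions for illustration; the real values are TBD.
TOLERANCES = {
  'V0::SearchController#index' => 0.05,
  'V0::SearchTypeaheadController#index' => 0.50
}.freeze

# Hypothetical helper: true when the endpoint's error rate exceeds
# that endpoint's own tolerance.
def alert?(endpoint, errors:, total:)
  return false if total.zero?

  errors.to_f / total > TOLERANCES.fetch(endpoint)
end

# The same 20% error rate alerts for search but not for typeahead.
puts alert?('V0::SearchController#index', errors: 20, total: 100)
puts alert?('V0::SearchTypeaheadController#index', errors: 20, total: 100)
```

With a shared tolerance, typeahead noise and a real SearchController failure are indistinguishable; separate entries let each endpoint trip on its own terms.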

Wouldn't it be nice to just handle all of that here. It's nice to want things, Jill, but: I'd be curious to hear from the person who picks up this ticket, whether that's reasonable or unreasonable to try to do that now or cut a follow up ticket.

jilladams commented 3 weeks ago

(Also: if the endpoint is set in internals, monitoring might not be affected at all. I dunno.)

jilladams commented 2 weeks ago

I went ahead and manually updated the Revision Date on this node, and set it to the value that is coming through in migration CSV, until we can work this ticket. Added a revision log. (My change may get overwritten by next migration, which is ok and may be a data point to help us understand what the migration is doing.)

jilladams commented 3 days ago

@SnowboardTechie given your Rails background, we thought this might be a good candidate for this sprint, after your other assigned tickets are closed. Let's talk through it when you're ready to start, for context.

department-of-veterans-affairs / va.gov-cms