department-of-veterans-affairs / va.gov-cms

Editor-centered management for Veteran-centered content.
https://prod.cms.va.gov
GNU General Public License v2.0
99 stars 69 forks source link

Content Audit: "address_2" in Sandy's DB #19379

Open Agile6MSkinner opened 1 month ago

Agile6MSkinner commented 1 month ago

Background

The Google directions link is failing to find directions for some facilities that have data in Line 2 of the address. The assumption is that this results from data passed in "address_2" that contain certain special characters (which ones is currently unknown). In order to test, scope and evaluate possible solutions, we need to know the volume of special characters in in "address_2", across facility types.

Slack Thread

Technical notes

Drupal concatenates Address 1 & 2 into a single field when data is migrated in from Facilities API. Likely we will need to query the Facilities API, to get a complete sense of the various address_2 values.

Per Lighthouse (thread with Adam here), Lighthouse does not exclude any special characters from either address_1 or address_2.

Some Google Directions links are based on:

While in theory it would save time to independently validate which of these characters causes the issue, there's also a case to be made that there could be others that are broken because of a different character being used. We can assume the address_2 field does not block any special characters, but Jill will reach out to LH to verify.

Acceptance Criteria

ian-sears commented 1 month ago

Adding https://va-gov.atlassian.net/browse/VAHELP-8087 for @mmiddaugh as a new, additional reference that just came in this afternoon as well. This CMS Jira Help Desk ticket is from Barbara Kuhn regarding the U.S. Virgin Islands Vet Center (USVI) Barb writes: "Quick question on the Mapbox location showing for USVI: Although the site isn’t yet published, I am seeing in Preview mode that the visual map display isnt what I see when I manually check the address in other search engines. " Barb checked VAST and there is no option for USVI, Just VI cc'ing @mmiddaugh for visibility

jilladams commented 1 month ago

Technical notes from the slack thread, to factor into next steps ticket

jilladams commented 1 month ago

Asked LH if they validate against any special chars in address_2 here: They do not.

mmiddaugh commented 1 month ago

Results of audit for VBAs

Several special characters were also found in address_1, plus 2 not seen in address_2: ' and ß

Special character Instances Example facility address_1 address_2
. 3 vba_329 477 Michigan Avenue Patrick V. McNamara Federal Building, 12th Floor
one or more , 72 vba_304h 600 Mt. Pleasant Avenue East Campus, Building 5 Lower Level
one or more - 27 vba_314o 700 24th Street Building 8204-2, Room 210
& 8 vba_317u 3000 NW 83rd St Building R, Room 110 - Veterans & Military Success Services
: 2 vba_311b Dr Hitzelberger Straße, 66849 Landstuhl REG-MED Center Unit:33100 Attn: Department of Veterans Affairs APO, AE 09180
/ 9 vba_318p 2843 Normandy Street Soldier Support Center/VA Outbase office
multiple vba_311 83-2 Dodu-ri, Paengseong-eup, Pyeongtaek-si, Gyeonggi-do, South Korea Veterans Affairs Office/BDD USAG Camp Humphreys Unit #15228 Attention: Building P-6400 APO, AP 96271-5228
jilladams commented 1 month ago

Amazing, thanks MIchelle!

Since Address_2 is used across all our facility types coming from LH, we will still need to do this audit for VAMC, Vet Center (and NCA? do we care about them yet?). In theory that's possible to do in the Drupal UI, but I still think a DB query will be the fastest way to get it done, which would be Drupal engineering work.

jilladams commented 1 month ago

Wait, I think I'm wrong, it's not possible to do in the Drupal UI, or in the Drupal DB. DaveP noted before: we smoosh address_1 and address_2 into a single Address field in Drupal, when we import. I think we have to ask LH to help with auditing address_2. @omahane does that sound right to you?

jilladams commented 1 month ago

(This ticket title is limited to Sandy's DB, but I don't think that was the spirit of trying to figure out special chars in address_2. I could be lost in the sauce.)

jilladams commented 1 month ago

Conversation with Adam re: address_2 in LH: https://dsva.slack.com/archives/C02BTJTDFTN/p1727970129537389

jilladams commented 1 month ago
jilladams commented 1 month ago

Sitewide can GET from LH API the contents of address_1 and address_2 (and address_3), but we can't do it in Drupal bc of the way the Drupal migration is smooshing the field data.

jilladams commented 1 month ago

Michelle noted she can also pull addresses out of the VAST report. In that case, will include only have VHA + Vet Centers.

Adam noted that for NCA: it depends if the cemetery is national or state.

jilladams commented 1 month ago

https://dsva.slack.com/archives/C06EK4NLGDQ/p1729545788290989

here's the list of VAST locations (sharepoint). It includes VA health and Vet Center facilities (plus mobiles) which are marked as Active or Temp. Deactivated - consistent with LH API.

jilladams commented 1 month ago

Things we need to figure out:

  1. Right now we pull in addresses 1, 2, and 3, and store them in Drupal in a variety of ways across content types. If we want to standardize, we will need to do migrations.
  2. If the goal is to get better address structure in our Google Maps directions links, can we do that without having to migrate / handle addresses differently? (is there a short pole?) To Christian's point: GIGO. If addresses 1 & 2 are used willy nilly, it will be hard to programmatically clean that up and send Google better data.

We need to do some smart thinking about how we might go about standardizing. This feels like a content modeling review, based on this audit. @davidmpickett does that sound right to you? If so, we could repurpose this ticket and refine it as content modeling, since the audit sort of happened async in the comments.

davidmpickett commented 1 month ago

@jilladams Content modeling seems incredibly premature here. That is a way of iterating on solution architecture. This feels like it is still early in Discovery. Here are some questions I would need answered before content modeling.

Address formatting and validation is also way more complex than phone number validation. Comparing this to the phone number standardization epic, that was a clearly defined subset of the phone number problem space: implement better guardrails in Drupal so that phone numbers are formatted consistently and can be fed in to the Design System component. We also already had a pattern in place that we were adopting, vs this where it’s not clear that formatting is even the issue

mmiddaugh commented 1 month ago

Summary of further testing observations

image.png


About attached Data file

jilladams commented 1 month ago

From refinement:

mmiddaugh commented 4 weeks ago

Clickthrough for "Get directions to Google maps" link in the month of September 2024