department-of-veterans-affairs / va.gov-team

Public resources for building on and in support of VA.gov. Visit complete Knowledge Hub:
https://depo-platform-documentation.scrollhelp.site/index.html
282 stars 203 forks source link

Migrations/Model/Data for Representatives #66845

Closed oddball-lindsay closed 12 months ago

oddball-lindsay commented 1 year ago

Issue Description

We need to add and manage address data for the representatives in our database. This includes modifications to the database table, addressing best practices, and ensuring data privacy.


Tasks


Acceptance Criteria

jvcAdHoc commented 1 year ago

We definitely want an index for performance purposes If we use the PostGIS method ST_DWithin, that can use the index for faster searching.

jvcAdHoc commented 1 year ago

Adding:

column name type
address_line_1 string
address_line_2 string
address_line_3 string
address_type string
city string
country_code_iso3 string
country_name string
county_name string
county_code string
international_postal_code string
province string
state_code string
zip_code string
zip_code_suffix string
lat float
long float
location st_point
raw_address jsonb
full_name string

We don't need a boolean address_valid field if we have raw_addres and only save valid data in the other columns. This will also give us a way to check if address data from OGC has changed or not to avoid unnecessary address validation api calls for addresses we know are invalid while keeping our vets-api dataset clean.

jvcAdHoc commented 1 year ago

This ticket will end up with multiple prs. I'm going to split the migrations into their own pr.

oddball-lindsay commented 1 year ago

We are seeing that Address 1 may contain the business name, but address validator is able to take in "bad address data" and return usable results.

Need review on PR to obtain Address 2 information

oddball-lindsay commented 1 year ago

Callback implemented so we can combine first/last name and only search for one field

Found a workaround (query) for PostGIS

Distant search and "fuzzy search" are separate, but can be chained together

jvcAdHoc commented 1 year ago

EOD update: need to write specs for fuzzy search. The ticket should be ready for review after that.

oddball-lindsay commented 1 year ago

Suggested using a magic number that gets set in the fuzzy search query; this will allow us to make the search more or less strict.

There is a default threshold of 0.3 in the algorithm, but because are doing a SQL query we are defining this in a string (the magic number) that we can control.

After discussing with @holdenhinkle, not sure if the settings.yml is the correct place.

oddball-lindsay commented 1 year ago

Looking at how the VA profile validates addresses, to ensure data has the right shape. Consider how this affects our implementation.

oddball-lindsay commented 1 year ago

Below is the full veteran_representatives table schema. Some of these attributes aren't currently in use, but have the potential to be if we can locate a data source.

  create_table "veteran_representatives", id: false, force: :cascade do |t|
    t.string "representative_id"
    t.string "first_name"
    t.string "last_name"
    t.string "email"
    t.string "phone"
    t.datetime "created_at", null: false
    t.datetime "updated_at", null: false
    t.string "poa_codes", default: [], array: true
    t.string "user_types", default: [], array: true
    t.text "ssn_ciphertext"
    t.text "dob_ciphertext"
    t.text "encrypted_kms_key"
    t.date "verified_decryptable_at"
    t.string "middle_initial"
    t.string "address_line_1"
    t.string "address_line_2"
    t.string "address_line_3"
    t.string "address_type"
    t.string "city"
    t.string "country_code_iso3"
    t.string "country_name"
    t.string "county_name"
    t.string "county_code"
    t.string "international_postal_code"
    t.string "province"
    t.string "state_code"
    t.string "zip_code"
    t.string "zip_suffix"
    t.float "lat"
    t.float "long"
    t.geography "location", limit: {:srid=>4326, :type=>"st_point", :geographic=>true}
    t.jsonb "raw_address"
    t.string "full_name"
    t.index ["full_name"], name: "index_veteran_representatives_on_full_name"
    t.index ["location"], name: "index_veteran_representatives_on_location", using: :gist
    t.index ["representative_id", "first_name", "last_name"], name: "index_vso_grp", unique: true
    t.check_constraint "representative_id IS NOT NULL", name: "veteran_representatives_representative_id_null"
  end

vets-api is getting the following attributes coming from the current accreditation search:

We hope to use these attributes from vets-api along with the data from OGC. All of the address fields except raw_address will ideally come from the Lighthouse Address Validation API so we can validate addresses before storing them directly in the database.

oddball-lindsay commented 1 year ago

Waiting for tomorrow to validate the Full Name attribute (running okay locally, but not as expected -- 2a will be the actual update)

oddball-lindsay commented 1 year ago

@jvcAdHoc here are the fields in the CorppDB accredited rep data file:

  1. PTCPNT_ID
  2. LEGACY_POA_CD
  3. ADDRS_ONE_TXT
  4. ADDRS_TWO_TXT
  5. ADDRS_THREE_TXT
  6. ZIP_PREFIX_NBR
  7. CITY_NM
  8. POSTAL_CD
  9. EMAIL_ADDRS_TXT
  10. NM
  11. ORG_TYPE_NM
  12. PHONE_NUMBER

We also have a sample file from GCLAWS (OGC) which has these attributes for Attorneys/Claims Agents:

  1. AccrAttorneyId / AccrClaimAgentId
  2. Number
  3. POA
  4. LastName
  5. FirstName
  6. MiddleName
  7. SirNameTitle
  8. OrganizationName (only for Attorneys and mostly NULL)
  9. DateOrigAccredited
  10. DateRecertified
  11. DateRecertDue
  12. DateAccredCancelled
  13. WorkAddress1
  14. WorkAddress2
  15. WorkAddress3
  16. WorkCity
  17. WorkState
  18. WorkZip
  19. FaxNumber (lots of NULL)
  20. WorkNumber (lots of NULL)
  21. PersonID

And these attributes for VSOs:

  1. VSOID
  2. Number
  3. POA
  4. CertifyingOfficialID
  5. RecogtnitionDate
  6. CertifyingOfficial.LastName
  7. CertifyingOfficial.FirstName
  8. CertifyingOfficial.MiddleName
  9. Organization.Name
  10. FaxNumber (lots of NULL)
  11. WorkNumber (lots of NULL)
  12. AcceptsElectronicPOAs (0 or 1 value)