Closed oddball-lindsay closed 12 months ago
We definitely want an index for performance purposes If we use the PostGIS method ST_DWithin
, that can use the index for faster searching.
Adding:
column name | type |
---|---|
address_line_1 |
string |
address_line_2 |
string |
address_line_3 |
string |
address_type |
string |
city |
string |
country_code_iso3 |
string |
country_name |
string |
county_name |
string |
county_code |
string |
international_postal_code |
string |
province |
string |
state_code |
string |
zip_code |
string |
zip_code_suffix |
string |
lat |
float |
long |
float |
location |
st_point |
raw_address |
jsonb |
full_name |
string |
We don't need a boolean address_valid
field if we have raw_addres
and only save valid data in the other columns. This will also give us a way to check if address data from OGC has changed or not to avoid unnecessary address validation api calls for addresses we know are invalid while keeping our vets-api dataset clean.
This ticket will end up with multiple prs. I'm going to split the migrations into their own pr.
We are seeing that Address 1 may contain the business name, but address validator is able to take in "bad address data" and return usable results.
Need review on PR to obtain Address 2 information
Callback implemented so we can combine first/last name and only search for one field
Found a workaround (query) for PostGIS
Distant search and "fuzzy search" are separate, but can be chained together
EOD update: need to write specs for fuzzy search. The ticket should be ready for review after that.
Suggested using a magic number that gets set in the fuzzy search query; this will allow us to make the search more or less strict.
There is a default threshold of 0.3 in the algorithm, but because are doing a SQL query we are defining this in a string (the magic number) that we can control.
After discussing with @holdenhinkle, not sure if the settings.yml
is the correct place.
Looking at how the VA profile validates addresses, to ensure data has the right shape. Consider how this affects our implementation.
Below is the full veteran_representatives
table schema. Some of these attributes aren't currently in use, but have the potential to be if we can locate a data source.
create_table "veteran_representatives", id: false, force: :cascade do |t|
t.string "representative_id"
t.string "first_name"
t.string "last_name"
t.string "email"
t.string "phone"
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
t.string "poa_codes", default: [], array: true
t.string "user_types", default: [], array: true
t.text "ssn_ciphertext"
t.text "dob_ciphertext"
t.text "encrypted_kms_key"
t.date "verified_decryptable_at"
t.string "middle_initial"
t.string "address_line_1"
t.string "address_line_2"
t.string "address_line_3"
t.string "address_type"
t.string "city"
t.string "country_code_iso3"
t.string "country_name"
t.string "county_name"
t.string "county_code"
t.string "international_postal_code"
t.string "province"
t.string "state_code"
t.string "zip_code"
t.string "zip_suffix"
t.float "lat"
t.float "long"
t.geography "location", limit: {:srid=>4326, :type=>"st_point", :geographic=>true}
t.jsonb "raw_address"
t.string "full_name"
t.index ["full_name"], name: "index_veteran_representatives_on_full_name"
t.index ["location"], name: "index_veteran_representatives_on_location", using: :gist
t.index ["representative_id", "first_name", "last_name"], name: "index_vso_grp", unique: true
t.check_constraint "representative_id IS NOT NULL", name: "veteran_representatives_representative_id_null"
end
vets-api is getting the following attributes coming from the current accreditation search:
We hope to use these attributes from vets-api along with the data from OGC. All of the address fields except raw_address
will ideally come from the Lighthouse Address Validation API so we can validate addresses before storing them directly in the database.
Waiting for tomorrow to validate the Full Name attribute (running okay locally, but not as expected -- 2a will be the actual update)
@jvcAdHoc here are the fields in the CorppDB accredited rep data file:
We also have a sample file from GCLAWS (OGC) which has these attributes for Attorneys/Claims Agents:
And these attributes for VSOs:
Issue Description
We need to add and manage address data for the representatives in our database. This includes modifications to the database table, addressing best practices, and ensuring data privacy.
Tasks
Acceptance Criteria