department-of-veterans-affairs / va.gov-team

Public resources for building on and in support of VA.gov. Visit complete Knowledge Hub:
https://depo-platform-documentation.scrollhelp.site/index.html
283 stars 204 forks source link

Research Postgres' Postgis extension as a solution for geo-based search #66522

Closed holdenhinkle closed 1 year ago

holdenhinkle commented 1 year ago

Issue Description

Explore the feasibility and advantages of using PostGIS within PostgreSQL for implementing geo-based search functionality in our application. The search should be based on either just a zip code or the combination of address 1, city, state, and zip. The goal is to assess how PostGIS can offer enhanced geospatial querying capabilities to enhance the user experience.


Tasks

Acceptance Criteria

holdenhinkle commented 1 year ago

Tables that use Postgis:

  create_table "base_facilities", id: false, force: :cascade do |t|
    t.string "unique_id", null: false
    t.string "name", null: false
    t.string "facility_type", null: false
    t.string "classification"
    t.string "website"
    t.float "lat", null: false
    t.float "long", null: false
    t.jsonb "address"
    t.jsonb "phone"
    t.jsonb "hours"
    t.jsonb "services"
    t.jsonb "feedback"
    t.jsonb "access"
    t.string "fingerprint"
    t.datetime "created_at", null: false
    t.datetime "updated_at", null: false
    t.geography "location", limit: {:srid=>4326, :type=>"st_point", :geographic=>true}
    t.boolean "mobile"
    t.string "active_status"
    t.string "visn"
    t.index ["lat"], name: "index_base_facilities_on_lat"
    t.index ["location"], name: "index_base_facilities_on_location", using: :gist
    t.index ["name"], name: "index_base_facilities_on_name", opclass: :gin_trgm_ops, using: :gin
    t.index ["unique_id", "facility_type"], name: "index_base_facilities_on_unique_id_and_facility_type", unique: true
  end
  create_table "drivetime_bands", force: :cascade do |t|
    t.string "name"
    t.string "unit"
    t.geography "polygon", limit: {:srid=>4326, :type=>"st_polygon", :geographic=>true}, null: false
    t.string "vha_facility_id", null: false
    t.datetime "created_at", null: false
    t.datetime "updated_at", null: false
    t.integer "min"
    t.integer "max"
    t.datetime "vssc_extract_date", default: "2001-01-01 00:00:00"
    t.index ["polygon"], name: "index_drivetime_bands_on_polygon", using: :gist
  end

Table Models

BaseFacility Model DrivetimeBand Model

holdenhinkle commented 1 year ago

What is the Postgres Postgis extension for?

PostGIS is an extension for the PostgreSQL relational database that transforms it into a spatial database, adding support for geographic objects and allowing location queries to be run in SQL. Essentially, it equips PostgreSQL with the capability to manage spatial data and perform geospatial operations, making it a competitive alternative to other spatial databases like Oracle Spatial and others.

Here's a brief overview of what PostGIS offers:

  1. Spatial Data Types: PostGIS adds new spatial data types (like POINT, LINESTRING, POLYGON, MULTIPOINT, etc.) to PostgreSQL, allowing you to store geographic and geometric objects efficiently.

  2. Spatial Indexes: PostGIS leverages the Generalized Search Tree (GiST) indexing system of PostgreSQL, making spatial queries faster and more efficient.

  3. Spatial Functions: It introduces a wide array of functions for spatial analysis. This includes functions for determining area, length, perimeter, and distance; for transforming coordinates between different coordinate reference systems; and for running various geometric tests and operations like intersection, union, difference, and containment checks.

  4. Geography vs. Geometry: PostGIS provides support for both geometry (planar) and geography (spherical) data types, allowing users to choose between them based on the specific use case.

  5. Raster Data Support: PostGIS also includes support for raster data operations, making it possible to run analyses that combine both vector and raster data.

  6. Interoperability: PostGIS supports a variety of standards from the Open Geospatial Consortium (OGC), like the Simple Features Specification. This ensures good interoperability with various GIS software and tools.

  7. Advanced Features: Beyond basic spatial functions, PostGIS also offers advanced features like 3D and 4D indexing, topology support, and more.

Applications of PostGIS range from geographic information systems (GIS), to geospatial data analytics, location-based services, logistics, and more. By using PostGIS, developers and organizations can tap into powerful spatial capabilities without leaving the familiar environment of PostgreSQL.

holdenhinkle commented 1 year ago

Geo-Search Spike

Here's a simple implementation of a zip-code based geo-search for Starbucks locations.

Step 1: Setup

Add the following gems to Gemfile:

gem 'pg'
gem 'activerecord-postgis-adapter'
gem 'rack-cors'

Step 2: Create the Migration

Create the migration for the Starbucks locations:

rails generate model StarbucksLocation name:string location:st_point

Update the migration file to specify the type of location as a geographic point and to add an index:

class CreateStarbucksLocations < ActiveRecord::Migration[6.0]
  def change
    create_table :starbucks_locations do |t|
      t.string :name
      t.st_point :location, geographic: true

      t.timestamps
    end
    add_index :starbucks_locations, :location, using: :gist
  end
end

Run the migration:

Step 3: Setup the Model

Update the model (app/models/starbucks_location.rb):

class StarbucksLocation < ApplicationRecord
  scope :nearby, -> (point) {
    order(Arel.sql("location <-> 'SRID=4326;POINT(#{point.longitude} #{point.latitude})'::geography"))
  }
end

Step 4: Setup the Controller

Create a new controller (app/controllers/starbucks_locations_controller.rb):

class StarbucksLocationsController < ApplicationController
  def index
    # Assuming a simple method to get lat/long from zip
    point = get_lat_long_from_zip(params[:zip])

    locations = StarbucksLocation.nearby(point).limit(25)

    render json: locations, status: :ok
  end

  private

  def get_lat_long_from_zip(zip)
    # TODO: Convert ZIP to lat/long. This can be done using various gems or APIs like Geocoder.
    # For simplicity, return a static point.
    Struct.new(:latitude, :longitude).new(34.0522, -118.2437)
  end
end

Step 5: Update routes.rb

Update routes (config/routes.rb):

Rails.application.routes.draw do
  resources :starbucks_locations, only: [:index]
end

Step 6: CORS (for access to the API from another domain)

In config/application.rb, add:

config.middleware.insert_before 0, Rack::Cors do
  allow do
    origins '*'
    resource '*', headers: :any, methods: [:get]
  end
end

Step 7: Populate the Database

Populate the database with Starbucks locations and their coordinates. This can be done with seed data or by integrating with an API that provides this information.

holdenhinkle commented 1 year ago

Geocoding services

Applications that require high accuracy and the ability to support high traffic without running into rate limits.

Here's a breakdown:

  1. Accuracy: Commercial services like Google Maps and Bing generally offer better accuracy and more up-to-date data. This is especially noticeable in constantly evolving urban areas or places with less-detailed open-source mapping contributions.

  2. Rate Limits: Free services, especially popular ones like OpenStreetMap's Nominatim, have strict usage policies to prevent abuse. If your application experiences a lot of traffic, you can quickly hit these rate limits. On the other hand, paid services typically allow for higher request volumes, and you can pay for more quota if needed.

  3. Features: Paid services often come with additional valuable features like batch geocoding, reverse geocoding, address normalization, route optimization, etc. These can be crucial for certain applications.

  4. Reliability: Commercial services might offer better uptime and performance guarantees, which can be essential for business-critical applications.

However, there are alternatives and strategies to consider:

  1. Self-hosted Solutions: If you expect a lot of traffic but are looking to avoid the costs associated with commercial geocoders, consider self-hosting a geocoding solution using data from OpenStreetMap. Tools like Pelias or Nominatim can be set up on your servers. This requires more initial setup and maintenance but can be cost-effective in the long run.

  2. Hybrid Approach: Use a free service for general tasks and fall back to a paid service for critical tasks or when higher accuracy is required.

  3. Caching: Implement caching mechanisms to reduce the number of requests to the geocoding service. This can help stay within free tier limits or reduce costs.

  4. Optimization: Only geocode when necessary. For instance, if you're storing locations, geocode them once when they're entered and then store the results.

In conclusion, while many high-traffic, accuracy-critical applications do end up using paid geocoding services, there are strategies and alternatives to consider based on your specific needs and budget. It's all about finding the right balance between accuracy, volume, and cost for your application.

holdenhinkle commented 1 year ago

We're using the following packages/services in vets-website:

https://www.mapbox.com/ - "mapbox-gl": "^1.12.0",

Jonathan confirmed that vets-website converts the search string into lat/long using mapbox.

holdenhinkle commented 1 year ago

From Jonathan VanCourt via Slack DM

Notes on postgis implementation:

Getting rep and org address data into vets-api database:

Incoming request from vets-website:

We can use various postgis methods to get search results

jvcAdHoc commented 1 year ago

Outstanding questions:

holdenhinkle commented 1 year ago

How do we handle an invalid address? => I'm guessing vets-website/mapbox handles that for us before the request is set to vets-api

How do we handle missing addresses? => We can run some queries against the data, like count the number of reps with no zip, just to get a sense if this is an edge case we'll have to account for. => Actually, I'm guessing that reps with a missing address won't have lat/long values so they'll be automatically excluded from the query.

how are reps/orgs without addresses searchable? => I can't see how they could be searchable.

What is our preferred/default search? Everything within a 50 mile radius? => Yes

Use the bounding box of the map component? => I'm not familiar with this.