datasette / datasette-enrichments-opencage

Geocoding and reverse geocoding using OpenCage
Apache License 2.0

Plugin design #1

Open simonw opened 9 months ago

simonw commented 9 months ago

Geocoding and reverse geocoding using OpenCage

Will use their API directly via HTTPX. https://opencagedata.com/api

simonw commented 9 months ago

This API is actually very verbose - it returns a lot of neat data. Here's an example response for

260 Capistrano Rd, Half Moon Bay, CA 94019

I've truncated it to just one of the three results:

{
  "documentation": "https://opencagedata.com/api",
  "licenses": [
    {
      "name": "see attribution guide",
      "url": "https://opencagedata.com/credits"
    }
  ],
  "rate": {
    "limit": 2500,
    "remaining": 2497,
    "reset": 1700956800
  },
  "results": [
    {
      "annotations": {
        "DMS": {
          "lat": "37\u00b0 30' 15.32736'' N",
          "lng": "122\u00b0 28' 57.93168'' W"
        },
        "FIPS": {
          "county": "06081",
          "state": "06"
        },
        "MGRS": "10SEG4571650939",
        "Maidenhead": "CM87sm21ba",
        "Mercator": {
          "x": -13634718.341,
          "y": 4483612.069
        },
        "OSM": {
          "edit_url": "https://www.openstreetmap.org/edit?node=11050232837#map=17/37.50426/-122.48276",
          "note_url": "https://www.openstreetmap.org/note/new#map=17/37.50426/-122.48276&layers=N",
          "url": "https://www.openstreetmap.org/?mlat=37.50426&mlon=-122.48276#map=17/37.50426/-122.48276"
        },
        "UN_M49": {
          "regions": {
            "AMERICAS": "019",
            "NORTHERN_AMERICA": "021",
            "US": "840",
            "WORLD": "001"
          },
          "statistical_groupings": [
            "MEDC"
          ]
        },
        "callingcode": 1,
        "currency": {
          "alternate_symbols": [
            "US$"
          ],
          "decimal_mark": ".",
          "disambiguate_symbol": "US$",
          "html_entity": "$",
          "iso_code": "USD",
          "iso_numeric": "840",
          "name": "United States Dollar",
          "smallest_denomination": 1,
          "subunit": "Cent",
          "subunit_to_unit": 100,
          "symbol": "$",
          "symbol_first": 1,
          "thousands_separator": ","
        },
        "flag": "\ud83c\uddfa\ud83c\uddf8",
        "geohash": "9q8vkg6y0u7dbv48q5uc",
        "qibla": 18.83,
        "roadinfo": {
          "drive_on": "right",
          "road": "Capistrano Road",
          "speed_in": "mph"
        },
        "sun": {
          "rise": {
            "apparent": 1700924460,
            "astronomical": 1700918940,
            "civil": 1700922780,
            "nautical": 1700920860
          },
          "set": {
            "apparent": 1700873580,
            "astronomical": 1700879040,
            "civil": 1700875260,
            "nautical": 1700877180
          }
        },
        "timezone": {
          "name": "America/Los_Angeles",
          "now_in_dst": 0,
          "offset_sec": -28800,
          "offset_string": "-0800",
          "short_name": "PST"
        },
        "what3words": {
          "words": "probed.sticky.provide"
        }
      },
      "bounds": {
        "northeast": {
          "lat": 37.5043076,
          "lng": -122.4827088
        },
        "southwest": {
          "lat": 37.5042076,
          "lng": -122.4828088
        }
      },
      "components": {
        "ISO_3166-1_alpha-2": "US",
        "ISO_3166-1_alpha-3": "USA",
        "ISO_3166-2": [
          "US-CA"
        ],
        "_category": "commerce",
        "_type": "restaurant",
        "continent": "North America",
        "country": "United States",
        "country_code": "us",
        "county": "San Mateo County",
        "hamlet": "Princeton-by-the-Sea",
        "house_number": "260",
        "postcode": "94019",
        "restaurant": "La Costanera",
        "road": "Capistrano Road",
        "state": "California",
        "state_code": "CA",
        "town": "Half Moon Bay"
      },
      "confidence": 9,
      "formatted": "La Costanera, 260 Capistrano Road, Princeton-by-the-Sea, Half Moon Bay, CA 94019, United States of America",
      "geometry": {
        "lat": 37.5042576,
        "lng": -122.4827588
      }
    }
  ],
  "status": {
    "code": 200,
    "message": "OK"
  },
  "stay_informed": {
    "blog": "https://blog.opencagedata.com",
    "mastodon": "https://en.osm.town/@opencage"
  },
  "thanks": "For using an OpenCage API",
  "timestamp": {
    "created_http": "Sat, 25 Nov 2023 18:23:53 GMT",
    "created_unix": 1700936633
  },
  "total_results": 3
}

simonw commented 9 months ago

The most instantly useful fields are:

      "formatted": "La Costanera, 260 Capistrano Road, Princeton-by-the-Sea, Half Moon Bay, CA 94019, United States of America",
      "geometry": {
        "lat": 37.5042576,
        "lng": -122.4827588
      }

But there's a lot of value in the other fields in the annotations block too.

The flag is cute: it's the emoji 🇺🇸
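
A minimal sketch of pulling just those fields out of a response, assuming the response has already been parsed into a Python dict. The function name `first_result_summary` and the output keys `latitude` and `longitude` are made up for illustration; nothing here is the plugin's actual code.

```python
from typing import Optional


def first_result_summary(response: dict) -> Optional[dict]:
    """Return the formatted address and coordinates of the first result, or None."""
    results = response.get("results") or []
    if not results:
        return None
    top = results[0]
    geometry = top.get("geometry") or {}
    return {
        "formatted": top.get("formatted"),
        "latitude": geometry.get("lat"),
        "longitude": geometry.get("lng"),
    }


# Trimmed copy of the example response above
sample = {
    "results": [
        {
            "formatted": "La Costanera, 260 Capistrano Road, Princeton-by-the-Sea, Half Moon Bay, CA 94019, United States of America",
            "geometry": {"lat": 37.5042576, "lng": -122.4827588},
        }
    ]
}

summary = first_result_summary(sample)
```

An address with no matches comes back with an empty `results` list, so the sketch returns `None` in that case rather than raising.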

simonw commented 9 months ago

Could pass ?limit=1 by default since we don't plan to use anything but the first result.
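
Roughly what that request could look like with HTTPX, as a sketch only. `build_params` and `geocode` are hypothetical names, the endpoint URL is the JSON geocoding endpoint as I understand it from the OpenCage docs, and `httpx` is imported lazily only so the parameter helper can be used without it installed.

```python
OPENCAGE_URL = "https://api.opencagedata.com/geocode/v1/json"


def build_params(query: str, api_key: str, limit: int = 1) -> dict:
    """Query-string parameters for one lookup; limit=1 asks for only the top result."""
    return {"q": query, "key": api_key, "limit": limit}


async def geocode(query: str, api_key: str) -> dict:
    """Send one lookup and return the parsed JSON response."""
    import httpx  # third-party dependency, imported here so build_params stands alone

    async with httpx.AsyncClient() as client:
        response = await client.get(
            OPENCAGE_URL, params=build_params(query, api_key), timeout=10.0
        )
        response.raise_for_status()
        return response.json()


# Usage (needs a real key and network access):
# import asyncio
# data = asyncio.run(geocode("260 Capistrano Rd, Half Moon Bay, CA 94019", "YOUR-API-KEY"))
```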

simonw commented 9 months ago

Rate limiting is going to be important.

Free trial accounts are limited to one request per second, and if you exceed that rate you may see a 429 - Too many requests response.

I'll be paying for Datasette Cloud, but people trying out the plugin may still start with the free trial.
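
One way to stay under a one-request-per-second limit is to space calls out on the client side. This is a sketch under that assumption, not what the plugin does: `RateLimiter` is a made-up class, and the clock and sleep functions are injectable purely so the demo can run against a simulated clock.

```python
import asyncio
import time


class RateLimiter:
    """Allow at most one call per `interval` seconds."""

    def __init__(self, interval=1.0, clock=time.monotonic, sleep=asyncio.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0
        self._lock = asyncio.Lock()

    async def wait(self) -> float:
        """Wait until the next slot is free and return how long we waited."""
        async with self._lock:
            now = self._clock()
            delay = max(0.0, self._next_allowed - now)
            if delay:
                await self._sleep(delay)
            # The slot we just used starts at whichever is later: now, or the reserved time
            self._next_allowed = max(now, self._next_allowed) + self.interval
            return delay


async def demo():
    """Drive the limiter with a simulated clock so nothing really sleeps."""
    now = [100.0]

    async def fake_sleep(seconds):
        now[0] += seconds

    limiter = RateLimiter(interval=1.0, clock=lambda: now[0], sleep=fake_sleep)
    return [await limiter.wait(), await limiter.wait(), await limiter.wait()]
```

In real use each API call would be preceded by `await limiter.wait()`. A 429 could still show up if something else is using the same key, so handling that response separately would still be worthwhile.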

simonw commented 9 months ago

Docs for annotations: https://opencagedata.com/api#annotations

Annotations can be turned off by setting the optional no_annotations parameter (with the exception of roadinfo and UN/LOCODE, please see below for details), and we recommend you do so if you don't need this information as it means we can respond to your query a tiny bit more quickly.
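
A small sketch of how the plugin could decide when to send `no_annotations`, assuming the user's chosen fields are expressed as dotted paths such as `geometry.lat`. The helper name `annotation_params` is hypothetical, and it ignores the roadinfo and UN/LOCODE exceptions mentioned in the quoted docs.

```python
def annotation_params(selected_paths):
    """Extra query parameters: skip annotations unless a selected path needs them."""
    needs_annotations = any(
        path == "annotations" or path.startswith("annotations.")
        for path in selected_paths
    )
    return {} if needs_annotations else {"no_annotations": 1}


only_basics = annotation_params(["formatted", "geometry.lat", "geometry.lng"])
wants_fips = annotation_params(["formatted", "annotations.FIPS.county"])
```

Here `only_basics` carries `no_annotations=1` and `wants_fips` is empty, so annotations are only requested when a column actually depends on them.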

simonw commented 9 months ago

FIPS lookup is really useful!

Contains the US Federal Information Processing Standards (FIPS) code for the state (two digit) and county (five digit) of the center point of the result, if we can determine it.

Example: { "county": "08101", "state": "08" }

Note:

  • Only for locations in the United States and associated territories.
  • The values are strings - not numbers - and can have leading zeros.
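
The leading-zero note matters when deciding on column types. A quick illustration, using the FIPS values from the example response earlier in this thread; the table and column names are invented for the sketch.

```python
import sqlite3

fips = {"county": "06081", "state": "06"}

# Converting to an integer silently drops the leading zero
as_int = int(fips["county"])

# Storing the values in TEXT columns keeps them intact
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table places (id integer primary key, fips_county text, fips_state text)"
)
conn.execute(
    "insert into places (fips_county, fips_state) values (?, ?)",
    (fips["county"], fips["state"]),
)
stored = conn.execute("select fips_county, fips_state from places").fetchone()

# The five-digit county code begins with the two-digit state code
county_matches_state = stored[0].startswith(stored[1])
```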

simonw commented 9 months ago

Worth considering caching for this. Maybe cache the most recent 1,000 lookups? Would be good to avoid situations where a dataset with the same address 100+ times makes 100 unnecessary calls.
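
A sketch of what a most-recent-1,000 cache could look like, using an `OrderedDict` as a least-recently-used store. `LookupCache` is a hypothetical name, and normalizing the address (lowercasing and collapsing whitespace) before using it as the key is my own assumption about what should count as "the same address".

```python
from collections import OrderedDict


class LookupCache:
    """Keep the most recently used lookups, evicting the oldest beyond max_size."""

    def __init__(self, max_size=1000):
        self.max_size = max_size
        self._items = OrderedDict()

    @staticmethod
    def _key(address):
        # Assumption: case and extra whitespace do not change the address
        return " ".join(address.lower().split())

    def get(self, address):
        key = self._key(address)
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, address, result):
        key = self._key(address)
        self._items[key] = result
        self._items.move_to_end(key)
        while len(self._items) > self.max_size:
            self._items.popitem(last=False)  # drop the least recently used entry


cache = LookupCache(max_size=2)
cache.set("260 Capistrano Rd", {"lat": 37.5042576})
cache.set("1 Main St", {"lat": 1.0})
hit = cache.get("260  capistrano rd")  # same address after normalization
cache.set("2 Main St", {"lat": 2.0})   # evicts "1 Main St", the least recently used
evicted = cache.get("1 Main St")
```

The enrichment would check `get()` before calling the API and call `set()` after each successful response.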

simonw commented 9 months ago

For the UI: I want users to be able to select multiple parts of that big bunch of JSON to be assigned to multiple columns in their table.

I messed around with the FieldList() mechanism in WTForms and got to this:

[Screenshot: CleanShot 2023-11-25 at 16 38 19@2x]

The markup for that is pretty nasty and mobile-unfriendly: it's a bunch of tables:

[Screenshot: CleanShot 2023-11-25 at 16 39 59@2x]

I also couldn't figure out a neat idiom for getting it to add extra form fields if they ran out.

Here's the code for that prototype:

from datasette import hookimpl
from datasette.database import Database
from datasette_enrichments import Enrichment
from markupsafe import Markup
from wtforms import Form, StringField, FieldList, FormField, BooleanField
from wtforms.validators import ValidationError
from wtforms.widgets import TextInput

fields = [
    "formatted",
    "geometry.lat",
    "geometry.lng",
    "annotations.DMS.lat",
    "annotations.DMS.lng",
    "annotations.FIPS.county",
    "annotations.FIPS.state",
    "annotations.MGRS",
    "annotations.Maidenhead",
    "annotations.Mercator.x",
    "annotations.Mercator.y",
    "annotations.OSM.edit_url",
    "annotations.OSM.note_url",
    "annotations.OSM.url",
    "annotations.UN_M49.regions.AMERICAS",
    "annotations.UN_M49.regions.NORTHERN_AMERICA",
    "annotations.UN_M49.regions.US",
    "annotations.UN_M49.regions.WORLD",
    "annotations.UN_M49.statistical_groupings",
    "annotations.callingcode",
    "annotations.currency.alternate_symbols",
    "annotations.currency.decimal_mark",
    "annotations.currency.disambiguate_symbol",
    "annotations.currency.html_entity",
    "annotations.currency.iso_code",
    "annotations.currency.iso_numeric",
    "annotations.currency.name",
    "annotations.currency.smallest_denomination",
    "annotations.currency.subunit",
    "annotations.currency.subunit_to_unit",
    "annotations.currency.symbol",
    "annotations.currency.symbol_first",
    "annotations.currency.thousands_separator",
    "annotations.flag",
    "annotations.geohash",
    "annotations.qibla",
    "annotations.roadinfo.drive_on",
    "annotations.roadinfo.road",
    "annotations.roadinfo.speed_in",
    "annotations.sun.rise.apparent",
    "annotations.sun.rise.astronomical",
    "annotations.sun.rise.civil",
    "annotations.sun.rise.nautical",
    "annotations.sun.set.apparent",
    "annotations.sun.set.astronomical",
    "annotations.sun.set.civil",
    "annotations.sun.set.nautical",
    "annotations.timezone.name",
    "annotations.timezone.now_in_dst",
    "annotations.timezone.offset_sec",
    "annotations.timezone.offset_string",
    "annotations.timezone.short_name",
    "annotations.what3words.words",
    "bounds.northeast.lat",
    "bounds.northeast.lng",
    "bounds.southwest.lat",
    "bounds.southwest.lng",
    "components.ISO_3166-1_alpha-2",
    "components.ISO_3166-1_alpha-3",
    "components.ISO_3166-2",
    "components._category",
    "components._type",
    "components.continent",
    "components.country",
    "components.country_code",
    "components.county",
    "components.hamlet",
    "components.house_number",
    "components.postcode",
    "components.restaurant",
    "components.road",
    "components.state",
    "components.state_code",
    "components.town",
    "confidence",
]

class DataListWidget(TextInput):
    def __init__(self, data_list, **kwargs):
        super().__init__(**kwargs)
        self.data_list = data_list

    def __call__(self, field, **kwargs):
        kwargs.setdefault("id", field.id)
        kwargs["list"] = self.data_list_id(field.id)
        html = str(super().__call__(field, **kwargs))
        html += '<datalist id="%s">' % (self.data_list_id(field.id),)
        for item in self.data_list:
            html += '<option value="%s">' % item
        html += "</datalist>"
        return Markup(html)

    def data_list_id(self, id):
        return id + "-datalist"

@hookimpl
def register_enrichments(datasette):
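    config = datasette.plugin_config("datasette-enrichments-opencage") or {}
    api_key = config.get("api_key")
    # Prototype shortcut: hard-coded placeholder key so the enrichment always
    # registers. Remove this line to use the configured key instead.
    api_key = "abc"
    if api_key:
        return [OpenCageEnrichment(api_key=api_key)]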

class DataListField(Form):
    column = StringField("Column", render_kw=dict(style='width: 80%'))
    component = StringField(
        "Component", widget=DataListWidget(data_list=fields)
    )

class OpenCageForm(Form):
    data_fields = FieldList(FormField(DataListField, label=""), label="Components to store", min_entries=3)

class OpenCageEnrichment(Enrichment):
    name = "OpenCage geocoder"
    slug = "opencage"
    description = "Geocoding and reverse geocoding using OpenCage"

    def __init__(self, api_key):
        self.api_key = api_key

    async def get_config_form(self, db: Database, table: str):
        return OpenCageForm

    async def initialize(self, datasette, db, table: str, config: dict):
        pass

    async def enrich_batch(self):
        pass
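
The dotted names in the `fields` list map directly onto the nested JSON. Here is a sketch of how such a list could be generated from a result and how a selected path could be resolved back to a value. Both helper names are hypothetical, lists are treated as single values (matching entries like `components.ISO_3166-2` above), and keys that themselves contain a dot are not handled.

```python
def flatten_paths(value, prefix=""):
    """Yield a dotted path for every leaf value; lists count as leaves."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from flatten_paths(child, path)
    else:
        yield prefix


def resolve_path(value, path):
    """Follow a dotted path through nested dicts; return None if any key is missing."""
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


# Trimmed copy of one result from the example response
sample_result = {
    "annotations": {"FIPS": {"county": "06081", "state": "06"}},
    "components": {"ISO_3166-2": ["US-CA"]},
    "confidence": 9,
    "geometry": {"lat": 37.5042576, "lng": -122.4827588},
}

paths = list(flatten_paths(sample_result))
county = resolve_path(sample_result, "annotations.FIPS.county")
```

Because different addresses return different `components` keys (this one has `restaurant` and `hamlet`), a path that resolves for one row may come back as `None` for another.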

simonw commented 9 months ago

I'm going to say you can assign up to 10 columns. If you want more than that, run the enrichment multiple times, I guess? Or store the full JSON and then extract bits of it using json_extract() later on.
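
The store-the-JSON option might look something like this, assuming a SQLite build with the JSON functions available. The table name `addresses` and column name `opencage_json` are invented for the example, and the stored document is a trimmed copy of the sample result.

```python
import json
import sqlite3

result = {
    "formatted": "La Costanera, 260 Capistrano Road, Princeton-by-the-Sea, Half Moon Bay, CA 94019, United States of America",
    "geometry": {"lat": 37.5042576, "lng": -122.4827588},
    "annotations": {"FIPS": {"county": "06081", "state": "06"}},
}

conn = sqlite3.connect(":memory:")
conn.execute("create table addresses (address text, opencage_json text)")
conn.execute(
    "insert into addresses values (?, ?)",
    ("260 Capistrano Rd, Half Moon Bay, CA 94019", json.dumps(result)),
)

# Pull individual values back out of the stored JSON later on
row = conn.execute(
    """
    select
      json_extract(opencage_json, '$.geometry.lat'),
      json_extract(opencage_json, '$.geometry.lng'),
      json_extract(opencage_json, '$.annotations.FIPS.county')
    from addresses
    """
).fetchone()
```

Numbers come back as numbers and the FIPS code comes back as text, so the leading zero survives.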

simonw commented 9 months ago

I'm scoping this WAY down for the initial release:

mroswell commented 7 months ago

I came here to add a new issue, but found this. I want a comma-delimited field, with a popup telling me all the options for returning fields. For me, it is VERY important to have confidence codes, and given:

It would be helpful to have confidence code returned as a third field by default. No need to offer the user the opportunity to name the field: just call it confidence or confidence_oc if that's already taken. And change the placeholder text to: Leave this blank if you only want to store latitude/longitude/confidence
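
The naming rule suggested here is simple enough to sketch. `confidence_column_name` is a hypothetical helper, and the comparison is case-insensitive on the assumption that the table lives in SQLite, where column names are.

```python
def confidence_column_name(existing_columns):
    """Use "confidence" unless the table already has it, then "confidence_oc"."""
    taken = {column.lower() for column in existing_columns}
    return "confidence" if "confidence" not in taken else "confidence_oc"


fresh = confidence_column_name(["id", "address"])
clash = confidence_column_name(["id", "address", "Confidence"])
```

This does not cover the case where `confidence_oc` is taken as well.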

Thanks for the sample formatted responses above. It was easier to find this here than in the OpenCage API documentation. (The "thanks" key is cute :). And La Costanera--the restaurant in one of those sample API responses--looks wonderful!)

(All of this sparks one more idea for a plugin: mouse over a JSON field key and have a way of generating the JSON_EXTRACT code for that key and copying it to the clipboard. Perhaps this could be a browser plugin. Ctrl-click on the JSON key to get 6 sample entries copied to the clipboard... I'm full of ideas!)
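
The code-generation half of that idea could be sketched as below, leaving the mouseover and clipboard parts aside. Both function names are hypothetical. Wrapping keys that are not plain identifiers in double quotes follows my understanding of SQLite's JSON path syntax, and keys containing a double quote are not handled.

```python
import re

_SIMPLE_KEY = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def json_path(keys):
    """Build a SQLite JSON path such as $.geometry.lat from a list of keys."""
    parts = ["$"]
    for key in keys:
        if _SIMPLE_KEY.match(key):
            parts.append("." + key)
        else:
            # Keys with hyphens or other punctuation are wrapped in double quotes
            parts.append('."' + key + '"')
    return "".join(parts)


def json_extract_sql(column, keys):
    """Return a json_extract() expression for pasting into a SQL query."""
    path = json_path(keys).replace("'", "''")  # escape single quotes for the SQL literal
    return "json_extract(\"{}\", '{}')".format(column, path)


simple = json_extract_sql("opencage_json", ["geometry", "lat"])
quoted = json_path(["components", "ISO_3166-1_alpha-2"])
```

For the clicked key `geometry.lat` this produces `json_extract("opencage_json", '$.geometry.lat')`.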