DigitalCommons / property-boundaries-service

A service to download updated versions of the land registry property boundaries and serve them with company information.

[Backsearch P1] Implement the matching algorithm #12

Open lin-d-hop opened 4 months ago

lin-d-hop commented 4 months ago

Description

After extensive research and investigation, outlined in this spike, let's move forward with implementing the matching algorithm to 99.9% accuracy :raised_hands:

Acceptance Criteria

  1. Implement the matching algorithm with appropriate error handling
  2. Unit test coverage

rogup commented 2 months ago

The first step of this is done and we're wrapping up the work for now.

The changes in https://github.com/DigitalCommons/property-boundaries-service/pull/15 are ready to be deployed to production.

A further change has been deployed to staging: https://github.com/DigitalCommons/property-boundaries-service/commit/c2f70a068ea109990eca9919953989c1166a90ca

So the state of the pipeline is:

  1. A pipeline on prod-2, which:
    • updates company ownership data and publishes it to all LX users
    • downloads the latest raw INSPIRE data and backs it up to the Hetzner storage box
    • transforms the raw data and inserts it into the pending_inspire_polygons table, which only LX super users can view
  2. A pipeline on staging-2, which:
    • updates company ownership data and publishes it to all LX users on staging
    • downloads, transforms, and analyses the INSPIRE data, then publishes the polygons that successfully match to all LX users on staging. This is a simplified version of the pipeline that skips the segmentation/merge matching to make it faster. Even so, it still takes ~24 hours, and we're not sure how reliable it will be.
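To make the difference between the two pipelines concrete, here is a hypothetical sketch that models each one as an ordered list of stages. Everything in it is an assumption for illustration: `PipelineConfig`, `run_pipeline`, `match_strategies`, the stage names and the "simple" match strategy are not taken from the repository. It only encodes what is stated above: prod-2 stops after inserting into pending_inspire_polygons, while staging-2 also analyses and publishes matches but skips segmentation/merge matching.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PipelineConfig:
    """Hypothetical description of one pipeline; all names are illustrative."""
    name: str
    stages: List[str]
    # Staging skips the slower segmentation/merge matching to save time.
    skip_segmentation_merge: bool = False


PROD = PipelineConfig(
    name="prod-2",
    stages=[
        "update_company_ownership",      # publish to all LX users
        "download_raw_inspire",
        "backup_to_storage_box",         # Hetzner storage box
        "transform_and_insert_pending",  # pending_inspire_polygons (super users only)
    ],
)

STAGING = PipelineConfig(
    name="staging-2",
    stages=[
        "update_company_ownership",
        "download_raw_inspire",
        "transform",
        "analyse_matches",
        "publish_matched",               # only polygons that matched are published
    ],
    skip_segmentation_merge=True,
)


def match_strategies(skip_segmentation_merge: bool) -> List[str]:
    """Return the (hypothetical) matching strategies the analysis step would try."""
    strategies = ["simple"]
    if not skip_segmentation_merge:
        strategies += ["segmentation", "merge"]
    return strategies


def run_pipeline(
    config: PipelineConfig,
    handlers: Dict[str, Callable[[PipelineConfig], None]],
) -> List[str]:
    """Run each stage in order; on failure, report which stage broke and stop."""
    completed: List[str] = []
    for stage in config.stages:
        try:
            handlers[stage](config)
        except Exception as exc:
            raise RuntimeError(f"{config.name}: stage '{stage}' failed") from exc
        completed.append(stage)
    return completed
```

For example, `match_strategies(STAGING.skip_segmentation_merge)` returns `["simple"]`, which reflects the shortcut the staging pipeline takes. Stopping at the first failing stage and naming it is one simple way to meet the "appropriate error handling" criterion, since later stages would otherwise run on incomplete data.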