DigitalCommons / land-explorer-front-end

React app for the Land Explorer front end
http://landexplorer.cc
GNU Affero General Public License v3.0
2 stars 0 forks source link

Pick up updates to Land Reg/INSPIRE dataset #47

Closed lin-d-hop closed 11 months ago

lin-d-hop commented 2 years ago

Description

Land Reg INSPIRE data set is updated weekly, however we are not picking up those updates. We need a regular weekly job that updates our database.

Acceptance Criteria

Implementation Tasks

From a meeting with Tom and John 24/11. Notes here.

Update from meeting with Nick, John and Marcel 3/8/23

King-Mob commented 2 years ago

@ThomasDavisonsGit and I have had a discussion about whether he can start this work with support from me and @wu-lee . We had a call yesterday where I delivered a verbal specification and described the current state of affairs.

He's been actioned to create a document based on the call, so we have a written spec and to contact Hansen to understand how the first version of the boundary service (essentially a large database) was created.

We're going to have a conversation after the documentation is up and before initialising a codebase. The two fixed points for the codebase are that it will be in typescript and have unit tests.

Thomas will report in on element in the datafactory channel.

ThomasDavisonsGit commented 2 years ago

============== Data Factory Written Spec

Task: Land Registry update their database monthly (or weekly?). We need to grab these updates automatically, currently the process is manual.

INSPIRE dataset (Land Registry database):

  1. https://use-land-property-data.service.gov.uk/datasets/inspire/download
  2. They have 1 GML file per local authority, e.g: https://use-land-property-data.service.gov.uk/datasets/inspire/download/Abertawe_-_Swansea.zip (we have one SQL file for all local authorities)

Boundary Service (our database):

  1. Created by Hansen.
  2. To access, download my SQL workbench then enter DC credentials.

Code requirements:

  1. Code to be in Typescript.
  2. Must find and extract data from all zip files containing the gml data (could use QGIS).
  3. Covert data from GML to SQL.
  4. Update our Boundary Service with new data. SQLize can do this (SQLize already in use in backend of Land Explorer).
  5. Unit tests needed.
  6. Our database currently hosted on Cloudwave, we would like to host it on Hetzner instead.

Next steps for development:

  1. Scrawl through the Land Registry html web page and find the GML links.
  2. Document what Hansen did, what he did to manually do this.
  3. Figure out what unit tests we need.
lin-d-hop commented 2 years ago

Thanks for this @ThomasDavisonsGit (sorry @rogup) These next steps look like they will create future issues. Let's make sure we create new issues for the next steps before you start working on them. So keep the spec to literally the 3 steps above.

lin-d-hop commented 2 years ago

The goal is to get to a place when you are ready to pair with John on writing this in typescript.

lin-d-hop commented 2 years ago

Hi @ThomasDavisonsGit Have you made any progress with this? Any updates?:)

lin-d-hop commented 2 years ago

Hi @ThomasDavisonsGit Have you made any progress with this? Any updates?:)

ThomasDavisonsGit commented 2 years ago

Current manual process for getting INSPIRE dataset into our MySQL database, from Hansen (his memories are dusty):

  1. Download the many inspire datasets manually (or the .gml files can be looped through using the download list on the INSPIRE website).

  2. Convert .gml to geojson (this is the tricky part and the bit that needs automation***). Can be done manually via QGIS (google "qgis save as geojson") but is very slow, files need to be imported and converted 1 file at a time. There might have been a command line tool for this (dusty memories).

  3. Reformat geojson and insert to the MySQL database. Most of the application logic can be found in Hansen’s private repo, written in Laravel (PHP). I have access to this now: https://github.com/hansensalim/landex-inspire-importer/blob/main/app/Services/CoreService.php

Fetching data from the MySQL database is done with backend code, its results are returned to the front end which John Evans is developing. Repo: https://github.com/DigitalCommons/land-explorer-back-end/blob/8532b2c5d90637991a1553a67a9390a85c488a1e/src/queries/query.ts#L140-L212

***MySQL “polygon” column contains a Spatial Data Type (either POLYGON or MULTIPOLYGON). Hansen could not find a library to convert this from .gml to MySQL, but there were libraries for GeoJSON to MySQL, hence the whole process being GML to GeoJSON to MySQL. However there are many problems with converting properly: (https://gis.stackexchange.com/questions/28613/convert-gml-to-geojson)

lin-d-hop commented 1 year ago

Hi @ThomasDavisonsGit Thanks for the work documenting the steps you plan to take for this task. How are you getting on with the steps?

lin-d-hop commented 1 year ago

Notes from meetings 26/01

ThomasDavisonsGit commented 1 year ago

Update on the 5 steps I am aiming to complete:

  1. Download a zip - have successfully downloaded an example zip file, however the INSPIRE zip file downloads as 1 byte (no error message).
  2. Unzip the zip - done.
  3. Convert gml to geojson - done. The conversion seems to conserve all info, this needs to be tested.
  4. Reformat geojson data, create MySQL table with it, insert into our database -
    • Hansen's php script does this. If this php code was to be used it would need modifying as some parts are superfluous for this task.
    • My Typescript code has progressed to printing individual geojson properties to terminal, now need to load them into MySQL database. Local connection is authenticated, but getting a subsequent connection error when loading in the geojson data.
  5. Do all this for batch of multiple zips - using .forEach on an array of URLs.
lin-d-hop commented 1 year ago

Repo with Tom's investigations: https://github.com/DigitalCommons/inspire_updater

lin-d-hop commented 1 year ago

@ThomasDavisonsGit Can we also share a link to Hansen's code too please?

rogup commented 1 year ago

@King-Mob Do you know where we were planning to run this app to fetch INSPIRE data? I feel like it makes most sense to run it on the same server as the 'boundary service' which is currently just a MySql database, not really a service.

I've been talking to @wu-lee and if we do this, makes sense for us to use Ansible and/or Docker, so that this app can be redeployed and maintained more easily.

There's already an Ansible playbook on the Mykomaps server so @wu-lee is suggesting we move it there, as well as the LX servers eventually, so everything is in the same place.

King-Mob commented 1 year ago

@lin-d-hop regarding the matching task, it might be we need to do some geocoding from the address of the UK company land data back to the polygons. this has an associated cost to it.

I can't give you firm proportions of how many we might want to do this for, vs what we can get with the data we already have, but I should be able to in a couple of weeks (I'm not here next week)

lin-d-hop commented 1 year ago

Am I right that there is an option to just download the changes on the Land Reg ownership data? That will keep costs down for future updates and allow us to stay more up to date. I guess in this update we can also just do the geocoding for any polys with changed ownership compared to what we have in the DB.

lin-d-hop commented 1 year ago

John and I broke the remainder of this work this into three parts:

rogup commented 11 months ago

Hi @King-Mob is the status of the tickboxes in this issue (in the description and latest comment) up to date? Would help to understand the codebase when I look through it

King-Mob commented 11 months ago

@rogup disregard check boxes, mostly done

lin-d-hop commented 11 months ago

Notes from Lynne and John last chat Nov 21st: https://docs.google.com/document/d/1ZkmwqwodVZPo9RyzZy5HIhQf8KuPa5Z4W6Rv7kO1nl0/edit

lin-d-hop commented 11 months ago

Closing this issue to clear out the muddy terrain. Dredging if you will.