lin-d-hop closed this issue 11 months ago
@ThomasDavisonsGit and I have had a discussion about whether he can start this work with support from me and @wu-lee. We had a call yesterday where I delivered a verbal specification and described the current state of affairs.
He's been actioned to write a document based on the call, so that we have a written spec, and to contact Hansen to understand how the first version of the boundary service (essentially a large database) was created.
We're going to have a conversation after the documentation is up and before initialising a codebase. The two fixed points for the codebase are that it will be in TypeScript and have unit tests.
Thomas will report in on Element in the datafactory channel.
Task: Land Registry update their database monthly (or weekly?). We need to grab these updates automatically; currently the process is manual.
INSPIRE dataset (Land Registry database):
Boundary Service (our database):
Code requirements:
Next steps for development:
Thanks for this @ThomasDavisonsGit (sorry @rogup) These next steps look like they will create future issues. Let's make sure we create new issues for the next steps before you start working on them. So keep the spec to literally the 3 steps above.
The goal is to get to a place where you are ready to pair with John on writing this in TypeScript.
Hi @ThomasDavisonsGit Have you made any progress with this? Any updates? :)
Current manual process for getting INSPIRE dataset into our MySQL database, from Hansen (his memories are dusty):
Download the many INSPIRE datasets manually (or the .gml files can be looped through using the download list on the INSPIRE website).
Convert .gml to GeoJSON (this is the tricky part and the bit that needs automation***). This can be done manually via QGIS (google "qgis save as geojson") but it is very slow, because files need to be imported and converted one at a time. There might have been a command line tool for this (dusty memories).
Reformat the GeoJSON and insert it into the MySQL database. Most of the application logic can be found in Hansen’s private repo, written in Laravel (PHP). I have access to this now: https://github.com/hansensalim/landex-inspire-importer/blob/main/app/Services/CoreService.php
Fetching data from the MySQL database is done with backend code. Its results are returned to the front end, which John Evans is developing. Repo: https://github.com/DigitalCommons/land-explorer-back-end/blob/8532b2c5d90637991a1553a67a9390a85c488a1e/src/queries/query.ts#L140-L212
***The MySQL “polygon” column contains a Spatial Data Type (either POLYGON or MULTIPOLYGON). Hansen could not find a library to convert directly from .gml to MySQL, but there were libraries for GeoJSON to MySQL, hence the whole process being GML to GeoJSON to MySQL. However, there are many problems with converting properly: (https://gis.stackexchange.com/questions/28613/convert-gml-to-geojson)
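To make the last hop (GeoJSON to a MySQL spatial value) concrete, here is a minimal TypeScript sketch that turns a GeoJSON Polygon or MultiPolygon geometry into WKT text, which MySQL can parse with `ST_GeomFromText`. This is not Hansen's code. The type names and the `geometryToWkt` helper are made up for illustration, and it does not deal with SRIDs, coordinate axis order, or validating rings.

```typescript
// Hypothetical minimal types: a small subset of GeoJSON geometries.
type Position = number[]; // [x, y], e.g. [longitude, latitude]
type PolygonGeometry = { type: "Polygon"; coordinates: Position[][] };
type MultiPolygonGeometry = { type: "MultiPolygon"; coordinates: Position[][][] };
type Geometry = PolygonGeometry | MultiPolygonGeometry;

// Render one linear ring as "(x1 y1, x2 y2, ...)".
function ringToWkt(ring: Position[]): string {
  return "(" + ring.map(([x, y]) => `${x} ${y}`).join(", ") + ")";
}

// Render a polygon (outer ring plus any holes) as "((...), (...))".
function polygonToWkt(rings: Position[][]): string {
  return "(" + rings.map(ringToWkt).join(", ") + ")";
}

// Convert a GeoJSON geometry to WKT for a POLYGON or MULTIPOLYGON column.
function geometryToWkt(geometry: Geometry): string {
  if (geometry.type === "Polygon") {
    return "POLYGON" + polygonToWkt(geometry.coordinates);
  }
  return "MULTIPOLYGON(" + geometry.coordinates.map(polygonToWkt).join(", ") + ")";
}
```

For example, `geometryToWkt({ type: "Polygon", coordinates: [[[0, 0], [1, 0], [1, 1], [0, 0]]] })` returns `"POLYGON((0 0, 1 0, 1 1, 0 0))"`, which could then be passed as a query parameter to `ST_GeomFromText(?)` in the insert statement.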
Hi @ThomasDavisonsGit Thanks for the work documenting the steps you plan to take for this task. How are you getting on with the steps?
Notes from meetings 26/01
Update on the 5 steps I am aiming to complete:
Repo with Tom's investigations: https://github.com/DigitalCommons/inspire_updater
@ThomasDavisonsGit Can we also share a link to Hansen's code too please?
@King-Mob Do you know where we were planning to run this app to fetch INSPIRE data? I feel like it makes most sense to run it on the same server as the 'boundary service', which is currently just a MySQL database, not really a service.
I've been talking to @wu-lee and if we do this, it makes sense for us to use Ansible and/or Docker, so that this app can be redeployed and maintained more easily.
There's already an Ansible playbook on the Mykomaps server so @wu-lee is suggesting we move it there, as well as the LX servers eventually, so everything is in the same place.
@lin-d-hop regarding the matching task, we might need to do some geocoding from the address of the UK company land data back to the polygons. This has an associated cost.
I can't give you firm proportions of how many we might want to do this for, vs what we can get with the data we already have, but I should be able to in a couple of weeks (I'm not here next week).
Am I right that there is an option to just download the changes on the Land Reg ownership data? That will keep costs down for future updates and allow us to stay more up to date. I guess in this update we can also just do the geocoding for any polys with changed ownership compared to what we have in the DB.
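One way the "only geocode what changed" idea could look, as a rough TypeScript sketch. The `OwnershipRecord` shape and its field names are assumptions for illustration, not the real Land Registry or boundary service schema.

```typescript
// Hypothetical record shape; field names are assumptions, not the real schema.
interface OwnershipRecord {
  titleNumber: string;
  proprietor: string;
}

// Return the title numbers in `incoming` whose proprietor is new, or differs
// from what is already stored, so only those need geocoding again.
function changedTitles(stored: OwnershipRecord[], incoming: OwnershipRecord[]): string[] {
  const current = new Map<string, string>();
  for (const record of stored) {
    current.set(record.titleNumber, record.proprietor);
  }
  return incoming
    .filter((record) => current.get(record.titleNumber) !== record.proprietor)
    .map((record) => record.titleNumber);
}
```

Titles that are unchanged between the stored data and the incoming update are skipped, which is where the cost saving would come from.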
John and I broke the remainder of this work into three parts:
[x] 1. Get the TypeScript rollout live (staging)
[ ] 2. INSPIRE updates - We have some more thinking to do wrt patching the existing Land Reg data.
[ ] 3. Companies House updates
Hi @King-Mob is the status of the tickboxes in this issue (in the description and latest comment) up to date? Would help to understand the codebase when I look through it
@rogup disregard check boxes, mostly done
Notes from Lynne and John last chat Nov 21st: https://docs.google.com/document/d/1ZkmwqwodVZPo9RyzZy5HIhQf8KuPa5Z4W6Rv7kO1nl0/edit
Closing this issue to clear out the muddy terrain. Dredging if you will.
Description
The Land Reg INSPIRE dataset is updated weekly; however, we are not picking up those updates. We need a regular weekly job that updates our database.
Acceptance Criteria
Implementation Tasks
From a meeting with Tom and John 24/11. Notes here.
Update from meeting with Nick, John and Marcel 3/8/23
[ ] Changing deployment system. Addition to the existing playbook that provisions servers for Digital Commons, to install the necessary tools for a service running at boundaryservice.landexplorer.app. Est. 2 days
[ ] Merging MykoMap and LX servers so that there’s a dev and a staging server for both, and they’re deployed in the same way. Requires itemisation; estimate unknown.
[ ] Migrating existing data (more than 20GB, less than 30GB). Est. 2 days
[ ] Job that triggers updates. Est. 1 day
[x] Fetch data from INSPIRE. Est. 2 days. The data is broken up by council and is not available in an easy-to-use API format.
[ ] Fetch company data. Est. ½ day
[ ] Match up the above 2 datasets. Est. uncertain, 3 days, maybe more
[ ] Put in database. Est. 1 day
[ ] Write route for querying database. Est. ½ day
[ ] Write simple auth system for service. Est. 1 day
[ ] LX back-end request. Est. ½ day