Ben-Hodgkiss opened this issue 6 days ago.
Jira Link: http://dluhcdigital.atlassian.net/browse/DATA-853
From OE (Trello, 25/09/24): We will need to revive the expectations work. It is worth checking out "Entity organisation" by eveleighoj (Pull Request #21 in digital-land/conservation-area-collection). We should take this test from a CSV and output one result. Remove the current expectation_issue stuff.
Overview
Data Management need a new data quality check that detects when geometries supplied by LPAs fall outside the expected LPA boundary.
After discussion with @eveleighoj and @psd, we think this should run at the dataset level rather than the resource level, by picking up the expectations work, because we don't want to remove geometry facts identified as having issues.
This should apply to:
- the geometry field of all ODP datasets
- the tree dataset, where it should apply to the point field where geometry isn't available.

Jupyter notebook with demo code: https://github.com/digital-land/jupyter-analysis/blob/main/analysis/2024-08_geo_issues_demo/geo_issues_demo-bounds.ipynb
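A minimal sketch of the scope rule and a coarse version of the check, for illustration only. It assumes each entity row is a dict holding WKT strings under geometry and point, and that the LPA boundary is also available as WKT; the helper names (field_to_check, wkt_bounds, within_bounds) are hypothetical. To stay standard-library only it compares bounding boxes, which is a swapped-in simplification and not necessarily what the demo notebook does: a geometry whose box sits inside the boundary's box can still lie outside a concave LPA boundary, so a production check would more likely use a true containment test such as Shapely's within predicate.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper: matches 2D "x y" coordinate pairs inside a WKT string.
_NUMBER_PAIR = re.compile(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)")


def field_to_check(row: dict) -> Optional[str]:
    """Return the WKT to test: geometry if present, else point (e.g. for tree)."""
    geometry = (row.get("geometry") or "").strip()
    if geometry:
        return geometry
    point = (row.get("point") or "").strip()
    return point or None


def wkt_bounds(wkt: str) -> Tuple[float, float, float, float]:
    """Bounding box (min_x, min_y, max_x, max_y) of every coordinate pair in the WKT."""
    pairs = [(float(x), float(y)) for x, y in _NUMBER_PAIR.findall(wkt)]
    if not pairs:
        raise ValueError(f"no coordinates found in {wkt!r}")
    xs, ys = zip(*pairs)
    return min(xs), min(ys), max(xs), max(ys)


def within_bounds(wkt: str, boundary_wkt: str) -> bool:
    """Coarse check: is the geometry's bounding box inside the boundary's bounding box?"""
    g_minx, g_miny, g_maxx, g_maxy = wkt_bounds(wkt)
    b_minx, b_miny, b_maxx, b_maxy = wkt_bounds(boundary_wkt)
    return (g_minx >= b_minx and g_miny >= b_miny
            and g_maxx <= b_maxx and g_maxy <= b_maxy)
```

For example, with the boundary "POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))", a tree row with an empty geometry and point "POINT (5 5)" passes, while "POINT (12 5)" would be flagged.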
Tech Approach Suggestions
"Can we run the expectations on the dataset.csv or sqlite file?" We already have code by @chrisjohns51 that runs expectations for retired entities at the dataset level against the sqlite file. Note: @carloscoelho87 has done some work on an LPA boundary check for brownfield-land.
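To illustrate what running the check against the dataset sqlite file could look like, here is a minimal sketch. It assumes an entity table with entity, geometry and point columns (an assumption about the schema, not confirmed here), and the function name entities_failing is hypothetical. The boundary test is passed in as a callable so any implementation can be plugged in; the query only reads rows, so no geometry facts are removed, and failures are reported by entity.

```python
import sqlite3
from typing import Callable, List


def entities_failing(conn: sqlite3.Connection,
                     is_ok: Callable[[str], bool]) -> List[int]:
    """Run a check over every entity in a dataset sqlite file and report failures.

    Assumption: the file has an `entity` table with `entity`, `geometry` and
    `point` columns. The query is read-only, so nothing is deleted or updated.
    """
    failing = []
    rows = conn.execute("SELECT entity, geometry, point FROM entity ORDER BY entity")
    for entity, geometry, point in rows:
        wkt = geometry or point  # fall back to point where geometry is missing
        if wkt and not is_ok(wkt):
            failing.append(entity)
    return failing


if __name__ == "__main__":
    # Toy in-memory dataset; the lambda stands in for the real boundary test.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entity (entity INTEGER, geometry TEXT, point TEXT)")
    conn.executemany(
        "INSERT INTO entity VALUES (?, ?, ?)",
        [(1, "", "POINT (5 5)"), (2, None, "POINT (50 5)")],
    )
    print(entities_failing(conn, lambda wkt: wkt != "POINT (50 5)"))  # [2]
```

Against a real file, Python's sqlite3 can open it read-only with a URI such as sqlite3.connect("file:<path>?mode=ro", uri=True), which makes the "don't remove facts" constraint explicit.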
Instead of saving the rowids in the expectation issues/results, can we use entity, since it is readily available in the dataset CSV files?
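One way to combine "take this test from a CSV and output one result" with keying failures by entity is sketched below. It is an assumption-laden illustration, not the agreed design: it assumes the dataset CSV has entity, geometry and point columns, the expectation name "geometry-within-lpa-boundary" and the result keys are hypothetical, and the boundary test is again passed in as a callable.

```python
import csv
import io
from typing import Callable, Dict, Iterable


def run_expectation(csv_rows: Iterable[Dict[str, str]],
                    is_ok: Callable[[str], bool],
                    name: str = "geometry-within-lpa-boundary") -> dict:
    """Return one result for the whole dataset, listing failures by entity, not rowid."""
    failed = []
    checked = 0
    for row in csv_rows:
        # Use geometry, falling back to point (e.g. for tree) when geometry is empty.
        wkt = row.get("geometry") or row.get("point")
        if not wkt:
            continue  # nothing to test for this entity
        checked += 1
        if not is_ok(wkt):
            failed.append(row["entity"])
    return {
        "expectation": name,  # hypothetical name
        "passed": not failed,
        "checked": checked,
        "failed_entities": failed,
    }


if __name__ == "__main__":
    # Toy CSV; the lambda stands in for the real boundary test.
    data = io.StringIO("entity,geometry,point\n1,,POINT (5 5)\n2,,POINT (50 5)\n")
    print(run_expectation(csv.DictReader(data), lambda wkt: wkt != "POINT (50 5)"))
```

Returning a single summary per dataset keeps the output small, and entity numbers stay meaningful across rebuilds of the CSV or sqlite file, which rowids may not.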
We may need an initial call with @:63eba8f93f5f32273d83eb78 to discuss the approach so far and how it can be productionised.
Acceptance Criteria/Tests