AtlasOfLivingAustralia / la-pipelines

Living Atlas Pipelines extensions

Location Processor Gaps #285

Closed charvolant closed 3 years ago

charvolant commented 3 years ago

Functionality present in the biocache-store location processor that is either missing or sufficiently different to need a review.

Missing or questionable functionality

charvolant commented 3 years ago

Also, to be tested:

djtfmartin commented 3 years ago

Process stateProvince and country but only if not derived from lat/long.

This is done for stateProvince in the extensions to LocationInterpreter
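A minimal sketch of that rule, assuming a coordinate-derived value takes priority and the supplied value is only processed as a fallback. The function name and arguments are hypothetical and are not taken from LocationInterpreter:

```python
from typing import Optional


def resolve_state_province(
    supplied: Optional[str],
    derived_from_coordinates: Optional[str],
) -> Optional[str]:
    """Pick a stateProvince value (hypothetical helper, for illustration only).

    Assumption: if a value was already derived from lat/long, use it and
    skip processing of the supplied value entirely.
    """
    if derived_from_coordinates:
        return derived_from_coordinates
    # Only fall back to the supplied value when nothing was derived.
    if supplied and supplied.strip():
        return supplied.strip()
    return None
```

The same shape would apply to country.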

nickdos commented 3 years ago

Do we have any grid reference data for ALA? I have a feeling the UK NBN require this code.

I don't think we should be parsing verbatim values, as they should only be provided as a visual check for non-verbatim versions (guess). They also tend to be pretty non-uniform, which makes parsing a bit hit and miss anyway.

No decimal lat/lng could be derived from geospatial_kosher via -geospatial_kosher:*.

javier-molina commented 3 years ago

Reproject onto the supplied coordinatePrecision (rounded to 6 decimals as a default)

Here is a brief explanation of why 6 digits are used: https://github.com/gbif/pipelines/issues/466
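A rough sketch of the rounding step, assuming the precision is expressed as a number of decimal places and that 6 is used when none is supplied. The function name and the half-up rounding mode are assumptions, not the pipelines implementation:

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional

DEFAULT_DECIMALS = 6  # default mentioned in this thread


def round_coordinate(value: float, decimals: Optional[int] = None) -> float:
    """Round a reprojected coordinate to a fixed number of decimal places.

    Hypothetical helper: `decimals` stands in for whatever precision was
    supplied with the record; None falls back to DEFAULT_DECIMALS.
    """
    places = DEFAULT_DECIMALS if decimals is None else decimals
    quantum = Decimal(1).scaleb(-places)  # e.g. 0.000001 for 6 places
    # Go through str() so the Decimal sees the printed value of the float.
    rounded = Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)
    return float(rounded)
```

For example, `round_coordinate(-35.28193748213)` trims the long tail that a reprojection typically produces down to 6 decimal places.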

javier-molina commented 3 years ago

#242 adds more questions to this analysis.

peggynewman commented 3 years ago

We ticked off some things:

- eastings/northings (around 40K in biocache with unknown decimalLat/Long), or verbatim coordinates
- grid reference system is the UK Ordnance Survey grid
- spatially-invalid issue in GBIF should cover missing decLat/Long
- not processing any verbatim fields

Most of this functionality seems to come from a time when we were working hard to extract and standardise any location information that we could.

The coordinate precision flags and processing are a bit more important, however. @M-Nicholls could you comment on the usefulness of the Missing Coordinate Precision flag, or the Too Precise flags?
Also, if the Missing Coordinate Precision flag plays a part in the spatially-invalid flag, maybe we should keep it.

Processing based on coordinatePrecision, e.g. making sure that reprojections maintain the coordinate precision, is important because reprojections create lots of decimal places. See the TDWG paper on Georeferencing Best Practice.

peggynewman commented 3 years ago

image

M-Nicholls commented 3 years ago

Re Missing Coordinate Precision flag, and the Too Precise flags

They are useful bits of data in assessing whether to trust the data as provided - too many decimal points is an indicator that something's dodgy - usually a conversion (datum or coordinate system) or georeferencing artifact. Missing coordinate precision means you can't check if the coords meet the expected precision so you can't validate or improve the record.

These look like pretty straightforward checks, any reason not to include them?
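To illustrate how simple the two checks could be, here is a minimal sketch. The flag names, the 6-decimal threshold, and the function signatures are all hypothetical and do not reflect biocache-store or pipelines code; inputs are assumed to be valid decimal strings:

```python
from decimal import Decimal
from typing import Optional, Set

MAX_DECIMALS = 6  # hypothetical threshold for "too precise"


def decimal_places(text: str) -> int:
    """Count the digits after the decimal point in a coordinate string."""
    exponent = Decimal(text).as_tuple().exponent
    return max(0, -exponent)


def precision_flags(
    lat: str, lon: str, coordinate_precision: Optional[str]
) -> Set[str]:
    """Return illustrative flags for one record (names are made up)."""
    flags = set()
    # Missing precision: nothing to validate the coordinates against.
    if coordinate_precision is None or not coordinate_precision.strip():
        flags.add("MISSING_COORDINATE_PRECISION")
    # Too many decimals often indicates a conversion or georeferencing artifact.
    if max(decimal_places(lat), decimal_places(lon)) > MAX_DECIMALS:
        flags.add("COORDINATES_TOO_PRECISE")
    return flags
```

Working from the original strings rather than floats keeps the decimal count as supplied by the data provider.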

charvolant commented 3 years ago

See https://github.com/gbif/pipelines/issues/517

charvolant commented 3 years ago

See #242 The GBIF LocationTransform/LocationInterpreter does country lookups but not stateProvince. The ALA LocationTransform does both.

javier-molina commented 3 years ago

From @djtfmartin on #322

I don't think we should (re)implement a flag for missing coordinate precision for 2 reasons:

1. GBIF aren't doing this (and I think the reason why is (2)).
2. Of the data in ALA, only 390,314 records have this value. That means we will flag 99.5% of records with this problem, which doesn't seem to be a useful thing to do. I think this will skew (as it does currently) a lot of dashboard-style breakdowns of data quality issues for datasets.

Sorry for not commenting on the other issue.

cc @M-Nicholls @javier-molina

javier-molina commented 3 years ago

Coordinate Precision is going away for the reason above. This has already been captured in end user documentation and the remaining task for it is #372.