Analysis of ALA Geospatial Kosher algorithm

javier-molina commented 4 years ago

Biocache Store produces a geospatial_kosher flag that is used for filtering.

This analysis task is to determine the best place to implement the algorithm in the LA pipelines.

While conceptually one would be tempted to implement it as part of the Location transform, the algorithm requires additional inputs produced by other transforms in the pipeline processing, such as habitat and species list information.

One suggestion is to add the algorithm to the SOLR pipeline. Although the algorithm is conceptually separate, this would probably be the solution with the best performance.

Conceptually or functionally, even though the geospatial kosher algorithm produces only one flag, it could well be implemented in its own pipeline.

If the actual analysis is not complex, or if it makes more sense to tackle it in parallel with the implementation, then there will be no need to create a separate implementation task.

See

djtfmartin commented 3 years ago

GBIF's equivalent of the geospatialKosher field in biocache is the hasGeospatialIssue flag.

The code that sets the hasGeospatialIssue field is here: https://github.com/gbif/pipelines/blob/dev/sdks/core/src/main/java/org/gbif/pipelines/core/interpreters/core/LocationInterpreter.java#L78

and it's using the list here: https://github.com/gbif/pipelines/blob/dev/sdks/core/src/main/java/org/gbif/pipelines/core/interpreters/core/LocationInterpreter.java#L67

The relevant code on the biocache side is here: https://github.com/AtlasOfLivingAustralia/biocache-store/blob/develop/src/main/scala/au/org/ala/biocache/vocab/AssertionCodes.scala#L159

Below is a comparison of what is used to calculate these fields.

hasGeospatialIssue - Pipelines issue	geospatialKosher - Biocache assertion
ZERO_COORDINATE	ZERO_COORDINATES
COORDINATE_INVALID	equivalent to DECIMAL_LAT_LONG_CONVERSION_FAILED and DECIMAL_LAT_LONG_CALCULATION_FROM_VERBATIM_FAILED combined
COORDINATE_OUT_OF_RANGE	COORDINATES_OUT_OF_RANGE
COUNTRY_COORDINATE_MISMATCH	We used to do this, but we changed here
No equivalent in GBIF as user annotations are not currently supported	TAXONOMIC_ISSUE
No equivalent in GBIF as user annotations are not currently supported	GEOSPATIAL_ISSUE

So the only real difference between hasGeospatialIssue and geospatialKosher is the use of user assertions.

With this in mind, and due to the low usage of user assertions in ALA, I suggest we just adopt hasGeospatialIssue and keep an alignment with GBIF's processing.
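
To make the comparison concrete, here is a minimal sketch in Python, for illustration only. The actual implementation is the Java LocationInterpreter linked above. The issue names are copied from the comparison table rather than from the linked code, and the names GEOSPATIAL_ISSUES, BIOCACHE_TO_PIPELINES and has_geospatial_issue are hypothetical. It assumes the flag is set whenever a record carries at least one of the listed issues.

```python
from typing import Dict, Iterable, Optional

# Pipelines issues that the comparison table lists as inputs to
# hasGeospatialIssue (assumption: taken from the table, not the Java code).
GEOSPATIAL_ISSUES = frozenset({
    "ZERO_COORDINATE",
    "COORDINATE_INVALID",
    "COORDINATE_OUT_OF_RANGE",
    "COUNTRY_COORDINATE_MISMATCH",
})

# Biocache assertions from the table mapped to their pipelines counterpart.
# None marks the user-assertion codes that have no equivalent in GBIF.
BIOCACHE_TO_PIPELINES: Dict[str, Optional[str]] = {
    "ZERO_COORDINATES": "ZERO_COORDINATE",
    "DECIMAL_LAT_LONG_CONVERSION_FAILED": "COORDINATE_INVALID",
    "DECIMAL_LAT_LONG_CALCULATION_FROM_VERBATIM_FAILED": "COORDINATE_INVALID",
    "COORDINATES_OUT_OF_RANGE": "COORDINATE_OUT_OF_RANGE",
    "TAXONOMIC_ISSUE": None,
    "GEOSPATIAL_ISSUE": None,
}


def has_geospatial_issue(record_issues: Iterable[str]) -> bool:
    """Return True if any issue on the record is in GEOSPATIAL_ISSUES."""
    return not GEOSPATIAL_ISSUES.isdisjoint(record_issues)
```

Under these assumptions, a record carrying only ZERO_COORDINATE is flagged, while a record with no issues, or only issues outside the set, is not. The two None entries in the mapping are the user-assertion codes, which is the gap described above.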

djtfmartin commented 3 years ago

Awaiting sign off from project reference group.

javier-molina commented 3 years ago

The Geospatial Kosher flag does not need to be implemented, as per the Reference Group decision.

AtlasOfLivingAustralia / la-pipelines

Analysis of ALA Geospatial Kosher algorithm #99