Closed javier-molina closed 3 years ago
GBIF's equivalent of the `geospatialKosher` field in biocache is the `hasGeospatialIssue` flag.
The code that sets the `hasGeospatialIssue` field is here:
https://github.com/gbif/pipelines/blob/dev/sdks/core/src/main/java/org/gbif/pipelines/core/interpreters/core/LocationInterpreter.java#L78
and it's using the list here: https://github.com/gbif/pipelines/blob/dev/sdks/core/src/main/java/org/gbif/pipelines/core/interpreters/core/LocationInterpreter.java#L67
The relevant code on the biocache side is here: https://github.com/AtlasOfLivingAustralia/biocache-store/blob/develop/src/main/scala/au/org/ala/biocache/vocab/AssertionCodes.scala#L159
Below is a comparison of what is used to calculate these fields.
hasGeospatialIssue - Pipelines issue | geospatialKosher - Biocache assertion |
---|---|
ZERO_COORDINATE | ZERO_COORDINATES |
COORDINATE_INVALID | equivalent to DECIMAL_LAT_LONG_CONVERSION_FAILED and DECIMAL_LAT_LONG_CALCULATION_FROM_VERBATIM_FAILED combined |
COORDINATE_OUT_OF_RANGE | COORDINATES_OUT_OF_RANGE |
COUNTRY_COORDINATE_MISMATCH | We used to do this, but we changed here |
No equivalent in GBIF as user annotations are not currently supported | TAXONOMIC_ISSUE |
No equivalent in GBIF as user annotations are not currently supported | GEOSPATIAL_ISSUE |
So the only real difference between `hasGeospatialIssue` and `geospatialKosher` is the use of user assertions.
With this in mind, and due to the low usage of user assertions in ALA, I suggest we just adopt `hasGeospatialIssue` and keep an alignment with GBIF's processing.
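To make the comparison concrete, here is a minimal sketch of how the left-hand column of the table could be read: the flag is raised when a record carries at least one of the listed issues. This is not the actual `LocationInterpreter` code. The class name, the enum and the method signature are hypothetical, and `RECORDED_DATE_INVALID` is included only as an example of an issue that is not in the list.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch only: not the real LocationInterpreter or AssertionCodes code.
final class GeospatialFlagSketch {

    // Issue names taken from the "Pipelines issue" column of the table above.
    enum PipelinesIssue {
        ZERO_COORDINATE,
        COORDINATE_INVALID,
        COORDINATE_OUT_OF_RANGE,
        COUNTRY_COORDINATE_MISMATCH,
        // Illustrative non-geospatial issue; it must not trip the flag.
        RECORDED_DATE_INVALID
    }

    // Assumed list of issues that count as geospatial, per the table.
    static final Set<PipelinesIssue> GEOSPATIAL_ISSUES =
        Collections.unmodifiableSet(EnumSet.of(
            PipelinesIssue.ZERO_COORDINATE,
            PipelinesIssue.COORDINATE_INVALID,
            PipelinesIssue.COORDINATE_OUT_OF_RANGE,
            PipelinesIssue.COUNTRY_COORDINATE_MISMATCH));

    // True when the record has at least one issue from the geospatial list.
    static boolean hasGeospatialIssue(Set<PipelinesIssue> recordIssues) {
        return !Collections.disjoint(recordIssues, GEOSPATIAL_ISSUES);
    }

    public static void main(String[] args) {
        // prints "true"
        System.out.println(hasGeospatialIssue(EnumSet.of(PipelinesIssue.ZERO_COORDINATE)));
        // prints "false"
        System.out.println(hasGeospatialIssue(EnumSet.of(PipelinesIssue.RECORDED_DATE_INVALID)));
        // prints "false"
        System.out.println(hasGeospatialIssue(EnumSet.noneOf(PipelinesIssue.class)));
    }
}
```

Under this reading, the biocache `geospatialKosher` calculation would differ only by also consulting the user-assertion codes (`TAXONOMIC_ISSUE`, `GEOSPATIAL_ISSUE`), which have no pipelines counterpart.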
Awaiting sign off from project reference group.
Geospatial Kosher flag does not need to be implemented as per Reference Group Decision
Biocache Store produces a geospatial_kosher flag that is used for filtering.
This analysis task is to determine the best place to implement the algorithm in the LA pipelines.
While conceptually one would be tempted to implement it as part of the Location transform, the algorithm requires additional inputs produced by other transforms in the pipeline processing, such as habitat and species list information.
One suggestion is to add the algorithm to the SOLR pipeline; although it is conceptually separate, this would probably be the solution with the best performance.
Conceptually or functionality-wise, even if the geospatial kosher algorithm produces only one flag, it could well be implemented in its own pipeline.
If the actual analysis is not complex, or it makes more sense to tackle it in parallel with the implementation, then there will be no need to create a separate implementation task.