AtlasOfLivingAustralia / la-pipelines

Living Atlas Pipelines extensions

UNKNOWN_COUNTRY_NAME assertion should be deprecated and wiki page marked as deprecated (or deleted) #506

Open RobinaSanderson opened 2 years ago

RobinaSanderson commented 2 years ago

Currently we have 2 assertions which flag the same thing:

  1. COUNTRY_INVALID
  2. UNKNOWN_COUNTRY_NAME

UNKNOWN_COUNTRY_NAME is not used and should be removed to reduce confusion. Once the assertion is removed, the corresponding wiki page should be marked as deprecated, or deleted if nothing links to it any more. Wiki page: https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/UNKNOWN_COUNTRY_NAME

From Javier: COUNTRY_INVALID flags records where the country or country code cannot be matched to the GBIF vocabulary for country names: https://github.com/gbif/pipelines/blob/2576beabe6e48554f3031566dc89a84cbd13e2f8/sd[…]gbif/pipelines/core/parsers/location/parser/LocationParser.java The vocabulary is: https://github.com/gbif/parsers/blob/master/src/main/resources/dictionaries/parse/countryName.tsv

UNKNOWN_COUNTRY_NAME is defined in the pipelines code but not actually used. I could not find any records using it: https://github.com/search?q=repo%3Agbif%2Fpipelines+UNKNOWN_COUNTRY_NAME+in%3Afile&type=Code

brucehyslop commented 2 years ago

UNKNOWN_COUNTRY_NAME can easily be removed from biocache-service by deleting ALAOccurrenceIssue.UNKNOWN_COUNTRY_NAME: https://github.com/AtlasOfLivingAustralia/biocache-service/blob/develop/src/main/java/au/org/ala/biocache/dto/ALAOccurrenceIssue.java#L34

I can confirm there is no code reference that adds UNKNOWN_COUNTRY_NAME, and there are no occurrence records with this assertion defined.

Note: removing the assertion / ALAOccurrenceIssue will change the assertion codes, which are used to link to the Data Quality Checks Google Docs sheet. These would need to be updated, or #426 implemented.
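
To illustrate the note about assertion codes, here is a minimal sketch. It assumes, purely for illustration, that a code is derived from a constant's position in the enum; how biocache-service actually assigns codes is not shown in this thread. The enums, the `LATER_ISSUE` constant and the `codes` helper are hypothetical and are not the real `ALAOccurrenceIssue`. Under that assumption, deleting a constant shifts the codes of everything declared after it, while marking it `@Deprecated` leaves them unchanged.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AssertionCodeSketch {

    // Hypothetical stand-in for the current issue list (not the real ALAOccurrenceIssue).
    enum Before { COUNTRY_INVALID, UNKNOWN_COUNTRY_NAME, LATER_ISSUE }

    // The same list with the unused constant deleted.
    enum AfterRemoval { COUNTRY_INVALID, LATER_ISSUE }

    // The same list with the constant kept but marked as deprecated.
    enum AfterDeprecation {
        COUNTRY_INVALID,
        @Deprecated UNKNOWN_COUNTRY_NAME,
        LATER_ISSUE
    }

    // Assumption for this sketch only: a code is the constant's declaration position.
    static <E extends Enum<E>> Map<String, Integer> codes(Class<E> type) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (E constant : type.getEnumConstants()) {
            result.put(constant.name(), constant.ordinal());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("before:      " + codes(Before.class));
        System.out.println("removed:     " + codes(AfterRemoval.class));
        System.out.println("deprecated:  " + codes(AfterDeprecation.class));
    }
}
```

If the codes are assigned explicitly rather than by position, removing the constant would only retire its own code, and the remaining links to the sheet would be unaffected.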