BiologicalRecordsCentre / record-cleaner-service

Service for checking species records against the record cleaner rules
MIT License

Obtain the UK Species Inventory #3

Open JimBacon opened 9 months ago

JimBacon commented 9 months ago

The service will receive records of species identified by

These alternatives must be resolved to an identifier which specifies the rules to apply.

The UK Species Inventory (https://www.nhm.ac.uk/our-science/data/uk-species/index) provides the list of recognised names and keys but does not make it available as an online service. However, a copy of it can be obtained from the Indicia warehouse.

A local cache will be needed to maximise performance.
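
As a rough illustration of the idea, here is a minimal read-through cache sketch using SQLite from the Python standard library. Everything in it is an assumption made for the example: the `taxon` table layout, the `open_cache` and `lookup` helpers, and the `fetch` callable that stands in for a query to the Indicia warehouse. It is not how the service is actually implemented.

```python
import sqlite3
from typing import Callable, Optional


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical local cache of taxa keyed by TVK."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS taxon ("
        "tvk TEXT PRIMARY KEY, name TEXT NOT NULL, preferred_tvk TEXT NOT NULL)"
    )
    return conn


def lookup(
    conn: sqlite3.Connection,
    tvk: str,
    fetch: Callable[[str], Optional[dict]],
) -> Optional[dict]:
    """Return the cached taxon for a TVK, asking the warehouse only on a miss.

    `fetch` is a stand-in for the remote query; it is assumed to return a dict
    with 'name' and 'preferred_tvk' keys, or None if the TVK is unknown.
    """
    row = conn.execute(
        "SELECT name, preferred_tvk FROM taxon WHERE tvk = ?", (tvk,)
    ).fetchone()
    if row is not None:
        return {"tvk": tvk, "name": row[0], "preferred_tvk": row[1]}

    taxon = fetch(tvk)
    if taxon is not None:
        with conn:  # commit the new row
            conn.execute(
                "INSERT INTO taxon (tvk, name, preferred_tvk) VALUES (?, ?, ?)",
                (tvk, taxon["name"], taxon["preferred_tvk"]),
            )
    return taxon
```

With this shape, repeated checks of the same TVK cost one local query rather than a round trip to the warehouse.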

As the UKSI changes, the Indicia copy must be updated so that new names can be accepted and connected to the relevant rules.

JimBacon commented 3 months ago

The Indicia web service at services/rest/taxa/search only returns search results for TVKs of accepted taxa, and the response does not include the TVK for non-accepted names. In other words, the search and response work with the external key field and ignore the search_code.

The upshot of this is that the service cannot be used to validate records identified by TVKs of non-accepted names. Furthermore, records cannot be matched to rules where the rule is defined against a non-accepted TVK.
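
To make the limitation concrete, here is a toy in-memory sketch. It assumes that the external key holds the TVK of the accepted name and that the search_code holds the name's own TVK. The rows, the placeholder TVK values, and both function names are invented for the example, and no real warehouse call is made.

```python
# Invented rows: two names for the same taxon, one accepted and one not.
NAMES = [
    {"taxon": "Accepted name", "external_key": "TVK_ACCEPTED", "search_code": "TVK_ACCEPTED"},
    {"taxon": "Synonym", "external_key": "TVK_ACCEPTED", "search_code": "TVK_SYNONYM"},
]


def search_by_external_key(tvk: str) -> list:
    """Mimic a search that only consults the external key."""
    return [n for n in NAMES if n["external_key"] == tvk]


def search_by_either_key(tvk: str) -> list:
    """Mimic a search that also consults the search_code."""
    return [n for n in NAMES if tvk in (n["external_key"], n["search_code"])]


# The synonym's own TVK is invisible to the first search...
print(search_by_external_key("TVK_SYNONYM"))
# ...but is found once the search_code is consulted as well.
print(search_by_either_key("TVK_SYNONYM"))
```

Under these assumptions, a record or rule that carries only the non-accepted TVK has nothing to match against in the first kind of search.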

JimBacon commented 1 month ago

Warehouse fixed in https://github.com/Indicia-Team/warehouse/commit/058c9de61bfac6f4748551ee6913732b672ee429 so that TVKs of non-accepted names can be searched for.

JimBacon commented 1 month ago

As of v1.0.0, the local cache is built by querying the Indicia warehouse, but cached records never expire, so they are not updated when changes occur on the warehouse. Expiry may not be necessary once we start using organism keys rather than preferred TVKs to join taxa to rules.
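
If expiry does turn out to be needed, one possible approach is to store a timestamp with each cached entry and re-query once it is older than some maximum age. The sketch below is only an illustration of that idea: the `ExpiringTaxonCache` class, the `max_age_seconds` parameter, and the `fetch` callable standing in for the warehouse query are all hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ExpiringTaxonCache:
    """In-memory cache whose entries are re-fetched after max_age_seconds."""

    def __init__(
        self,
        fetch: Callable[[str], Optional[dict]],
        max_age_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch            # stand-in for the warehouse query
        self._max_age = max_age_seconds
        self._clock = clock            # injectable so expiry can be tested
        self._store: Dict[str, Tuple[float, dict]] = {}

    def get(self, key: str) -> Optional[dict]:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]            # still fresh: serve from the cache

        value = self._fetch(key)       # missing or stale: ask again
        if value is None:
            self._store.pop(key, None) # the key no longer resolves
        else:
            self._store[key] = (now, value)
        return value
```

The trade-off is between freshness and load on the warehouse: a short maximum age picks up UKSI changes sooner but repeats more queries.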