BiologicalRecordsCentre / iRecord

Repository to store and track enhancements, issues and tasks regarding the iRecord website.
http://irecord.org.uk

UKSI incremental update - handling vernacular name additions #1700

Open burkmarr opened 4 months ago

burkmarr commented 4 months ago

This issue was opened in response to a comment from @sacrevert on another issue: https://github.com/BiologicalRecordsCentre/iRecord/issues/1617#issuecomment-2196644419

We had some changes to the UKSI made in April, including adding some common names for NPMS aggregates where they were lacking. For example, Viola reichenbachiana/riviniana (UKSI Sandbox link) had a common name added as a synonym. However, this does not show up as a common name where it appears on one of the NPMS species lists: https://warehouse1.indicia.org.uk/index.php/taxa_taxon_list/edit/368996

On investigating this, I found that the add synonym operations were present in the UKSI snapshot and we had processed them during the incremental update of 12/06/2024.

I noticed that the new entries for these common names in the taxa table had their 'scientific' field set to 'true', and I initially thought that this was the problem. On further investigation, I think that Indicia does not use this field to identify common names. Instead it uses the 'language_id' field, treating any record where the value is not '2' (indicating Latin) as a common name.

The UKSI snapshot database table from which we extract data for the incremental updates includes a field - 'nameType' - to indicate whether an 'add synonym' operation is adding a scientific (value 'S') or vernacular (value 'V') name. The first problem is that our UKSI operations module ignores this field, whereas it could use it to set the 'language_id' field in the taxa table. We could set the value of 'language_id' to '2' where the value of 'nameType' is 'S' and to '1' (indicating English) where it is 'V'. This is not a perfect solution because Indicia supports many other languages used for common names, e.g. Welsh, but it would work.
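To make the proposed mapping concrete, here is a minimal sketch in Python. It is an illustration only and not the UKSI operations module's actual code: the function and constant names are hypothetical, and the ids (1 for English, 2 for Latin) are simply the values quoted above.

```python
# Language ids as quoted in this issue (assumed, not read from the database).
LANGUAGE_ID_ENGLISH = 1
LANGUAGE_ID_LATIN = 2

# Hypothetical mapping from the UKSI 'nameType' value to 'taxa.language_id'.
# Every vernacular name is treated as English, which is the known limitation
# noted above (e.g. Welsh names would be mislabelled).
NAME_TYPE_TO_LANGUAGE_ID = {
    "S": LANGUAGE_ID_LATIN,    # scientific name
    "V": LANGUAGE_ID_ENGLISH,  # vernacular name
}


def language_id_for_name_type(name_type: str) -> int:
    """Return the language_id to store for an 'add synonym' operation."""
    key = name_type.strip().upper()
    if key not in NAME_TYPE_TO_LANGUAGE_ID:
        # Fail loudly rather than silently defaulting to Latin.
        raise ValueError(f"Unrecognised nameType: {name_type!r}")
    return NAME_TYPE_TO_LANGUAGE_ID[key]


def is_common_name(language_id: int) -> bool:
    """Mirror the rule described above: anything that is not Latin is common."""
    return language_id != LANGUAGE_ID_LATIN
```

With this mapping, an 'add synonym' operation with 'nameType' of 'V' would be stored with 'language_id' 1 and so would be treated as a common name, while 'S' would keep the current behaviour.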

I manually updated the 'taxa' table for the newly added 'Common Dog-violet / Early Dog-violet' record, setting the 'language_id' field to 1 (indicating English). After the scheduled tasks had rebuilt the corresponding 'cache_taxa_taxon_lists' record, the Warehouse displayed the common name when I examined 'Viola reichenbachiana/riviniana' in the master taxon list: https://warehouse1.indicia.org.uk/index.php/taxa_taxon_list/edit/84312

However, it is still not displayed as a common name for the NPMS lists, e.g. https://warehouse1.indicia.org.uk/index.php/taxa_taxon_list/edit/368996

@johnvanbreda - a few questions for you:

  1. Shall I look at updating the UKSI operations module to set the 'taxa.language_id' field based on the 'nameType' from the UKSI import?
  2. I think we will need to update the 'language_id' field to '1' for any new vernacular names added via UKSI incremental imports. Will we also need to update taxa.yml?
  3. To use the new common names for the NPMS lists, I guess we will need new entries in the 'taxa_taxon_lists' table corresponding to the relevant new vernacular names and the NPMS lists. Is that correct? If so, is there an established workflow for doing this?