Indicia-Team / warehouse

GNU General Public License v3.0

Adding a record against a disallowed taxon blocks the scheduled tasks #485

Closed: JimBacon closed this issue 1 year ago

JimBacon commented 1 year ago

Similar to #484.

In this instance, occurrences 32038096 and 32038222 have not been added to cache_occurrences_functional. They are records made against taxa_taxon_list_id 528820 and 61035 respectively. Neither of these is present in cache_taxa_taxon_lists, which causes the insert into cache_occurrences_functional to fail. They are not in cache_taxa_taxon_lists because, to be inserted there, the following join must succeed:

join taxa_taxon_lists ttlpref
      on ttlpref.taxon_meaning_id=ttl.taxon_meaning_id
      and ttlpref.preferred='t'
      and ttlpref.taxon_list_id=ttl.taxon_list_id
      and ttlpref.deleted=false
      and ttlpref.allow_data_entry=ttl.allow_data_entry

The join fails because ttl.allow_data_entry is false while ttlpref.allow_data_entry is true. In other words, both recorded names are redundant, but their preferred name is not.
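A minimal, self-contained reproduction of this failure, assuming PostgreSQL. The table `ttl_demo` and its rows are hypothetical stand-ins for taxa_taxon_lists, keeping only the columns the join uses: one non-redundant preferred name and one redundant synonym sharing a taxon meaning and list.

```sql
-- Hypothetical stand-in for taxa_taxon_lists (only the columns used by the join).
create temp table ttl_demo (
  id               integer primary key,
  taxon_meaning_id integer not null,
  taxon_list_id    integer not null,
  preferred        boolean not null,
  deleted          boolean not null default false,
  allow_data_entry boolean not null
);

-- Same meaning and list: a non-redundant preferred name (id 1)
-- and a redundant synonym (id 2), like the names the two occurrences use.
insert into ttl_demo (id, taxon_meaning_id, taxon_list_id, preferred, allow_data_entry)
values (1, 10, 1, true,  true),
       (2, 10, 1, false, false);

-- The join condition quoted above, applied to the stand-in table.
select ttl.id, ttlpref.id as preferred_id
from ttl_demo ttl
join ttl_demo ttlpref
  on ttlpref.taxon_meaning_id = ttl.taxon_meaning_id
  and ttlpref.preferred = 't'
  and ttlpref.taxon_list_id = ttl.taxon_list_id
  and ttlpref.deleted = false
  and ttlpref.allow_data_entry = ttl.allow_data_entry
order by ttl.id;
```

This returns a single row (id 1, preferred_id 1). The redundant name (id 2) finds no partner, because its allow_data_entry (false) never equals that of its preferred name (true), so it drops out of the result in the same way the two names drop out of cache_taxa_taxon_lists.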

Both taxa were updated on 2022-03-24 15:35:48, which was probably when the UKSI update was performed. Both occurrences belong to the iRecord Import survey, so perhaps the underlying issue is with the import method.

I can work around this problem by updating the taxa_taxon_list_ids of the occurrences as follows:

@johnvanbreda, do you have any suggestions to resolve this permanently?

johnvanbreda commented 1 year ago

Thanks @JimBacon - looking at this, I don't think the logic is quite as I intended. A redundant name (allow_data_entry=false) should be allowed to exist in cache_taxa_taxon_lists with a non-redundant preferred name. Note that there are a very small number of names in the database with both a redundant preferred name AND a non-redundant preferred name. I think these cases are only legacy, but they still exist. My feeling now is that we should modify the query so that it selects the non-redundant preferred name and only falls back to a redundant accepted name if a non-redundant accepted name is not available. The redundant accepted name is only likely to be used in the scenario where an entire taxon is redundant.
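One possible way to express this fallback, sketched for PostgreSQL against a hypothetical `ttl_demo` stand-in for taxa_taxon_lists (this is not the code from the pull request). A `lateral` subquery picks, for each name, a preferred name in the same meaning and list, ordering non-redundant candidates first so that a redundant preferred name is chosen only when no non-redundant one exists.

```sql
-- Hypothetical stand-in for taxa_taxon_lists (only the columns needed here).
create temp table ttl_demo (
  id               integer primary key,
  taxon_meaning_id integer not null,
  taxon_list_id    integer not null,
  preferred        boolean not null,
  deleted          boolean not null default false,
  allow_data_entry boolean not null
);

insert into ttl_demo (id, taxon_meaning_id, taxon_list_id, preferred, allow_data_entry)
values
  -- Meaning 10: non-redundant preferred (1), redundant synonym (2),
  -- and a legacy redundant preferred name (3).
  (1, 10, 1, true,  true),
  (2, 10, 1, false, false),
  (3, 10, 1, true,  false),
  -- Meaning 20: the entire taxon is redundant.
  (4, 20, 1, true,  false),
  (5, 20, 1, false, false);

select ttl.id,
       pref.id               as preferred_id,
       pref.allow_data_entry as preferred_allows_entry
from ttl_demo ttl
cross join lateral (
  select p.id, p.allow_data_entry
  from ttl_demo p
  where p.taxon_meaning_id = ttl.taxon_meaning_id
    and p.taxon_list_id = ttl.taxon_list_id
    and p.preferred = true
    and p.deleted = false
  -- true sorts after false, so desc puts non-redundant preferred names first;
  -- p.id is an arbitrary tie-break to keep the choice deterministic.
  order by p.allow_data_entry desc, p.id
  limit 1
) pref
order by ttl.id;
```

Names 1, 2 and 3 all resolve to preferred name 1, while names 4 and 5 fall back to the redundant preferred name 4. Unlike the original join, allow_data_entry no longer has to match, so redundant names like the two in this issue still get a row.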

I'll submit a pull request for you to review. There may be additional work required to update the existing data as well.

johnvanbreda commented 1 year ago

I've done the following:

  1. Deployed the change (with review suggestions applied).
  2. Created work queue tasks for the redundant taxon names missing from cache_taxa_taxon_lists.
  3. After these tasks completed, created work queue tasks for the missing occurrences (approximately 500).