`BIOQC-taxa` is a Python package that interfaces with *Biodiversité Québec*'s database to query reference taxa sources, parse their responses, and generate records.
The API call to gnr returns "duplicated" results because it also matches synonyms: for example, a call for `Bubo scandiacus` returns both `Bubo scandiacus` and `Bubo scandiaca`.
However, gnr has already "corrected" the synonym's branch, so it returns two identical branches, except that the `match_type` of the synonym's branch is `None` instead of `exact`.
This is a problem because `match_type` `exact` is used further down the taxonomic pipeline to display only the vernacular names whose `match_type` is `exact` (not yet implemented in Atlas, but it is in Coleo; this should fix the problem mentioned in https://github.com/ReseauBiodiversiteQuebec/atlas-db/issues/122). The duplicates are handled badly in both code paths:
If no `parent_taxa` is provided, the code tries to inject everything into the table `taxa_ref`, but there is a unique constraint on source/source_record_id (srid), so the entry that ends up injected may arbitrarily be the one with `match_type` `None` instead of `exact`.
If a `parent_taxa` is provided, the function `prune_parent_taxa` removes duplicates of the same srid for a source, but it does so by keeping the first occurrence, which might be the one with `match_type` `None` rather than `exact`.
TODO:
[x] Implement code to prioritize `match_type` `exact` when there is no `parent_taxa` (basically removing the duplicate entry before injecting into the database, so that we decide which one is injected: the one with `match_type == exact`)
[x] If there is a `parent_taxa`, implement code at the end of the function `prune_parent_taxa` to prioritize the entries with `match_type` `exact`
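
The prioritization described in both TODO items can be sketched as follows. This is only an illustration, not the code in this package: the function name `prioritize_exact`, the dict-based record shape, and the keys `source_id`, `source_record_id` and `match_type` are assumptions made for the example, and the IDs in the usage lines are made up.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical record shape: a dict with at least the keys
# "source_id", "source_record_id" and "match_type".
Record = Dict[str, Any]


def prioritize_exact(records: Iterable[Record]) -> List[Record]:
    """Keep one record per (source_id, source_record_id).

    When several records share the same key, the one whose match_type is
    "exact" wins; otherwise the first occurrence is kept.
    """
    best: Dict[Tuple[Any, Any], Record] = {}
    for rec in records:
        key = (rec["source_id"], rec["source_record_id"])
        kept = best.get(key)
        if kept is None:
            # First time this srid is seen for this source.
            best[key] = rec
        elif kept["match_type"] != "exact" and rec["match_type"] == "exact":
            # Replace a non-exact duplicate with the exact one.
            best[key] = rec
    # Dicts preserve insertion order, so the original ordering of keys is kept.
    return list(best.values())


# Usage with made-up values: the synonym branch (match_type None) comes first.
records = [
    {"source_id": 1, "source_record_id": "A", "match_type": None},
    {"source_id": 1, "source_record_id": "A", "match_type": "exact"},
    {"source_id": 1, "source_record_id": "B", "match_type": None},
]
deduplicated = prioritize_exact(records)
```

Running the deduplication before the insert into `taxa_ref`, and again at the end of `prune_parent_taxa`, would make the choice of surviving entry deterministic in both code paths instead of depending on the order in which gnr returns the branches.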