BiodiversiteQuebec / bdqc_taxa

`BIOQC-taxa` is a python package that interfaces with *Biodiversité Québec*'s database to query reference taxa sources, parse their responses and generate records.




For installation on a PostgreSQL server

Installation must be performed as the postgres user:

sudo su postgres
pip install ...

For a new installation

pip install git+

For an upgrade

pip install --upgrade git+


Match a scientific name against all sources

The taxa_ref module queries the reference taxa sources and parses their responses using fuzzy matching.

For a given match, the module provides functions that return the scientific name, taxonomic hierarchy, vernacular name and source of the match. For each source, valid scientific names are returned, along with synonyms when they correspond to the matched input name. Parent taxa are returned, as well as child taxa when they correspond to the matched input name.
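The following is not the bdqc_taxa implementation. It is a rough, stand-alone illustration of what fuzzy matching a scientific name means, using Python's standard difflib. The reference list, the fuzzy_match helper and the 0.8 cutoff are hypothetical choices made for the example.

```python
from difflib import get_close_matches

# Hypothetical reference names; real sources hold far larger lists.
REFERENCE_NAMES = ["Canis lupus", "Canis latrans", "Vulpes vulpes", "Salix discolor"]

def fuzzy_match(name, candidates=REFERENCE_NAMES, cutoff=0.8):
    """Return the closest candidate to `name`, or None if none scores at least `cutoff`."""
    matches = get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(fuzzy_match("Canis lupis"))   # Canis lupus (misspelled epithet still matches)
print(fuzzy_match("Ursus arctos"))  # None (no candidate is close enough)
```

Raising the cutoff makes matches stricter; lowering it tolerates more spelling errors at the risk of matching the wrong taxon.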


Querying all sources with a scientific name can be done with the following method. This should fit most use cases.

from bdqc_taxa.taxa_ref import TaxaRef

results = TaxaRef.from_all_sources('Canis lupus')

Complex names

When the taxon of an observation is complex, i.e. multiple organisms are identified for the same observation (Species 1 | Species 2 | Species 3), a single observed taxonomic entry is inserted as is. References are obtained for each organism listed in the complex and for all of their parents. References matched from complex observed taxa are flagged as such, so the user can include or exclude them in queries. Common parent taxa are also flagged and can be used to query complex observed taxa.
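A minimal sketch of how a complex name could be split into its members and how their common parents could be found. It assumes members are separated by `|`, as in the example above, and that each member's hierarchy is already known. The HIERARCHIES table and the split_complex and common_parents helpers are hypothetical and not part of bdqc_taxa.

```python
# Hypothetical hierarchies (kingdom down to species); in practice they come from the sources.
HIERARCHIES = {
    "Canis lupus": ["Animalia", "Chordata", "Mammalia", "Carnivora", "Canidae", "Canis", "Canis lupus"],
    "Canis latrans": ["Animalia", "Chordata", "Mammalia", "Carnivora", "Canidae", "Canis", "Canis latrans"],
    "Vulpes vulpes": ["Animalia", "Chordata", "Mammalia", "Carnivora", "Canidae", "Vulpes", "Vulpes vulpes"],
}

def split_complex(name):
    """Split a complex observed taxon such as 'A | B | C' into its individual names."""
    return [part.strip() for part in name.split("|") if part.strip()]

def common_parents(names, hierarchies=HIERARCHIES):
    """Return the parent taxa shared by every name, from the highest rank down."""
    paths = [hierarchies[n] for n in names]
    shared = []
    for ranks in zip(*paths):  # walk all hierarchies rank by rank
        if len(set(ranks)) != 1:  # stop at the first rank where they diverge
            break
        shared.append(ranks[0])
    return shared

members = split_complex("Canis lupus | Canis latrans | Vulpes vulpes")
print(members)
print(common_parents(members))  # the last element is the deepest common parent
```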


Certain scientific names correspond to different organisms in two entirely different branches of the tree of life. For example, the scientific name Salix corresponds to the genus of willows in the plant kingdom and to a genus of tunicates in the animal kingdom. To avoid such mismatches, the user can specify a parent taxon name that restricts the results to the branch containing that parent taxon. For example, specifying the parent taxon name Plantae restricts the results to the plant kingdom.

Important: all sources can be matched at least with a kingdom- or phylum-level parent taxon name. However, only some sources provide the whole taxonomic hierarchy, so filtering results with parent_taxa at other ranks might return no results for certain sources (e.g. Bryoquel, VASCAN, CDPNQ).

We therefore HIGHLY recommend using parent_taxa with a kingdom- or phylum-level taxon name.

The parent_taxa argument is optional. If it is not specified, the module returns all results for the given scientific name.

from bdqc_taxa.taxa_ref import TaxaRef

results = TaxaRef.from_all_sources('Salix', parent_taxa='Plantae')
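The sketch below shows, on hypothetical match records, why kingdom-level filtering is the safer choice: a source that only provides a partial hierarchy still matches Plantae but is dropped when filtering on a family. The record layout, the source names and the filter_by_parent helper are assumptions for illustration, not the structure returned by TaxaRef.from_all_sources.

```python
# Hypothetical match records; intermediate ranks are omitted for brevity.
RECORDS = [
    {"source": "source_a", "name": "Salix", "classification": ["Plantae", "Tracheophyta", "Salicaceae", "Salix"]},
    {"source": "source_b", "name": "Salix", "classification": ["Plantae", "Salix"]},  # partial hierarchy
    {"source": "source_c", "name": "Salix", "classification": ["Animalia", "Chordata", "Salix"]},
]

def filter_by_parent(records, parent_taxa):
    """Keep the records whose classification contains `parent_taxa` (case-insensitive)."""
    wanted = parent_taxa.lower()
    return [r for r in records if any(t.lower() == wanted for t in r["classification"])]

print([r["source"] for r in filter_by_parent(RECORDS, "Plantae")])     # ['source_a', 'source_b']
print([r["source"] for r in filter_by_parent(RECORDS, "Salicaceae")])  # ['source_a']
```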

Find vernacular names for a scientific name

The vernacular module queries the reference taxa sources and parses their responses using fuzzy matching, in English and French.

For a given match, the module provides functions that return the accepted vernacular names in English and French. The rank order of the sources determines which vernacular name is accepted.
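A minimal sketch of choosing an accepted vernacular name by source rank order. The source names, their priority, the candidate layout and the accepted_vernacular helper are hypothetical; the actual order used by the vernacular module is not documented here.

```python
# Hypothetical source priority, most trusted first.
SOURCE_PRIORITY = ["source_a", "source_b", "source_c"]

def accepted_vernacular(candidates, language, priority=SOURCE_PRIORITY):
    """Return the `language` name from the highest-ranked source, or None.

    Each candidate is a dict with 'source', 'language' and 'name' keys;
    candidates from sources missing from `priority` are ignored.
    """
    rank = {source: i for i, source in enumerate(priority)}
    usable = [c for c in candidates if c["language"] == language and c["source"] in rank]
    if not usable:
        return None
    return min(usable, key=lambda c: rank[c["source"]])["name"]

candidates = [
    {"source": "source_c", "language": "fr", "name": "loup"},
    {"source": "source_b", "language": "fr", "name": "loup gris"},
    {"source": "source_c", "language": "en", "name": "gray wolf"},
]
print(accepted_vernacular(candidates, "fr"))  # loup gris
print(accepted_vernacular(candidates, "en"))  # gray wolf
```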


Querying all sources with a scientific name can be done with the following method. This should fit most use cases.

from bdqc_taxa.vernacular import Vernacular

results = Vernacular.from_match('Canis lupus')


For certain sources, such as CDPNQ, the vernacular name is returned for accepted synonyms. If the observed scientific name differs, the user should run a separate query for each known synonym.
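A minimal sketch of running one query per known synonym and keeping the names that returned something. The vernacular_for_synonyms helper is hypothetical. It takes the matching function as an argument (for instance Vernacular.from_match) so it can be tried offline, and it assumes that function returns an empty, falsy result when nothing matches.

```python
def vernacular_for_synonyms(names, match_fn):
    """Call `match_fn` once per name and return {name: result} for non-empty results."""
    results = {}
    for name in names:
        found = match_fn(name)
        if found:  # assumption: an unmatched name yields an empty (falsy) result
            results[name] = found
    return results

# Stand-in for Vernacular.from_match so the example runs offline (hypothetical data).
fake_index = {"Genus species": ["nom vernaculaire"]}
known_names = ["Genus oldspecies", "Genus species"]  # observed name, then a known synonym
print(vernacular_for_synonyms(known_names, lambda n: fake_index.get(n, [])))
```

With the package installed, the call would become `vernacular_for_synonyms(known_names, Vernacular.from_match)`.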



Wrapper functions that query the sources, through either their API or the SQLite database, are implemented individually in the modules gbif, global_names, bryoquel, cdpnq, eliso and wikidata.

Custom sources

These tables contain the custom sources used by the taxa_ref module. They are stored in the custom_sources SQLite database, located in the bdqc_taxa package directory. Only exact matches are returned for the custom sources.
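A minimal sketch of an exact-match lookup with Python's built-in sqlite3 module. An in-memory database stands in for custom_sources, since its file name is not given here; the rows are made up and only a few bryoquel columns are created. SQLite's `=` comparison on text is case-sensitive by default, so a misspelled name returns no row.

```python
import sqlite3

def exact_match(conn, table, column, name):
    """Return rows of `table` whose `column` equals `name` exactly (no fuzzy matching)."""
    # Table and column names cannot be bound as parameters; pass only trusted values.
    query = f"SELECT * FROM {table} WHERE {column} = ?"
    return conn.execute(query, (name,)).fetchall()

# In-memory stand-in for custom_sources, with hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bryoquel (id INTEGER, scientific_name TEXT, taxon_rank TEXT)")
conn.executemany(
    "INSERT INTO bryoquel VALUES (?, ?, ?)",
    [(1, "Sphagnum fuscum", "species"), (2, "Sphagnum", "genus")],
)
print(exact_match(conn, "bryoquel", "scientific_name", "Sphagnum fuscum"))  # [(1, 'Sphagnum fuscum', 'species')]
print(exact_match(conn, "bryoquel", "scientific_name", "Sphagnum fuscam"))  # []
```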

TABLE bryoquel


This file was generated on 2022-09-21 from the Bryoquel taxonomy file.
The file was downloaded from on 2022-09-21.
The last version of the Bryoquel xlsx file is from 2022-09-12.
The file was parsed using the script `scripts/`.
The file was parsed using the script parse_bryoquel.ipynb.


The file contains a pandas dataframe with the following columns:
id: the Bryoquel IDtaxon
scientific_name: Accepted Latin name of the taxon, without author
taxon_rank: Taxon rank
genus: Taxon genus
family: Taxon family
clade: Taxon clade
canonical_full: Accepted Latin name of the taxon, with author
authorship: Author, extracted from the accepted Latin name
vernacular_name_fr: Accepted French names
vernacular_name_en: Accepted English names

TABLE cdpnq_odonates


This file was generated from the CDPNQ odonates data file.
The file was obtained from on May 24, 2022
The file was parsed using the script `scripts/`.


name: scientific name
valid_name: valid scientific name
rank: rank of the taxa
synonym: boolean indicating if the name is a synonym
author: author of the scientific name
canonical_full: canonical full name
vernacular_fr: vernacular name in French
vernacular_fr2: vernacular name in French from NatureServe
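The name, valid_name and synonym columns make it possible to map a synonym to its valid name. A minimal sketch with sqlite3, assuming the table is stored with the columns listed above and that synonym is stored as 0/1 (SQLite has no boolean type). The rows and the resolve_valid_name helper are hypothetical, and only three columns are created.

```python
import sqlite3

def resolve_valid_name(conn, name):
    """Return (valid_name, is_synonym) for `name`, or None if it is not in the table."""
    row = conn.execute(
        "SELECT valid_name, synonym FROM cdpnq_odonates WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return None
    valid_name, synonym = row
    return valid_name, bool(synonym)

# In-memory stand-in for the table, with hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cdpnq_odonates (name TEXT, valid_name TEXT, synonym INTEGER)")
conn.executemany(
    "INSERT INTO cdpnq_odonates VALUES (?, ?, ?)",
    [("Genus species", "Genus species", 0), ("Genus oldspecies", "Genus species", 1)],
)
print(resolve_valid_name(conn, "Genus oldspecies"))  # ('Genus species', True)
```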

TABLE cdpnq_vertebrates


This file was generated from the Liste de la faune vertébrée du Québec (LFVQ) data file LFVQ_18_04_2024.xlsx.
The file was obtained from Données Québec on 2023-01-12.
The last version of the file is from 2024-04-18.
The file was parsed using the script `scripts/`.


name: scientific name
valid_name: valid scientific name
rank: rank of the taxa
synonym: boolean indicating if the name is a synonym
author: author of the scientific name
canonical_full: canonical full name
vernacular_fr: vernacular name in French
vernacular_en: vernacular name in English

Notes: The entries have no recorded author.

TABLE eliso_invertebrates


This file was generated on 2024-04-23 from Eliso's Répertoire des noms d’invertébrés du Québec (2022) file.
The file was downloaded from on 2024-04-23.
The last version of the Eliso file is from 2022-11-18.
The file was parsed using the script `scripts/`.


The file contains a pandas dataframe with the following columns:
taxa_name: Scientific name of the taxon
vernacular_fr: French vernacular name of the taxon
taxa_rank: Taxon rank
Embranchement: Phylum
Classe: Class
Ordre: Order
Famille: Family
Genre: Genus
Espèce: Species

Notes: The entries have no recorded author. The entries may contain comments in parentheses that are kept as is but may prevent matching.
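If those parenthesized comments get in the way, a user could strip them from a name before matching. A minimal sketch using a regular expression; the strip_comments helper and the example values are hypothetical, and the table itself keeps the comments unchanged.

```python
import re

# A parenthesized comment plus the whitespace before it, e.g. " (sensu lato)".
_PAREN_COMMENT = re.compile(r"\s*\([^)]*\)")

def strip_comments(taxa_name):
    """Remove parenthesized comments from a taxa_name value before matching."""
    return _PAREN_COMMENT.sub("", taxa_name).strip()

print(strip_comments("Gerris buenoi (sensu lato)"))     # Gerris buenoi
print(strip_comments("Bombus (Pyrobombus) impatiens"))  # Bombus impatiens
```

This also removes subgenus names written in parentheses, as the second example shows, which is usually harmless for matching but worth keeping in mind.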