seltmann closed this issue 3 years ago.
@seltmann sounds good!
Can you include an existing checklist DwC-A that you'd like to use as a guiding example?
@seltmann provided the ITIS DwC-A as an example:
https://www.gbif.org/dataset/9ca92552-f23a-41a8-a140-01abaa31c931
Note, however, that https://itis.gov/downloads/index.html does not offer a DwC-A bulk download.
Also, I noticed that GBIF's ITIS page points to https://hosted-datasets.gbif.org/datasets/itis.zip as its source (see https://www.gbif.org/dataset/9ca92552-f23a-41a8-a140-01abaa31c931#description).
See https://itis.gov/dwca_format.html for ITIS DwC-A usage information.
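For reference, a checklist DwC-A is a zip archive holding a core Taxon data file plus a meta.xml descriptor that maps each column to a Darwin Core term (a full archive usually also carries an eml.xml metadata file). Below is a minimal sketch of writing such an archive with only the Python standard library. The column set in TERMS, the file name taxon.txt and the helper names meta_xml and write_checklist_archive are illustrative assumptions, not something nomer currently produces.

```python
import io
import zipfile

# Darwin Core term namespace used in the meta.xml field declarations.
DWC = "http://rs.tdwg.org/dwc/terms/"

# Illustrative column set for the core Taxon file; column 0 doubles as the row id.
TERMS = ["taxonID", "scientificName", "taxonRank", "taxonomicStatus", "acceptedNameUsageID"]


def meta_xml(terms):
    """Describe a tab-separated Taxon core file whose columns follow `terms`."""
    fields = "\n".join(
        f'    <field index="{i}" term="{DWC}{term}"/>' for i, term in enumerate(terms)
    )
    return (
        '<archive xmlns="http://rs.tdwg.org/dwc/text/">\n'
        '  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"'
        ' fieldsEnclosedBy="" ignoreHeaderLines="1"'
        f' rowType="{DWC}Taxon">\n'
        "    <files><location>taxon.txt</location></files>\n"
        '    <id index="0"/>\n'
        f"{fields}\n"
        "  </core>\n"
        "</archive>\n"
    )


def write_checklist_archive(rows, target):
    """Zip meta.xml and taxon.txt into `target` (a path or binary file object).

    Each row is a dict keyed by TERMS; missing terms become empty cells.
    Values must not contain tabs or newlines, since no quoting is applied.
    """
    lines = ["\t".join(TERMS)]
    for row in rows:
        lines.append("\t".join(row.get(term, "") for term in TERMS))
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("meta.xml", meta_xml(TERMS))
        archive.writestr("taxon.txt", "\n".join(lines) + "\n")


# Usage: the ITIS Bacteria/Schizomycetes pair (TSNs 50 and 51), written to memory.
buffer = io.BytesIO()
write_checklist_archive(
    [
        {"taxonID": "ITIS:50", "scientificName": "Bacteria",
         "taxonRank": "kingdom", "taxonomicStatus": "accepted"},
        {"taxonID": "ITIS:51", "scientificName": "Schizomycetes",
         "taxonomicStatus": "synonym", "acceptedNameUsageID": "ITIS:50"},
    ],
    buffer,
)
```

Passing a file path such as "bees-checklist.zip" (a hypothetical name) instead of the in-memory buffer writes the archive to disk.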
With recent changes, I was able to produce the following output:
$ nomer dump discoverlife-taxon | head
using matcher [discoverlife-taxon]
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+argentinum Acamptopoeum argentinum HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+argentinum Acamptopoeum argentinum species Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Acamptopoeum argentinum https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Acamptopoeum+argentinum kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Acamptopoeum+argentinum
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+calchaqui Acamptopoeum calchaqui HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+calchaqui Acamptopoeum calchaqui species Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Acamptopoeum calchaqui https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Acamptopoeum+calchaqui kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Acamptopoeum+calchaqui
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+colombiense Acamptopoeum colombiense HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+colombiense Acamptopoeum colombiense species Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Acamptopoeum colombiense https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Acamptopoeum+colombiense kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Acamptopoeum+colombiense
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+colombiensis_sic Acamptopoeum colombiensis_sic SYNONYM_OF https://www.discoverlife.org/mp/20q?search=Acamptopoeum+colombiense Acamptopoeum colombiense species Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Acamptopoeum colombiense https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Acamptopoeum+colombiense kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Acamptopoeum+colombiense
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+fernandezi Acamptopoeum fernandezi HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+fernandezi Acamptopoeum fernandezi species Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Acamptopoeum fernandezi https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Acamptopoeum+fernandezi kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Acamptopoeum+fernandezi
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+inauratum Acamptopoeum inauratum HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+inauratum Acamptopoeum inauratum species Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Acamptopoeum inauratum https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Acamptopoeum+inauratum kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Acamptopoeum+inauratum
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+melanogaster Acamptopoeum melanogaster HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+melanogaster Acamptopoeum melanogaster species Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Acamptopoeum melanogaster https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Acamptopoeum+melanogaster kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Acamptopoeum+melanogaster
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+nigritarse Acamptopoeum nigritarse HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+nigritarse Acamptopoeum nigritarse species Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Acamptopoeum nigritarse https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Acamptopoeum+nigritarse kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Acamptopoeum+nigritarse
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+prinii Acamptopoeum prinii HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+prinii Acamptopoeum prinii species Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Acamptopoeum prinii https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Acamptopoeum+prinii kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Acamptopoeum+prinii
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+submetallicum Acamptopoeum submetallicum HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+submetallicum Acamptopoeum submetallicum species Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Acamptopoeum submetallicum https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Acamptopoeum+submetallicum kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Acamptopoeum+submetallicum
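Each dump line carries the provided name, its relation (HAS_ACCEPTED_NAME or SYNONYM_OF) and the resolved name, so a downstream script mostly needs the leading columns. A minimal Python sketch, assuming the columns are tab-separated (the terminal output above shows them collapsed to spaces) and that the first six are provided id, provided name, relation, resolved id, resolved name and resolved rank; NameLink and parse_dump_line are hypothetical names, not part of nomer.

```python
from typing import NamedTuple


class NameLink(NamedTuple):
    provided_id: str
    provided_name: str
    relation: str
    resolved_id: str
    resolved_name: str
    resolved_rank: str


def parse_dump_line(line):
    """Split one tab-separated `nomer dump` line into its six leading columns.

    Later columns (path, path ids, path ranks, external URL) are ignored here.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 tab-separated columns, got {len(fields)}")
    return NameLink(*fields[:6])


# Usage with the leading columns of the Acamptopoeum colombiensis_sic row.
row = "\t".join([
    "https://www.discoverlife.org/mp/20q?search=Acamptopoeum+colombiensis_sic",
    "Acamptopoeum colombiensis_sic",
    "SYNONYM_OF",
    "https://www.discoverlife.org/mp/20q?search=Acamptopoeum+colombiense",
    "Acamptopoeum colombiense",
    "species",
])
link = parse_dump_line(row)
print(link.relation, "->", link.resolved_name)  # SYNONYM_OF -> Acamptopoeum colombiense
```

A NamedTuple keeps the fields addressable by name while staying as cheap as a plain tuple, which matters when streaming hundreds of thousands of lines.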
Also, with recent changes, the offline ITIS matcher supports dump:
$ nomer dump itis-taxon-id | grep Apidae | wc -l
using matcher [itis-taxon-id]
ITIS taxonomy already indexed at [xxxx/nomer/itis/itis], no need to import.
6645
That is, 6645 names related to Apidae. Splitting these by relation type:
$ nomer dump itis-taxon-id | grep Apidae | grep -o -P "(SYNONYM_OF|HAS_ACCEPTED_NAME)" | sort | uniq -c
using matcher [itis-taxon-id]
ITIS taxonomy already indexed at [/media/jorrit/data/nomer/itis/itis], no need to import.
6105 HAS_ACCEPTED_NAME
540 SYNONYM_OF
6105 accepted names and 540 synonyms.
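The grep | sort | uniq -c tally above can also be expressed in Python, which is easier to reuse inside a checklist export. This is a sketch under the assumption that the relation sits in the third tab-separated column; unlike grep -o, it reads that column rather than matching anywhere on the line. count_relations is a hypothetical helper.

```python
from collections import Counter

RELATIONS = ("HAS_ACCEPTED_NAME", "SYNONYM_OF")


def count_relations(lines, must_contain=None):
    """Tally relation types in `nomer dump` output.

    If `must_contain` is given, only lines containing that substring are counted,
    mirroring `grep Apidae`. The relation is read from the third tab-separated
    column rather than matched anywhere on the line as `grep -o` does.
    """
    counts = Counter()
    for line in lines:
        if must_contain is not None and must_contain not in line:
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 2 and fields[2] in RELATIONS:
            counts[fields[2]] += 1
    return counts
```

Passing an open itis.tsv with must_contain="Apidae" should, if the column assumption holds, give the same accepted/synonym split as the grep pipeline.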
Note that this is based on:
Integrated Taxonomic Information System. (2020). Repackaged Full ITIS Data Set (MS SQL Server) (itisMS.043020) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.3833105
which is based on an ITIS data export provided in 2020. Updates can be made by adjusting the ITIS nomer properties manually and/or by updating the defaults:
$ nomer properties | grep itis
nomer.itis.synonym_links=gz:https://zenodo.org/record/3833105/files/synonym_links.gz!/synonym_links
nomer.itis.taxon_unit_types=gz:https://zenodo.org/record/3833105/files/taxon_unit_types.gz!/taxon_unit_types
nomer.itis.taxonomic_units=gz:https://zenodo.org/record/3833105/files/taxonomic_units.gz!/taxonomic_units
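One way to adjust those properties for a newer repackaged ITIS export is to rewrite the Zenodo record id in all three nomer.itis.* values at once. A sketch, assuming the newer record uses the same file names (synonym_links.gz, taxon_unit_types.gz, taxonomic_units.gz) as record 3833105; override_itis_properties and NEW_RECORD_ID are hypothetical, and how the adjusted properties are then supplied to nomer is not covered here.

```python
import re

# Captures the Zenodo record id inside values such as
# gz:https://zenodo.org/record/3833105/files/synonym_links.gz!/synonym_links
RECORD_PATTERN = re.compile(r"(https://zenodo\.org/record/)\d+(/)")


def override_itis_properties(properties_text, record_id):
    """Rewrite `nomer.itis.*` lines so their URLs point at another Zenodo record.

    Lines for other matchers are dropped; file names inside the record are kept,
    so this only works if the new record uses the same file names.
    """
    if not record_id.isdigit():
        raise ValueError(f"expected a numeric Zenodo record id, got {record_id!r}")
    out = []
    for line in properties_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.startswith("nomer.itis."):
            out.append(key + "=" + RECORD_PATTERN.sub(rf"\g<1>{record_id}\g<2>", value))
    return "".join(line + "\n" for line in out)


# Usage: NEW_RECORD_ID is a placeholder, not a real newer ITIS export.
current = """\
nomer.itis.synonym_links=gz:https://zenodo.org/record/3833105/files/synonym_links.gz!/synonym_links
nomer.itis.taxon_unit_types=gz:https://zenodo.org/record/3833105/files/taxon_unit_types.gz!/taxon_unit_types
nomer.itis.taxonomic_units=gz:https://zenodo.org/record/3833105/files/taxonomic_units.gz!/taxonomic_units
"""
NEW_RECORD_ID = "1234567"
print(override_itis_properties(current, NEW_RECORD_ID), end="")
```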
In total, for ITIS:
$ nomer dump itis-taxon-id | grep -o -P "(SYNONYM_OF|HAS_ACCEPTED_NAME)" | sort | uniq -c
using matcher [itis-taxon-id]
ITIS taxonomy already indexed at [xxxx/data/nomer/itis/itis], no need to import.
600434 HAS_ACCEPTED_NAME
234551 SYNONYM_OF
That is, ~600k accepted names and ~235k synonyms,
with performance currently at:
$ nomer dump itis-taxon-id | pv -l > /dev/null
using matcher [itis-taxon-id]
ITIS taxonomy already indexed at [/xxxx/nomer/itis/itis], no need to import.
834k 0:00:32 [25.6k/s]
that is, ~834k names (600434 + 234551) exported in about 30 seconds.
Implemented in https://github.com/globalbioticinteractions/nomer/releases/tag/0.2.4 .
@seltmann if you get the chance, please reproduce (note: it takes about 30 s or more):
$ nomer dump itis > itis.tsv
...
$ cat itis.tsv | head -n2
ITIS:50 Bacteria HAS_ACCEPTED_NAME ITIS:50 Bacteria kingdom Bacteria ITIS:50 kingdom http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=50
ITIS:51 Schizomycetes SYNONYM_OF ITIS:50 Bacteria kingdom Bacteria ITIS:50 kingdom http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=50
$ cat itis.tsv | sha256sum
d9ea9fd1d44aeedc86643527b51055b2ae220674aa27127b7fbe2f7d07442332 -
$ ls -lha itis.tsv
xxxx 579M xxx itis.tsv
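The two ITIS rows show the identifier convention: names carry ITIS:<TSN> ids, and the last column links to the ITIS single-report page for the resolved TSN. If a script needs that link without keeping the URL column, the mapping can be sketched as follows; it is inferred only from those two rows, and itis_report_url is a hypothetical helper, not part of nomer.

```python
def itis_report_url(taxon_id):
    """Map an `ITIS:<TSN>` identifier to the ITIS single-report URL seen in the dump."""
    prefix, sep, tsn = taxon_id.partition(":")
    if prefix != "ITIS" or not sep or not tsn.isdigit():
        raise ValueError(f"not an ITIS TSN identifier: {taxon_id!r}")
    return (
        "http://www.itis.gov/servlet/SingleRpt/SingleRpt"
        f"?search_topic=TSN&search_value={tsn}"
    )


print(itis_report_url("ITIS:50"))
# http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=50
```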
Similarly, please reproduce:
$ nomer dump discoverlife > discoverlife.tsv
...
$ cat discoverlife.tsv | head -n2
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+argentinum Acamptopoeum argentinum HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+argentinum Acamptopoeum argentinum species Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Acamptopoeum argentinum https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Acamptopoeum+argentinum kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Acamptopoeum+argentinum
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+calchaqui Acamptopoeum calchaqui HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+calchaqui Acamptopoeum calchaqui species Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Acamptopoeum calchaqui https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Acamptopoeum+calchaqui kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Acamptopoeum+calchaqui
$ cat discoverlife.tsv | sha256sum
c415109c04449a36ff398602b7afa623540dab8b1a6c628e020907386463b900 -
$ ls -lha discoverlife.tsv
xxxx 36M xxxx discoverlife.tsv
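For anyone reproducing the dumps on a machine without sha256sum, the same check can be sketched in Python. The expected digests are the ones posted above; sha256_hex, sha256_of and verify are hypothetical helpers. Reading in 1 MiB chunks keeps memory use flat even for the ~579M itis.tsv.

```python
import hashlib
import os

# Digests posted above for the two dumps.
EXPECTED = {
    "itis.tsv": "d9ea9fd1d44aeedc86643527b51055b2ae220674aa27127b7fbe2f7d07442332",
    "discoverlife.tsv": "c415109c04449a36ff398602b7afa623540dab8b1a6c628e020907386463b900",
}


def sha256_hex(chunks):
    """Hash an iterable of byte chunks; yields the same hex digest `sha256sum` prints."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so it is never fully loaded into memory."""
    with open(path, "rb") as handle:
        return sha256_hex(iter(lambda: handle.read(chunk_size), b""))


def verify(paths):
    """Report, per file, whether its digest matches the one posted for that file name."""
    return {
        path: sha256_of(path) == EXPECTED[os.path.basename(path)]
        for path in paths
    }
```

If the reproduced dumps are byte-identical to the attached ones, verify(["itis.tsv", "discoverlife.tsv"]) should report True for both.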
See the attached itis.tsv and discoverlife.tsv (compressed as itis.tsv.gz and discoverlife.tsv.gz).
@jhpoelen today in the TDWG taxonomic backbone discussion I asked the question:
"I have my own taxon names list, how best (easiest) for me to include my name list into globalnames services?"
Dima answered: "send me an email and I will include it in globalnames."
I think this is a way to include our bee names in globalnames.
In the same meeting, Joe Miller suggested: "please submit it as a checklist to GBIF, it will get a DOI and you can compare it to COL and GBIF."
Leaving the DOI aside (that is a whole separate discussion), a data-driven approach can be very neat no matter where the name lists are stored (GBIF, @dimus's own internal globalnames infrastructure, Zenodo, Internet Archive).
This comes back to separating datasets (and their versions) and tracking their use.
fyi @mjy
@seltmann we've been using the "dump" or "list" features for a little while now.
Closing this issue; please report bugs in this functionality in specific, newly created issues.
As a function of nomer, create a DwC-A export using user-defined name lists. For example, create a checklist of bees that combines ITIS and DiscoverLife bee names, including synonyms.
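A rough sketch of the requested export step: read rows from one or more nomer dumps, keep lines mentioning any of a set of family names, and map them to Darwin Core Taxon terms (taxonID, scientificName, taxonomicStatus, acceptedNameUsageID). The column order, the plain substring filter and the name to_dwc_rows are assumptions; also, the same species from ITIS and DiscoverLife will appear twice under different identifiers, since no cross-source matching is attempted.

```python
def to_dwc_rows(lines, families):
    """Map `nomer dump` lines to Darwin Core Taxon rows (dicts).

    Keeps lines mentioning any of `families` (a plain substring test, like grep),
    and assumes tab-separated columns starting with provided id, provided name,
    relation, resolved id. The first row seen per provided id wins.
    """
    rows = {}
    for line in lines:
        if not any(family in line for family in families):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue
        provided_id, provided_name, relation, resolved_id = fields[:4]
        row = {"taxonID": provided_id, "scientificName": provided_name}
        if relation == "HAS_ACCEPTED_NAME":
            row["taxonomicStatus"] = "accepted"
        elif relation == "SYNONYM_OF":
            row["taxonomicStatus"] = "synonym"
            row["acceptedNameUsageID"] = resolved_id
        else:
            continue
        # setdefault skips repeats, e.g. when the same dump is read twice.
        rows.setdefault(provided_id, row)
    return list(rows.values())
```

To combine both dumps, the opened itis.tsv and discoverlife.tsv could be passed together via itertools.chain. Bees span several families (Apidae and Andrenidae appear in this thread), so the families tuple would need the full list of bee families.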