Closed whitfarnum closed 1 year ago
hey @whitfarnum thanks for sharing your observations.
I was able to reproduce the problem: the name alignment template is currently unable to use Catalogue of Life due to a configuration issue, possibly caused by a recent upgrade of Nomer.
https://github.com/jhpoelen/higherwhit/actions/runs/6089923421/job/16523723178#step:5:7028
Thanks for your patience as I am working to re-configure the name alignment workflow.
Note that other taxonomic authorities were able to align, and that Catalogue of Life is enabled for higher order taxon matching.
See attached name alignment report retrieved via https://github.com/jhpoelen/higherwhit/actions/runs/6089923421/
@whitfarnum I can confirm that the preprocessed version of the Catalogue of Life shipped with Nomer needs to be updated to be compatible with the extended support for Catalogue of Life in Nomer. Apologies for the confusion, this one slipped through . . . another example of the benefits of open communication!
Working on it now . . .
It appears that the Catalogue of Life is changing quite a bit: not just the content (it is being actively curated), but also the structure of the published data. So, having specific versions of Catalogue of Life is a good thing, I think - without versioned copies, name alignment results may change depending on the time of day you run the alignment.
I've created Nomer v0.5.3 https://github.com/globalbioticinteractions/nomer/releases/tag/0.5.3 . This version allows for re-using "older" (months vs days old) pre-calculated indexed versions of Catalogue of Life. This should address your immediate need.
In addition, some changes are needed to accommodate newer catalogue of life schemas. I am hoping to address that sooner rather than later.
So, for now, Nomer supports matching against a somewhat recent copy of Catalogue of Life. And, I am continuing to work towards supporting more up-to-date versions of Catalogue of Life.
Perhaps this is a good example of the cost of "big" catalogs - because the catalog has everything in it (hundreds, perhaps thousands of checklists), there are likely to be changes daily: there's always some curation happening somewhere. Contrast this with modular publication of "small" catalogs - here, the changes are confined to the particular taxonomic areas that happen to be active, so they arrive in small chunks. Another approach to reduce the costs of "keeping up with the Joneses" is to only upgrade when requested. The latter is the approach currently taken by Nomer: the Nomer Corpus of Taxonomic Resources keeps versioned copies of taxonomic resources, and these are upgraded on demand (e.g., #134 ).
Also, note that using Nomer 0.5.3, the expected Catalogue of Life matches (incl. higher order) appeared in the alignment report retrieved via https://github.com/jhpoelen/higherwhit/actions/runs/6089923421 . See also the attached alignment review. alignment-report.zip
@whitfarnum apologies for the verbose descriptions. If anything, this is a note to self and a way to document the root cause of your observation that the Catalogue of Life matches no longer appeared.
Please confirm that you can now see the Catalogue of Life alignment suggestions made by Nomer via the alignment workflow.
Onward!
To address the root cause, I've improved the packaging and integration of Catalogue of Life.
For details, see:
Poelen, Jorrit H. (ed.). (2023). Catalogue of Life Repackaged and Sorted hash://sha256/e7130fb557d9aee033ac7147f4d5c4c75f12223dd43e53c7cbb141372f9579cd hash://md5/882b8744b3ebd5fae371fa659ee52a2b (0.2) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.8329453
These changes are incorporated in Nomer v0.5.4, which is now available through the name alignment workflow (e.g., https://github.com/globalbioticinteractions/name-alignment-template) and GloBI data reviews.
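For anyone wanting to check that a local copy is the exact version cited above, here is a minimal sketch in Python. It assumes that a `hash://sha256/<hex>` (or `hash://md5/<hex>`) identifier is simply the hex digest of the file's bytes; the function names `parse_hash_uri` and `matches_hash_uri` are hypothetical and not part of Nomer.

```python
import hashlib


def parse_hash_uri(uri):
    """Split 'hash://<algorithm>/<hex digest>' into (algorithm, digest)."""
    prefix = "hash://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a hash URI: {uri!r}")
    algorithm, _, digest = uri[len(prefix):].partition("/")
    if not algorithm or not digest:
        raise ValueError(f"malformed hash URI: {uri!r}")
    return algorithm.lower(), digest.lower()


def matches_hash_uri(path, uri, chunk_size=1 << 20):
    """Return True if the file at `path` has the digest named by `uri`.

    Assumption: the digest in the URI is the plain hex digest of the
    file content, computed with the named algorithm.
    """
    algorithm, expected = parse_hash_uri(uri)
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == expected
```

Calling `matches_hash_uri("<local copy>", "hash://sha256/e7130fb5...")` with the full identifier from the citation would then indicate whether the local file is byte-for-byte the cited version, which is what makes alignment results repeatable.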
Please feel free to open a new issue or re-open this one if problems persist.
Thanks!
@jhpoelen it worked great. This tool has been amazing in my curation work. It has saved lots of time aligning and updating names, possibly months.
Glad to hear that it is working for you. Also, let me know if you ever do some kind of write up of your project(s) . . . curious to see where these tools end up being used . . .
I am doing large scale inventories and cleaning taxonomy. I want to be able to provide citations and sources for all my species names and higher taxonomy. Nomer finds and creates links to species names very well. I wanted to see if it would do the same for names at the level of Genus, Tribe, Subfamily, and Family. I created a names.csv file that only contained names at Genus and higher. I got no results.
One entry in my list was Sisyphini, which was not found by Nomer but does exist in the Catalogue of Life: https://www.catalogueoflife.org/data/taxon/L8K
Does the program not search the higher taxa for matches?
names higher taxa only.csv