Closed jtmiller28 closed 6 months ago
@jtmiller28 thanks for sharing this specific example.
I was able to independently reproduce:
$ echo -e "\tAncylandrena atoposoma" | nomer append discoverlife
Ancylandrena atoposoma NONE Ancylandrena atoposoma
also, I was able to find the name on the https://www.discoverlife.org website as you mentioned. See screenshot below.
And, I much like your idea to re-use Chesshire et al 2023 to help complement the existing resources.
Am also curious to hear from @seltmann on the topic.
Next step for me is to figure out why Ancylandrena atoposoma is not picked up by Nomer.
@jtmiller28 would you happen to have the full list of active DiscoverLife names that appear to not match via Nomer's support for DiscoverLife taxa?
I thought I did initially, however after closer inspection I noticed that these were more nuanced with some being corrections made by those experts for new designations that are not as of yet reflected in the DL database. Was hoping it was just an update issue, but I'll pull the initial names from chesshire and run Nomer through it and see where that leads. Hopefully more soon
@jtmiller28 took a look at your unexpected mismatches of nomer against discoverlife.
Turned out that the reason is that our DiscoverLife friends have upgraded their lists in 2022, and Nomer was still using the older DiscoverLife copy. Also see #80 .
After upgrading to the "new" discoverlife, I was able to produce the results below
echo -e "\tAncylandrena atoposoma"\
| nomer append discoverlife
yielded:
[main] INFO org.globalbioticinteractions.nomer.match.DiscoverLifeTaxonService - DiscoverLife name indexing started...
[main] INFO org.globalbioticinteractions.nomer.match.DiscoverLifeTaxonService - [51348] DiscoverLife names were indexed in 14s (@ 3667 names/s)
Ancylandrena atoposoma HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Ancylandrena+atoposoma Ancylandrena atoposoma (Cockerell, 1934) species Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Ancylandrena atoposoma https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Ancylandrena+atoposoma kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Ancylandrena+atoposoma
Gotcha, that makes sense! Is there a possibility in the future for automation in updating DL with each edition?
I have some code to run that'll build the known list of names that are in DL for US bees according to that paper, so once Nomer is fully updated I can run it against it and check if there are any differences.
yep, working on it, see #80
@jtmiller28 please verify that Nomer v0.4.10 is matching the names as expected.
Thanks again for pointing this out - these are the kinds of things that make Nomer (and other open source software) tick.
@jtmiller28 did you get a chance to confirm that the newer discoverlife is now used in the recent version of Nomer and the associated name alignment template tool?
@jhpoelen Apologies for the delayed response. I was getting rather enigmatic outputs, so I assumed some error was occurring on my end. I still get them now, however, so I'll share what I found.
Starting from the beginning: Chesshire has an outlined csv that denotes verbatim names (after parsing), 2 decision steps (one through automated name alignment, 2nd reviewed by taxonomist John Ascher), and 1 field containing notes on alignment decisions. This data can be found here: chesshires-namelist.csv
To test the names that are noted as present in Discover Life, I grepped out anything with "DiscoverLife" or "DL" within the field that indicates alignment decisions, obtaining the following file (though I removed one instance that suggested white space caused the alignment failure). I then ran these verbatimNames through nomer, obtaining the following: nomer-test-output.txt
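The filtering step described above can be sketched roughly as follows. This is only an illustration under assumptions: the three-column tab-separated layout (verbatimName, finalName, notes), the placeholder third row, and the helper names `sample_names` and `dl_names_for_nomer` are hypothetical, and the real chesshires-namelist.csv likely differs in columns and quoting.

```shell
# Hypothetical three-column TSV: verbatimName, finalName, notes.
# The first two rows reuse names and notes quoted in this thread;
# the third row is a placeholder, not real data.
sample_names() {
  printf '%s\t%s\t%s\n' \
    "Pseudopanurgus parvus" "Protandrena parva" "Accepted Name in the DL list" \
    "Heterosarus bakeri" "Pseudopanurgus bakeri" "DL list indicates that this is the accepted synonym" \
    "Placeholder name" "Placeholder name" "no catalog noted"
}

# Keep rows whose notes column mentions DiscoverLife or DL, then print each
# verbatim name with a leading tab (empty id column), which is the input
# layout used by the nomer examples in this thread.
dl_names_for_nomer() {
  awk -F'\t' '$3 ~ /DiscoverLife|DL/ { printf "\t%s\n", $1 }'
}

sample_names | dl_names_for_nomer
```

The resulting lines could then be piped into `nomer append discoverlife`, as in the other commands shown in this thread.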
This is where things get a bit tricky: there are still 385 names that fail to align. The following strings are given as the source of the name change: "accepted synonym when entered into DL website, still give to John" "collapsed subspecies/switched based on DL - still give to John" "DL list indicates that this is the accepted synonym" "Accepted Name in the DL list" "valid on ITIS and DL, Keep but do run by John" "On DL website and ITIS, Keep but do run by John" "DL list indicates that this is the accepted synonym, pass by John"
Problem is I can't replicate their resolution across the name list by using DL. There are some instances where the name is definitely on discoverlife and unread by Nomer, some where searching the site pulls you to the genus (and the name is not in Nomer), dead links on DL, and others where the reason for mapping is completely off from what they suggest.
First Case: Name is on DL and not currently seen by Nomer.
echo -e "\tPseudopanurgus parvus" | nomer append discoverlife
yields
Pseudopanurgus parvus NONE Pseudopanurgus parvus
Reason for name alignment given by Chesshire: "Accepted Name in the DL list" https://www.discoverlife.org/ shows it is present on DL through manual search options
Second Case: with mismatch mapping through the website search tool, but a correct synonym when final name was searched
echo -e "\tLasioglossum nymphaerum" | nomer append discoverlife
yields
Lasioglossum nymphaerum NONE Lasioglossum nymphaerum
Reason attached to this particular name for alignment is noted as "accepted synonym when entered into DL website, still give to John" Searching https://www.discoverlife.org/ with Lasioglossum nymphaerum yields just the Lasioglossum genus. Backtracking from their final decision made name we can see however that Lasioglossum nymphaerum is synonymous with Lasioglossum oceanicum. A similar scenario to this seems present for Andrena californica, though note the one that is actually searchable on DL is Andrena californica wickhami.
Third case: Mapping fails, dead linkage on DL site
echo -e "\tHeterosarus bakeri" | nomer append discoverlife
yields
Heterosarus bakeri NONE Heterosarus bakeri
Reason noted for alignment: "DL list indicates that this is the accepted synonym" Searching name through DL yields a dead linkage? Authors suggested name: Pseudopanurgus bakeri which does not have present synonyms...
Fourth Case: Presumed erroneous reason field for alignment in their name table
ex.
echo -e "\tHeterosarus helianthi" | nomer append discoverlife
yields
Heterosarus helianthi NONE Heterosarus helianthi
Reason for alignment: "DL list indicates that this is the accepted synonym, pass by John" When Discover Life is searched for this name you arrive at a moth in Lepidoptera: Hellinsia helianthi (Walsingham, 1880). The name they suggested is Pseudopanurgus helianthi, preserving the specificEpithet but opting for a bee genus. See Ashmeadiella washingtonensis for another case of this, where the specificEpithet is preserved but the genus is dropped for unexplained reasons. It seems to be a purposeful decision, but it seems odd that they didn't correctly note that decision. This probably is not something to fix on Nomer's end, as there shouldn't be moths/fungi mapping to bees by default, but I figured I'd make it apparent what's happening with some names.
To assure correct version
nomer version
yields
0.4.10
Apologies for the rather lengthy response, but I was a bit perplexed while going through it all...
Thanks! JT
@jtmiller28 thanks for your specific examples and for being patient with me. Hoping to have a look sooner rather than later.
Thanks jorrit for your constant attention to these issues!
I was able to reproduce your four examples of "NONE" matches via
https://github.com/jhpoelen/chesshires/actions/runs/4395176631
with abbreviated alignment report including:
providedName | alignRelation | alignedCatalogName | alignedExternalId | alignedName | alignedAuthority |
---|---|---|---|---|---|
Heterosarus bakeri | NONE | discoverlife | Heterosarus bakeri | ||
Heterosarus bakeri | SYNONYM_OF | itis | http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=753406 | Pseudopanurgus bakeri | (Cockerell, 1896) |
Heterosarus bakeri | SYNONYM_OF | ncbi | https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=625948 | Pseudopanurgus bakeri | |
Heterosarus helianthi | NONE | discoverlife | Heterosarus helianthi | ||
Heterosarus helianthi | SYNONYM_OF | itis | http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=753448 | Pseudopanurgus helianthi | Mitchell, 1960 |
Heterosarus helianthi | NONE | ncbi | Heterosarus helianthi | ||
Lasioglossum nymphaerum | NONE | discoverlife | Lasioglossum nymphaerum | ||
Lasioglossum nymphaerum | NONE | itis | Lasioglossum nymphaerum | ||
Lasioglossum nymphaerum | NONE | ncbi | Lasioglossum nymphaerum | ||
Pseudopanurgus parvus | NONE | discoverlife | Pseudopanurgus parvus | ||
Pseudopanurgus parvus | HAS_ACCEPTED_NAME | itis | http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=753479 | Pseudopanurgus parvus | (Robertson, 1892) |
Pseudopanurgus parvus | NONE | ncbi | Pseudopanurgus parvus |
In trying to verify your claim:
First Case: Name is on DL and not currently seen by Nomer. echo -e "\tPseudopanurgus parvus" | nomer append discoverlife yields Pseudopanurgus parvus NONE Pseudopanurgus parvus
Reason for name alignment given by Chesshire: "Accepted Name in the DL list" https://www.discoverlife.org/ shows it is present on DL through manual search options
I think I found the associated species page at https://www.discoverlife.org/mp/20q?search=Protandrena+parva (see screenshot below)
Your detailed info was helpful to narrow down the suspicious (non) matches. If you can, please include evidence from DiscoverLife (e.g., link + screenshot), that would save me some time, assuming that you already had found the DL Url.
Am hoping to work through your examples and attempt to fix them one by one, and see whether there's a pattern. Thanks for being patient.
@jtmiller28 thanks to your detailed notes, I was able to find the root cause:
Nomer took the names as is from DiscoverLife, so it expected provided names to include the subgenus whenever DiscoverLife used it. Your example shows that omitting the subgenus happens, and should be accounted for.
In other words,
echo -e "\tPseudopanurgus parvus" | nomer append discoverlife
and
echo -e "\tPseudopanurgus (Heterosarus) parvus " | nomer append discoverlife
should both appear as synonyms of
Protandrena parva (Robertson, 1892)
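To illustrate the idea of matching a name with and without its subgenus, here is a minimal sketch. It is not Nomer's actual implementation; `strip_subgenus` and `name_variants` are hypothetical helper names, and the pattern assumes the subgenus appears as a single capitalized word in parentheses right after the genus.

```shell
# Hypothetical helper (not part of Nomer): drop a parenthesized subgenus,
# e.g. "Genus (Subgenus) epithet" -> "Genus epithet".
strip_subgenus() {
  sed -E 's/^([A-Z][a-z]+) \([A-Z][a-z]+\) /\1 /'
}

# Emit the verbatim name followed by its subgenus-free variant,
# skipping the variant when it is identical to the verbatim name.
name_variants() {
  { printf '%s\n' "$1"; printf '%s\n' "$1" | strip_subgenus; } | awk '!seen[$0]++'
}

name_variants "Pseudopanurgus (Heterosarus) parvus"
```

Each variant could be indexed or looked up separately, so that a provided name matches whether or not it carries the subgenus.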
After updating nomer to include matches excluding the subgenus, I was able to generate the following results:
echo -e "\tPseudopanurgus parvus"\
| nomer append --include-header discoverlife\
| mlr --itsv --omd cat
providedExternalId | providedName | relationName | resolvedExternalId | resolvedName | resolvedAuthorship | resolvedRank | resolvedCommonNames | resolvedPath | resolvedPathIds | resolvedPathNames | resolvedPathAuthorships | resolvedExternalUrl |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Pseudopanurgus parvus | SYNONYM_OF | https://www.discoverlife.org/mp/20q?search=Protandrena+parva | Protandrena parva | (Robertson, 1892) | species | Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Protandrena parva | https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Protandrena+parva | kingdom | phylum | class | order | family | species | https://www.discoverlife.org/mp/20q?search=Protandrena+parva |
and
echo -e "\tPseudopanurgus (Heterosarus) parvus"\
| nomer append --include-header discoverlife\
| mlr --itsv --omd cat
providedExternalId | providedName | relationName | resolvedExternalId | resolvedName | resolvedAuthorship | resolvedRank | resolvedCommonNames | resolvedPath | resolvedPathIds | resolvedPathNames | resolvedPathAuthorships | resolvedExternalUrl |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Pseudopanurgus (Heterosarus) parvus | SYNONYM_OF | https://www.discoverlife.org/mp/20q?search=Protandrena+parva | Protandrena parva | (Robertson, 1892) | species | Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Protandrena parva | https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Protandrena+parva | kingdom | phylum | class | order | family | species | https://www.discoverlife.org/mp/20q?search=Protandrena+parva |
In addition to Pseudopanurgus parvus, the names Heterosarus helianthi and Heterosarus bakeri now also matched to their accepted name.
Not so for Lasioglossum nymphaerum though. Working on that next.
I was able to find
Lasioglossum nymphale (Smith, 1853)
But not Lasioglossum nymphaerum
Also, no hits for nymphaerum .
Which is consistent with @jtmiller28's observation that
Searching https://www.discoverlife.org/ with Lasioglossum nymphaerum yields just the Lasioglossum genus. Backtracking from their final decision made name we can see however that Lasioglossum nymphaerum is synonymous with Lasioglossum oceanicum. A similar scenario to this seems present for Andrena californica, though note the one that is actually searchable on DL is Andrena californica wickhami.
@jtmiller28 Can you please provide some evidence to suggest that Lasioglossum nymphaerum is documented somewhere in DiscoverLife ? If not, would it be possible that they have yet to add the name to the checklist?
(see screenshots below)
It appears that https://www.discoverlife.org/mp/20q?search=Lasioglossum+oceanicum contains
Lasioglossum (Dialictus) nymphaearum (Robertson, 1895),
But not Lasioglossum nymphaerum
So now the question is - is this a typo, and if so, who made the typo?
I've just released v0.4.11 with the prospective fix. Please verify.
By the way, after re-running the name alignment for https://github.com/jhpoelen/chesshires/actions/runs/4395278727 with v0.4.11 , the following result is found:
providedName | alignRelation | alignedCatalogName | alignedExternalId | alignedName |
---|---|---|---|---|
Heterosarus bakeri | SYNONYM_OF | discoverlife | https://www.discoverlife.org/mp/20q?search=Protandrena+bakeri | Protandrena bakeri |
Heterosarus bakeri | SYNONYM_OF | itis | http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=753406 | Pseudopanurgus bakeri |
Heterosarus bakeri | SYNONYM_OF | ncbi | https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=625948 | Pseudopanurgus bakeri |
Heterosarus helianthi | SYNONYM_OF | discoverlife | https://www.discoverlife.org/mp/20q?search=Protandrena+helianthi | Protandrena helianthi |
Heterosarus helianthi | SYNONYM_OF | itis | http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=753448 | Pseudopanurgus helianthi |
Heterosarus helianthi | NONE | ncbi | Heterosarus helianthi | |
Lasioglossum nymphaerum | NONE | discoverlife | Lasioglossum nymphaerum | |
Lasioglossum nymphaerum | NONE | itis | Lasioglossum nymphaerum | |
Lasioglossum nymphaerum | NONE | ncbi | Lasioglossum nymphaerum | |
Pseudopanurgus (Heterosarus) parvus | SYNONYM_OF | discoverlife | https://www.discoverlife.org/mp/20q?search=Protandrena+parva | Protandrena parva |
Pseudopanurgus (Heterosarus) parvus | HAS_ACCEPTED_NAME | itis | http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=753479 | Pseudopanurgus parvus |
Pseudopanurgus (Heterosarus) parvus | NONE | ncbi | Pseudopanurgus parvus | |
Pseudopanurgus parvus | SYNONYM_OF | discoverlife | https://www.discoverlife.org/mp/20q?search=Protandrena+parva | Protandrena parva |
Pseudopanurgus parvus | HAS_ACCEPTED_NAME | itis | http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=753479 | Pseudopanurgus parvus |
Pseudopanurgus parvus | NONE | ncbi | Pseudopanurgus parvus |
Agreed, identifying URLs would be helpful. I have a problem with this, however: I can't seem to produce correct links when searching discoverlife; it always maps to https://www.discoverlife.org/mp/20q regardless of the page I'm on within the site. Is there a trick to this? I've tried it in both Firefox and Chrome. Will continue the issue thread on the 41 mapping issues after nomer update 0.4.11 (the update fixed >350 names!)
@jtmiller28 thanks for your reply. Can you please provide explicit steps with explicit examples to reproduce the issue you describe in:
I cant seem to produce correct linkages when searching discoverlife, its always mapping to https://www.discoverlife.org/mp/20q regardless of the page im on within the site.
Sure thing. When trying to navigate DiscoverLife's site I start on their landing page: https://www.discoverlife.org/ I then enter the name in question, e.g. Pseudopanurgus parvus. Searching this name brings me to the page showing the species information; however, the URL shown at the top is not a link that I can copy to help others navigate to that page. https://www.discoverlife.org/mp/20q <- is the link. Using that as your URL brings you to a page that is uninformative about the actual address of what I was trying to share. Example case was done in the Firefox browser.
@jtmiller28 thanks for your specific example. I think I understand your desire and reported a separate issue at https://github.com/globalbioticinteractions/nomer/issues/150 . Can you please check whether the issue title makes sense?
Aside from this important navigation / page reference issue, please let me know if there's additional things that need attention as far as this issue (i.e., https://github.com/globalbioticinteractions/nomer/issues/149) goes. If not, please let me know and/or close this issue.
Yep that issue pretty much covers it.
Back to #149, Here are some other cases I find where they aligned names that Nomer did not:
nomer version
yields
0.4.11
Text file of the names that failed to align, if of interest:
nomer-v0.4.11-nonmatches.txt
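A list of non-matches like this one can be derived from saved nomer output along the following lines. This is a sketch under assumptions: it treats the output as tab-separated with providedName in column 2 and relationName in column 3, following the header shown earlier in this thread; `unmatched_names` is a hypothetical helper, and the sample rows are abbreviated versions of output quoted in this thread.

```shell
# List provided names whose relation is NONE in tab-separated nomer output.
# Assumes column 2 is providedName and column 3 is relationName.
unmatched_names() {
  awk -F'\t' '$3 == "NONE" { print $2 }' | sort -u
}

# Abbreviated sample rows (leading empty id column, then name, relation,
# resolved id, resolved name); real nomer rows carry more columns.
printf '\t%s\t%s\t%s\t%s\n' \
  "Prosopis georgica var. leana" "NONE" "" "Prosopis georgica var. leana" \
  "Prosopis georgica var leana" "SYNONYM_OF" "https://www.discoverlife.org/mp/20q?search=Hylaeus+georgicus" "Hylaeus georgicus" \
  | unmatched_names
```

Appending `| wc -l` would give the number of distinct names that still fail to align.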
echo -e "\tAndrena illinoensis bicolor" | nomer append discoverlife
yields
Andrena illinoensis bicolor NONE Andrena illinoensis bicolor
Searching for the name manually on discoverlife brings you to a page that doesn't yield a sufficient path for resolution. Manually going to the name that Chesshire notes as the final resolution, "Andrena nigrae", however, does show a suspected homonym of Andrena illinoensis bicolor, which notably has some oddness in the linking: Andrena illinoensis form bicolor_homonym Robertson, 1898. Not sure how homonyms are dealt with in Nomer/DL indexing, but possibly "form" is messing with it?
echo -e "\tPseudopanurgus fraterculus" | nomer append discoverlife
yields
Pseudopanurgus fraterculus NONE Pseudopanurgus fraterculus
Not sure what to note here, but the page is rather sparse, so maybe something necessary for Nomer to index is missing here?
var. vs var causes failure in alignment (probably something to note, rather than change). I believe this may have come up in a previous issue, but basically when verbatim names have punctuation or unrecognized abbreviations, it will trip alignment (subsp -> spp.)
ex.
echo -e "\tProsopis georgica var. leana" | nomer append discoverlife
yields
Prosopis georgica var. leana NONE Prosopis georgica var. leana
however,
echo -e "\tProsopis georgica var leana" | nomer append discoverlife
yields
Prosopis georgica var leana SYNONYM_OF https://www.discoverlife.org/mp/20q?search=Hylaeus+georgicus Hylaeus georgicus (Cockerell, 1896) species Animalia | Arthropoda | Insecta | Hymenoptera | Colletidae | Hylaeus georgicus https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Colletidae | https://www.discoverlife.org/mp/20q?search=Hylaeus+georgicus kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Hylaeus+georgicus
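One possible workaround on the user side is to normalize rank markers before alignment. The sketch below is a hypothetical pre-processing step, not a Nomer feature; `normalize_rank_markers` is an illustrative name, and the list of abbreviations (var, subsp, ssp, f) is an assumption that may need adjusting for a given name list.

```shell
# Hypothetical pre-processing step (not a Nomer feature): remove the trailing
# period from a few infraspecific rank abbreviations before matching.
normalize_rank_markers() {
  sed -E 's/ (var|subsp|ssp|f)\. / \1 /g'
}

printf '%s\n' "Prosopis georgica var. leana" | normalize_rank_markers
# prints: Prosopis georgica var leana
```

The normalized name can then be passed to `nomer append discoverlife` with a leading tab, as in the commands above.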
An infraspecific epithet in combination, without the subgenus added, may cause issues in resolution
ex. echo -e "\tMelissodes atripes atrimitra" | nomer append discoverlife
yields: Melissodes atripes atrimitra NONE Melissodes atripes atrimitra
Looking at Chesshire's final resolution Svastra atripes, we note that this name is a synonym but also has a subgenus in combination with the infraspecificEpithet
echo -e "\tMelissodes (Epimelissodes) atripes atrimitra" | nomer append discoverlife
yields
Melissodes (Epimelissodes) atripes atrimitra SYNONYM_OF https://www.discoverlife.org/mp/20q?search=Svastra+atripes Svastra atripes (Cresson, 1872) species Animalia | Arthropoda | Insecta | Hymenoptera | Apidae | Svastra atripes https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Apidae | https://www.discoverlife.org/mp/20q?search=Svastra+atripes kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Svastra+atripes
Andrena californica wickhami is another case of this.
echo -e "\tPerdita texana" | nomer append discoverlife
yields
Perdita texana NONE Perdita texana
As noted in Chesshire:
However, just including infraspecificEpithet is insufficient presumably due to (4).
echo -e "\tPerdita texana ablusa" | nomer append discoverlife
yields
Perdita texana ablusa NONE Perdita texana ablusa
Finally
echo -e "\tPerdita (Macrotera) texana ablusa" | nomer append discoverlife
yields
Perdita (Macrotera) texana ablusa SYNONYM_OF https://www.discoverlife.org/mp/20q?search=Macrotera+texana Macrotera texana Cresson, 1878 species Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Macrotera texana https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Macrotera+texana kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Macrotera+texana
echo -e "\tNeolarra congregata" | nomer append discoverlife
yields
Neolarra congregata NONE Neolarra congregata
DL website search shows infraspecific epithet is necessary for alignment, but notably lacks subgenus designation
echo -e "\tNeolarra congregata helianthi" | nomer append discoverlife
yields
Neolarra congregata helianthi SYNONYM_OF https://www.discoverlife.org/mp/20q?search=Neolarra+verbesinae Neolarra verbesinae (Cockerell, 1895) species Animalia | Arthropoda | Insecta | Hymenoptera | Apidae | Neolarra verbesinae https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Apidae | https://www.discoverlife.org/mp/20q?search=Neolarra+verbesinae kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Neolarra+verbesinae
Those are the cases I've found overall at the moment. Notably 4, 5, 6 seem to be related issues. Hard to say if there is a "good" resolution for it; from my research experience, occurrence data is hit or miss on inclusion of infraspecific epithets and subgenera. Technically, it's not supposed to be included at all by taxonomic standards, I believe, but has been fashioned in based upon expert taxonomist decisions (which is a rather subjective area that will lead to endless issues if we pursue it, in my opinion). This poses potential problems for nomer in a workflow, considering that the parsing steps for aligning names remove the subgenus (). Infraspecific epithets are maintained at least for the first round in my hierarchical use of Nomer alignment; however, the lacking subgenus causes failure in resolution in 4 & 5.
@jtmiller28 thanks for preparing the list of examples related to the discoverlife name matches. As far as I can tell, the name mismatches stem from the interpretation of taxonomic name structures when parsing the discoverlife name lists. I wish there was a way you could help tweak these parsing rules and tune them appropriately. This way, you wouldn't have to wait for folks like me to help nail down these important details.
How do you propose to succeed?
@jtmiller28 please feel free to comment / re-open issue related to Nomer's support for DiscoverLife. Note that the upcoming release is going to have some improvements such as #161 #167 .
A newly published paper, "Completeness analysis for over 3000 United States bee species identifies persistent data gap" (Chesshire et al 2023), reveals some names that are on DL's website but currently unavailable via Nomer.
Example:
echo -e "\tAncylandrena atoposoma"| nomer append discoverlife
Yields: Ancylandrena atoposoma NONE Ancylandrena atoposoma
This is a known name according to the DL website: https://www.discoverlife.org/mp/20q
Also, to assure newest version of Nomer:
nomer version
yields: 0.4.9
My question is whether another pull from the DL name list is necessary to align current names?
As a side note: The authors of Chesshire et al 2023 went through the names of all United States bees to correct them via expert designations. They provide a file chesshires-name-list.xlsx that shows all original names pulled from their occurrence data aggregates (GBIF and SCAN) and their corrections via name alignment + ending corrections + final names after correction. This might be a great list to add for United States bees with their permission? It might also be a great way to tackle fuzzy names without implementing a character replacement algorithm into Nomer, as they provide names that have known incorrect spellings from aggregators like GBIF and their final resolution mapping.