globalbioticinteractions / nomer

maps identifiers and names to other identifiers and names
GNU General Public License v3.0

expose taxonomic hierarchy inference made by GBIF parser #137

Closed jhpoelen closed 1 year ago

jhpoelen commented 1 year ago

GBIF parser infers taxonomic hierarchy by the structure of a scientific name when possible.

E.g.,

Andrena (Aenandrena) aeneiventris Morawitz, 1872

likely refers to a taxonomic name by Morawitz, 1872, in genus Andrena, with subgenus (or infragenericEpithet) Aenandrena and specificEpithet aeneiventris.

Suggest extending Nomer to include these taxonomic hierarchy inferences in the gbif-parse matcher.
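
To make the idea concrete, here is a minimal sketch of how a hierarchy could be read off the structure of a name like the one above. It is not the GBIF name parser and not Nomer's implementation; the regular expression and the function name `infer_hierarchy` are hypothetical, and the pattern only covers the simple "Genus (Subgenus) epithet Authorship" shape.

```python
import re

# Hypothetical pattern for names shaped like "Genus (Subgenus) epithet Author, Year".
# Illustration only: the real GBIF name parser handles far more cases than this.
NAME_PATTERN = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)"                           # capitalized genus
    r"(?:\s+\((?P<infragenericEpithet>[A-Z][a-z]+)\))?"  # optional "(Subgenus)"
    r"\s+(?P<specificEpithet>[a-z]+)"                    # lowercase species epithet
    r"(?:\s+(?P<authorship>.+))?$"                       # optional trailing authorship
)


def infer_hierarchy(name):
    """Return the name parts found in `name`, or None if the shape is not recognized."""
    match = NAME_PATTERN.match(name.strip())
    if match is None:
        return None
    # Keep only the parts that were actually present in the name.
    return {key: value for key, value in match.groupdict().items() if value is not None}


parsed = infer_hierarchy("Andrena (Aenandrena) aeneiventris Morawitz, 1872")
print(parsed)
```

Names without a parenthesized part simply come back without an `infragenericEpithet` entry, so a caller can tell whether a subgenus-like element was present at all.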

jhpoelen commented 1 year ago

related to #134

@seltmann @jtmiller28 Would you say that infragenericEpithet is equivalent to subgenus ?

jhpoelen commented 1 year ago

previously,

echo -e "\tAndrena (Aenandrena) aeneiventris Morawitz, 1872"\
 | nomer append gbif-parse

yielded

providedExternalId providedName relationName resolvedExternalId resolvedName resolvedAuthorship resolvedRank resolvedCommonNames resolvedPath resolvedPathIds resolvedPathNames resolvedPathAuthorships resolvedExternalUrl
Andrena (Aenandrena) aeneiventris Morawitz, 1872 SAME_AS Andrena aeneiventris Morawitz, 1872

but now, the results are:

providedExternalId providedName relationName resolvedExternalId resolvedName resolvedAuthorship resolvedRank resolvedCommonNames resolvedPath resolvedPathIds resolvedPathNames resolvedPathAuthorships resolvedExternalUrl
Andrena (Aenandrena) aeneiventris Morawitz, 1872 SAME_AS Andrena aeneiventris Morawitz, 1872 species Andrena | Aenandrena | aeneiventris genus | infragenericEpithet | specificEpithet

Note that resolvedRank, resolvedPath etc. are now populated.
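
For anyone consuming this output, the two pipe-delimited values can be paired up into a rank-to-name mapping. The sketch below assumes the first value is the path of names and the second holds the matching rank labels (which I take to be the resolvedPath and resolvedPathNames columns); the helper names `split_path` and `path_to_ranks` are made up for illustration and are not part of Nomer.

```python
def split_path(value):
    """Split a pipe-delimited path value into trimmed parts."""
    return [part.strip() for part in value.split("|")]


def path_to_ranks(path, rank_labels):
    """Pair each rank label with the name at the same position in the path.

    Assumption: both values are pipe-delimited and aligned by position.
    """
    names = split_path(path)
    ranks = split_path(rank_labels)
    if len(names) != len(ranks):
        raise ValueError("path and rank labels differ in length")
    return dict(zip(ranks, names))


hierarchy = path_to_ranks(
    "Andrena | Aenandrena | aeneiventris",
    "genus | infragenericEpithet | specificEpithet",
)
print(hierarchy["infragenericEpithet"])
```

Raising on a length mismatch is a design choice for the sketch: a silently truncated mapping would hide a misaligned row.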

jtmiller28 commented 1 year ago

@jhpoelen Yep, I would agree; that would match the infraSpecificEpithet standard in DwC. @seltmann probably has more experience in this regard, though, in case there is some standard I am unaware of.

jhpoelen commented 1 year ago

@jtmiller28 thanks for your note. Did you see that the GBIF rank description was infragenericEpithet not infraspecificEpithet ? Just making sure that I communicated my question as intended.

jhpoelen commented 1 year ago

Note that DwC has both

https://dwc.tdwg.org/terms/#dwc:subgenus

and

https://dwc.tdwg.org/terms/#dwc:infragenericEpithet

jtmiller28 commented 1 year ago

I knew of infraSpecificEpithet, but was unaware that an infragenericEpithet field is currently in DwC. I meant that, given current conventions, it makes sense to me that anything that subdivides a genus without reaching the specificEpithet field (i.e. subgenus and the like) would be included in a field called infragenericEpithet.

Does GBIF contain this field? I know they follow DwC standards, but so far I haven't seen a case where subgenus is correctly parsed in a name.

jhpoelen commented 1 year ago

Closing issue; functionality has been added and made available in a Nomer release.