AtlasOfLivingAustralia / ala-name-matching

Atlas name matching API and index generation

Provisional species names badly misparsed #222

Open dhobern opened 2 years ago

dhobern commented 2 years ago

Comment from ABARES State of Forests team:

I’ve come across one issue that would be good to resolve.

Some names are presented with two capitals, one for the genus and one for the species, for example Eucalyptus Cattai.

When I search for this in ALA, it does not return a match. Instead, it returns either Eucalyptus sp. Cattai or Eucalyptus sp. Cattai (Gregson s.n. 28 Aug 1954).

I’m wondering whether the ‘sp.’ in the middle has been removed. Eucalyptus sp. Cattai (Gregson s.n. 28 Aug 1954) is in fact the whole name of the species. (It’s an inferred taxon treated as unique, and in time it will be replaced with a ‘proper’ species name once it has been fully described and accepted.)
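The suspected failure mode can be illustrated with a deliberately naive parser. This is a hypothetical sketch, not the ALA name-matching code: the function naive_binomial and its set of rank markers are assumptions made for illustration. It skips rank markers such as "sp." and attaches the next word to the genus, which turns the full phrase name into the two-capital binomial reported above.

```python
# Hypothetical illustration only: NOT the ALA name-matching implementation.
# The marker set is an assumption chosen for this example.
RANK_MARKERS = {"sp.", "spp.", "ssp.", "subsp.", "var.", "aff.", "cf."}


def naive_binomial(name: str) -> str:
    """Build 'Genus epithet' by skipping rank markers and taking the next word."""
    tokens = name.split()
    genus = tokens[0]
    for token in tokens[1:]:
        if token in RANK_MARKERS:
            continue  # the 'sp.' that signals a phrase name is silently dropped
        return f"{genus} {token}"  # first remaining word is treated as the epithet
    return genus  # nothing after the genus


print(naive_binomial("Eucalyptus sp. Cattai (Gregson s.n. 28 Aug 1954)"))
# Eucalyptus Cattai
```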

This can be seen in the Species field of this occurrence record:

https://biocache.ala.org.au/occurrences/96751ed2-a08b-45ad-badb-3fd7afb16015

Where a species name does not match the "^[A-Z][a-z]+ (([A-Z][a-z]+) )?[a-z]-?[a-z]+( .*)$" pattern (shown with subgenus and completely open-ended authorship), the ALA should use the verbatim scientific name as the Species name string rather than parsing out a specific epithet and generating a novel binomial from genus + epithet. This would fix cases like "Genus sp. epithet" and anything more messed up.
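The proposed fallback rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ALA implementation: the helper species_name and its genus/epithet parameters are hypothetical, and the pattern is a variant of the one quoted above with three assumed adjustments. The subgenus is matched inside literal parentheses, the first epithet character class takes "+" so that hyphenated epithets such as novae-angliae match, and the authorship group is optional so that a bare binomial also matches.

```python
import re
from typing import Optional

# Variant of the proposed pattern (assumed adjustments, see lead-in):
#   \([A-Z][a-z]+\)   subgenus written in literal parentheses
#   [a-z]+-?[a-z]+    epithet of 2+ letters, optionally hyphenated
#   ( .*)?            optional, open-ended authorship
WELL_FORMED = re.compile(r"^[A-Z][a-z]+ (\([A-Z][a-z]+\) )?[a-z]+-?[a-z]+( .*)?$")


def species_name(verbatim: str, genus: str, epithet: Optional[str]) -> str:
    """Hypothetical helper: choose the string shown in the Species field."""
    if epithet and WELL_FORMED.match(verbatim):
        # Conventionally formed name: genus + parsed epithet is safe to show
        return f"{genus} {epithet}"
    # Phrase names such as "Genus sp. Locality (Collector ...)" fall through,
    # so the verbatim string is kept instead of inventing a new binomial
    return verbatim


print(species_name("Eucalyptus regnans F.Muell.", "Eucalyptus", "regnans"))
# Eucalyptus regnans
print(species_name("Eucalyptus sp. Cattai (Gregson s.n. 28 Aug 1954)", "Eucalyptus", "Cattai"))
# Eucalyptus sp. Cattai (Gregson s.n. 28 Aug 1954)
```

Falling back to the verbatim string means a name that cannot be safely parsed is displayed exactly as supplied, rather than as a genus + epithet combination that never existed in the source data.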

adam-collins commented 9 months ago

This looks like an APC issue. For example, https://bie.ala.org.au/species/https://id.biodiversity.org.au/taxon/apni/51289965#names links to the name https://biodiversity.org.au/nsl/services/rest/taxon/apni/51289965, which APC records with the two capitals: Eucalyptus sp. Cattai (Gregson s.n. 28 Aug 1954).

In case there is something we can do, I'm moving this to the names index builder.