AtlasOfLivingAustralia / data-management

Data management issue tracking

Add WoRMS taxonomy to name lists #530

Open charvolant opened 4 years ago

charvolant commented 4 years ago

Acquire taxonomy from http://www.marinespecies.org/

Mesibov commented 4 years ago

@charvolant, bullet point 4 should be the last.

As a former WoRMS editor I'm aware that WoRMS experts make decisions about names and classification that don't always agree with similar decisions elsewhere. Within the WoRMS universe there's one accepted taxonomy, within the AFD universe there's another, and so on.

How would ALA "resolve any conflicts"? By arbitrarily accepting just one of the taxonomies for all relevant fauna? By getting advice from Australian experts on the Australian fauna in WoRMS (with regard to conflicts), and accepting that advice? By offering multiple taxonomies (best solution, hardest to implement)?

charvolant commented 4 years ago

@Mesibov Conflict resolution occurs during the taxonomy merging process documented at https://github.com/AtlasOfLivingAustralia/ala-name-matching/blob/master/doc/large-taxon-collider.md (this is what I, tongue-in-cheek, referred to as "light blue touch-paper and retire to a safe distance"). Essentially, it lets the ALA decide, by configuration, which taxonomy is regarded as most authoritative for a particular part of the taxonomic tree and which is regarded as secondary/additional.

This process generates a report on decisions/problems, which aids tracking of cases where we need to consult with our sources.
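As a rough illustration of "most authoritative by configuration" plus a decision report, here is a minimal Python sketch. It is not the Large Taxon Collider's actual algorithm: the `SOURCE_PRIORITY` table, its ordering, the `Candidate` class and the `resolve` function are all hypothetical names invented for this example.

```python
from dataclasses import dataclass

# Hypothetical configuration: for each branch of the tree, sources listed in
# descending order of authority. The ordering here is illustrative only.
SOURCE_PRIORITY = {
    "Animalia": ["AFD", "WoRMS", "CAAB"],
    "default": ["NSL", "WoRMS"],
}


@dataclass(frozen=True)
class Candidate:
    source: str          # which taxonomy supplied this placement
    accepted_name: str   # the name that source treats as accepted


def resolve(branch, candidates, report):
    """Pick the candidate from the most authoritative source for this branch.

    Appends one line to `report` whenever a lower-ranked source disagrees.
    """
    if not candidates:
        raise ValueError("no candidates to resolve")
    order = SOURCE_PRIORITY.get(branch, SOURCE_PRIORITY["default"])
    # Sources missing from the configuration rank last.
    rank = {source: i for i, source in enumerate(order)}
    ranked = sorted(candidates, key=lambda c: rank.get(c.source, len(order)))
    winner = ranked[0]
    losers = [c for c in ranked[1:] if c.accepted_name != winner.accepted_name]
    if losers:
        others = ", ".join(f"{c.source}: {c.accepted_name}" for c in losers)
        report.append(
            f"{branch}: kept {winner.source}: {winner.accepted_name}; overrode {others}"
        )
    return winner
```

The `report` list stands in for the decisions/problems report: every overridden source leaves a line that can be followed up with that source.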

We don't have the option of offering multiple taxonomies for the basic ALA search function, although it is possible to download raw data and match against a local name index.

Mesibov commented 4 years ago

@charvolant, ta. Two more questions, please:

The CAAB you referred to is Codes for Australian Aquatic Biota? That's not actually a taxonomy reference for fishes; as I understand it, it redirects to FishBase.

Does ALA make publicly available anywhere the "disagreements" it finds between NSL and other sources, i.e. the report on decisions/problems you refer to?

charvolant commented 4 years ago

@Mesibov

Yes, CAAB is Codes for Australian Aquatic Biota. It does contain a taxonomy which we use to fill some gaps, including a lot of quasi-persistent placeholder names. (We don't include the codes that refer to unanalysed groups of organisms.) More importantly, it contains the standard vernacular names for commercial fish species, which we're obliged to prioritise.

It's possible to find which names have been brought under which umbrella by examining the names tab on the ALA species pages. For example, https://bie.ala.org.au/species/urn:lsid:biodiversity.org.au:afd.taxon:340484bd-33f6-4b46-a63c-751f0b159ed1#names takes Hoplostethus atlanticus from four different sources, along with some notes on provenance if the LTC has made a decision about something with partial information. You can get the priorities assigned to the individual sources by looking at https://bie-ws.ala.org.au/ws/species/urn:lsid:biodiversity.org.au:afd.taxon:340484bd-33f6-4b46-a63c-751f0b159ed1.json and looking for variants > priority.
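A minimal Python sketch of pulling the priorities out of that JSON document. Only the `variants` list and its `priority` field come from the description above; the `infoSourceName` and `nameString` keys are assumptions about the response shape, and `fetch_species` and `variant_priorities` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

BIE_WS = "https://bie-ws.ala.org.au/ws/species/"


def fetch_species(taxon_id):
    """Download the species JSON document for a taxon identifier."""
    with urlopen(f"{BIE_WS}{taxon_id}.json") as response:
        return json.load(response)


def variant_priorities(species):
    """Return (source, name, priority) for each variant, in document order.

    Missing keys come back as None rather than raising, because the exact
    response shape is assumed here, not confirmed.
    """
    return [
        (
            variant.get("infoSourceName"),  # assumed key
            variant.get("nameString"),      # assumed key
            variant.get("priority"),
        )
        for variant in species.get("variants", [])
    ]
```

For the example above, `variant_priorities(fetch_species("urn:lsid:biodiversity.org.au:afd.taxon:340484bd-33f6-4b46-a63c-751f0b159ed1"))` would list one row per source variant, provided the assumed keys match.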

We haven't put up the report before, but there's no particular reason not to include it in the https://archives.ala.org.au/archives/nameindexes/ directories when we put up a new names list. Anything that requires major surgery, such as synonym loop breaking, gets added to the taxon's provenance information.

Mesibov commented 4 years ago

@charvolant

Many thanks for the info. The "Names" and "Classification" tabs on taxon name pages are probably good enough, actually, for the cases I had in mind, namely:

(1) A data provider submits a set of records and finds that in processing ALA has replaced the supplied name or classification with a different one. The reason isn't given on the records pages, or in a download of the record set.

or

(2) A data user searches for records under a taxon name not accepted by ALA.

In both cases the client is sent to a page with the ALA-accepted name (e.g. a search for Drimys lanceolata goes to Tasmannia lanceolata: https://bie.ala.org.au/search?q=Drimys+lanceolata). From here the client can explore ALA's accepted nomenclature and classification.

In case (1), is it up to the data provider to check if names or classifications have been changed, or does ALA report back to the data provider with "We've changed these names - please check the ALA names pages for our reasons, and please get in touch with us if you think an error has been made"?

charvolant commented 4 years ago

@Mesibov

In both cases, provided that the name indexes can follow the synonymy, redirection to the now accepted name is automatic.
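Following the synonymy can be pictured as walking a chain of synonym links until an accepted name is reached. The Python sketch below is only an illustration, not the name index's real lookup; `accepted_name` and the `synonym_of` mapping are hypothetical.

```python
def accepted_name(name, synonym_of):
    """Follow synonym links until a name with no onward link is reached.

    `synonym_of` maps a synonym to the name it points at. Returns None if
    the chain loops back on itself, since no accepted name can be chosen.
    """
    seen = set()
    current = name
    while current in synonym_of:
        if current in seen:
            return None  # synonym loop: needs manual resolution
        seen.add(current)
        current = synonym_of[current]
    return current


synonyms = {"Drimys lanceolata": "Tasmannia lanceolata"}
print(accepted_name("Drimys lanceolata", synonyms))
```

A name that is already accepted has no onward link and is returned unchanged, which is why the redirect is the same for providers and searchers.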

Individual occurrence records keep their original information. It's possible to compare the original and processed results for an occurrence record via the "Original vs Processed" button towards the top right of the page. See https://biocache.ala.org.au/occurrences/b8d4be28-4abb-4047-b0f2-0af403e1d3e7 for an example.

When we produce a new name index, we don't notify providers beyond a general announcement that there is a new name index. Re-processing uses the original, supplied names and taxonomy and re-matches them against the new name index.

Synonym following is business as usual. However, in cases where the name matching algorithms have had to make an inexact match (to a higher taxon or via soundex detection of misspellings, for example) there are flags associated with the record that can be used to detect possible outliers. Data providers can search for records where these issues are flagged.
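For a provider working from a downloaded record set, picking out the flagged records might look like the Python sketch below. The flag names and the `flags` field are hypothetical placeholders; the real records use ALA's own vocabulary, which should be checked before relying on this.

```python
# Hypothetical flag names standing in for the real inexact-match flags.
INEXACT_MATCH_FLAGS = {"matched_to_higher_taxon", "matched_by_soundex"}


def needs_review(records):
    """Return the records that carry at least one inexact-match flag."""
    return [
        record
        for record in records
        if INEXACT_MATCH_FLAGS & set(record.get("flags", []))
    ]
```

Records without a `flags` entry are treated as clean matches and skipped.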

charvolant commented 4 years ago

Suggested MoU details, based on ABRS MoU sent to WoRMS for review.

peggynewman commented 3 years ago

Sent email reminder to WoRMS to have a look 5/2/21.