CatalogueOfLife / data

Repository for COL content

Ambiguous synonym descriptor seems to be one-way #406

Open Australis86 opened 2 years ago

Australis86 commented 2 years ago

Describe the problem: If I search for taxon 1 (e.g. Hedyosmum arborescens) and it is an ambiguous synonym of taxon 2 (e.g. Hedyosmum grisebachii), the COL website search correctly lists it as such.

However, if I search for taxon 2, its page does not show that taxon 1 is ambiguous; there is no distinction between normal and ambiguous synonyms.

If I use the namesearch API to get taxon 2 (Hedyosmum grisebachii, taxon ID = 3JZVL) and then the taxon API to get the synonyms, there is no indication in the resultant dataset that taxon 1 (Hedyosmum arborescens) is an ambiguous synonym.

However, if I use the namesearch API to retrieve taxon 1 (Hedyosmum arborescens), it correctly returns the same three results as the webpage, showing that it is an ambiguous synonym of taxon 2.

Is this an API issue or a dataset issue? Can we make the classification consistent so that, regardless of which of the two taxa is searched for, it shows that one is an ambiguous synonym of the other?
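The consistency being asked for can be sketched in Python, assuming a hypothetical minimal record in which each synonym relationship carries its own status. If the accepted taxon's synonym list and the synonym's own lookup both read from the same records, the ambiguous flag shows up from either end. The record type and function names are illustrative only, not the COL API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynonymRelation:
    """Hypothetical minimal record: one synonym name linked to one accepted taxon."""

    synonym_name: str
    accepted_id: str
    status: str  # e.g. "synonym" or "ambiguous synonym"


def synonyms_of(accepted_id, relations):
    """Accepted taxon's view: its synonyms, each with the relationship's status."""
    return [(r.synonym_name, r.status) for r in relations if r.accepted_id == accepted_id]


def accepted_of(synonym_name, relations):
    """Synonym's view: the accepted taxa it points to, each with the same status."""
    return [(r.accepted_id, r.status) for r in relations if r.synonym_name == synonym_name]


relations = [SynonymRelation("Hedyosmum arborescens", "3JZVL", "ambiguous synonym")]
print(synonyms_of("3JZVL", relations))  # [('Hedyosmum arborescens', 'ambiguous synonym')]
print(accepted_of("Hedyosmum arborescens", relations))  # [('3JZVL', 'ambiguous synonym')]
```

Because the status lives on the relationship rather than on either taxon, neither view can drop it without the other dropping it too.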

Links to affected COL webpages: https://www.catalogueoflife.org/data/taxon/3JZVL https://www.catalogueoflife.org/data/taxon/3JZWK

mdoering commented 2 years ago

True, @thomasstjerne, we should mark ambiguous synonyms on the taxon pages. I have to admit I am not a fan of using that status for cases like this, though. The 2 names Hedyosmum arborescens Griseb. and Hedyosmum arborescens Cordem. ex Baill. are clearly different, having very different authorships. @dhobern @chantalhuijbers I would like to raise this in the taxonomy group and consider using ambiguous synonym only for pro parte names, and maybe even using pro parte synonym as the status instead.

dhobern commented 2 years ago

@mdoering I believe that this has been fixed in the natural way - i.e. by ensuring that the taxon pages correctly show when a relationship is considered partial synonymy, regardless of which end is viewed. This makes sense, independently of whether or not the status is being correctly applied.

That leaves just an issue of whether we need to clarify expectations around the use of this status. I'm not sure I understand what you wish the taxonomy group to discuss. Does this example from Hedyosmum represent a common pattern that we need to consider more carefully?

mdoering commented 2 years ago

Thanks. Yes, the definition and expected use of the status ambiguous synonym is what I am after. Unless we already agree on a clear definition, it might be worth throwing this out for discussion in the taxonomy group. On the portal we have the following definition:

A name that has been used to refer to more than one possible species

The API has the following definition:

Names which are ambiguous because they point at the current species and one or more others e.g. homonyms, pro-parte synonyms (in other words, names which appear more than in one place in the Catalogue).

I think the question mostly boils down to what is a name. Does it include the authorship to be considered the same or not? My understanding has been that we use this status for pro parte synonyms, i.e. names (with authorship) pointing to multiple accepted names, to warn users. If that is the case, we should even be able to determine the status automatically. The Hedyosmum arborescens case given above is not a pro parte name.
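Under the stricter pro parte reading, the status could indeed be derived rather than set by hand. A minimal Python sketch, assuming hypothetical `(name, authorship, accepted_id)` tuples for synonym usages: group by name plus authorship and keep the keys that point at more than one accepted taxon. The names and IDs in the example are placeholders, not COL data.

```python
from collections import defaultdict


def pro_parte_names(usages):
    """Return {(name, authorship): sorted accepted IDs} for names with 2+ targets.

    usages: iterable of (name, authorship, accepted_id) tuples, one per synonym
    usage. A name with authorship that points at several accepted taxa is a
    pro parte synonym; all other names are left out of the result.
    """
    targets = defaultdict(set)
    for name, authorship, accepted_id in usages:
        targets[(name, authorship)].add(accepted_id)
    return {key: sorted(ids) for key, ids in targets.items() if len(ids) > 1}


# Hypothetical usages: one name split across two accepted taxa, one not.
usages = [
    ("Examplea alba", "Smith", "T1"),
    ("Examplea alba", "Smith", "T2"),
    ("Examplea alba", "Jones", "T3"),
]
print(pro_parte_names(usages))  # {('Examplea alba', 'Smith'): ['T1', 'T2']}
```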

Looking at the current ambiguous synonyms in COL:

@dhobern, @yroskov if we can agree on the stricter and more precise pro parte definition I think we don't need any further discussion. But I sense from the examples above it is not that clear?

Side note: calling it checklist status in the glossary is also worth debating. This is a term we don't use anywhere in the API, ColDP or UIs. There we usually just have "status" in the context of a Name or a Taxon/Synonym. In DwC it is taxonomicStatus and nomenclaturalStatus. In the ColDP docs we also use nomenclatural or taxonomic status:

status: is the taxonomic name usage status which includes Synonym.status and the Taxon.provisional flag. A provisional taxon should be listed as provisionally accepted. Unresolved names which are neither accepted nor synonyms can be listed with status=bare name in which case only the Name properties are relevant. This corresponds to a lone Name record without a Taxon or Synonym record.

And shouldn't we rather reuse the API definitions dynamically in the glossary instead of maintaining different definitions there?

yroskov commented 2 years ago

I think the question mostly boils down to what is a name. Does it include the authorship to be considered the same or not?

According to the Codes, authorship isn't part of the scientific name ;). The format of the authorstring (author name(s), delimiters, year, abbreviated citation, nomenclatural comment) is mainly defined by editorial practice, not by the Codes.

ICZN (species):

5.1. Names of species. The scientific name of a species, and not of a taxon of any other rank, is a combination of two names (a binomen), the first being the generic name and the second being the specific name. The generic name must begin with an upper-case letter and the specific name must begin with a lower-case letter.

51.1. Optional use of names of authors. The name of the author does not form part of the name of a taxon and its citation is optional, although customary and often advisable.

Botanical Code (species):

6.7. The name of a taxon below the rank of genus, consisting of the name of a genus combined with one or two epithets, is termed a combination (see Art. 21, 23, and 24).

23.1. The name of a species is a binary combination consisting of the name of the genus followed by a single specific epithet...

The taxon author is part of the citation; see Citation, Section 1: Author citations: https://www.iapt-taxon.org/nomen/pages/main/art_46.html

yroskov commented 2 years ago

The CoL ambiguous synonym status is not equal to pro parte synonyms. It includes homonyms, pro parte synonyms, and possible (not yet resolved) mistakes in the source checklist. This status is a warning flag for CoL users who are not taxonomists and don't pay attention to authorstrings, nomenclatural statuses, and the abbreviated comments usually included in authorstrings.

A small set of generalized CoL/checklist statuses is an essential part of CoL's integrity.
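The two readings of "the same name" in this exchange differ only in the grouping key. A sketch, assuming hypothetical `(name, authorship)` tuples covering every usage in the checklist regardless of status: ignoring authorship, as the Codes do for the name itself, makes the two Hedyosmum arborescens names from this issue count as one name occurring twice, while keying on name plus authorship treats them as distinct.

```python
from collections import Counter


def repeated_names(usages, with_authorship=False):
    """Return the keys that occur two or more times in the checklist, sorted.

    usages: iterable of (name, authorship) tuples for all usages, whatever
    their status. With with_authorship=False the key is the bare name, so
    homonyms collapse together; with True the key is (name, authorship).
    """
    counts = Counter(u if with_authorship else u[0] for u in usages)
    return sorted(key for key, n in counts.items() if n >= 2)


usages = [
    ("Hedyosmum arborescens", "Griseb."),
    ("Hedyosmum arborescens", "Cordem. ex Baill."),
]
print(repeated_names(usages))                        # ['Hedyosmum arborescens']
print(repeated_names(usages, with_authorship=True))  # []
```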

mdoering commented 2 years ago

@yroskov I feel we are conflating two very different things then:

  1. ambiguous names (with the same or different author) that exist 2 or more times in the checklist with any status incl. accepted and provisionally accepted
  2. unresolved status. Similar to the provisionally accepted status, just for synonyms.

I would propose using a distinct status for each of the 2 situations. In fact, we could separate out the boolean provisional flag completely and make it a separate property that applies to any of the remaining statuses: accepted, synonym, ambiguous synonym, misapplied. That would result in a rather clean list of pure statuses. I have never seen ambiguous synonym used outside the COL checklist.
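The proposal can be sketched as a data model, assuming a hypothetical `NameUsage` record (not the ColDP schema): the four remaining statuses become an enum and provisional becomes an independent boolean, so a provisionally accepted taxon and an unresolved synonym are both expressed as a status plus the flag.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """The 'pure' statuses left once provisional becomes a separate flag."""

    ACCEPTED = "accepted"
    SYNONYM = "synonym"
    AMBIGUOUS_SYNONYM = "ambiguous synonym"
    MISAPPLIED = "misapplied"


@dataclass(frozen=True)
class NameUsage:
    """Hypothetical usage record for illustration only."""

    name: str
    status: Status
    provisional: bool = False  # independent of status, so any combination is allowed

    def label(self) -> str:
        suffix = " (provisional)" if self.provisional else ""
        return self.status.value + suffix


print(NameUsage("Examplea alba", Status.ACCEPTED, provisional=True).label())
# accepted (provisional)
print(NameUsage("Examplea alba", Status.SYNONYM, provisional=True).label())
# synonym (provisional)
```

Keeping the flag orthogonal avoids multiplying statuses (provisionally accepted, unresolved synonym, and so on) for every combination.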

dhobern commented 2 years ago

This seems sensible to me. The significance of the two situations for a user is very different.

yroskov commented 2 years ago

I have no objection to distinct statuses for these 2 situations in the case of the extended catalogue. All names that you add programmatically, outside and especially inside GSDs, should get the status "Unresolved".

Who will separate these statuses, and how, where "duplicated" names appear inside a GSD? Only GSD authors/taxonomists can do it professionally. Are they ready to follow your proposal? I'm not sure. They build their checklists following their own aims and rules.

mdoering commented 2 years ago

If there are duplicates, the ambiguous status falls under the first category, which remains as it was. The only situation we are considering changing is all the non-duplicate names that currently have the ambiguous synonym status, e.g. Abelmoschus moschatus subsp. tuberosus. When I look into the workbench, there is an editorial decision to apply the ambiguous status, but there is no other name like that, neither in COL nor in World Plants. Is this just an error stemming from an older version, when there may have been more copies?

If your definition of ambiguous synonym always requires at least 2 copies of a name, there is no need for the unresolved case. And we could actually add some checks that look for outdated decisions applying ambiguous to just one copy.
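Such a check can be sketched in a few lines, assuming hypothetical `(usage_id, name, status)` tuples for every usage in the project: count the copies of each name and report usages marked ambiguous whose name occurs only once. The usage IDs are placeholders; only the Abelmoschus name comes from this thread.

```python
from collections import Counter


def single_copy_ambiguous(usages):
    """Return IDs of usages marked 'ambiguous synonym' whose name occurs once.

    usages: iterable of (usage_id, name, status) tuples for the whole project.
    Since an ambiguous synonym needs at least two copies of a name, a lone
    copy most likely points to an outdated editorial decision.
    """
    usages = list(usages)  # iterated twice below
    counts = Counter(name for _id, name, _status in usages)
    return [
        usage_id
        for usage_id, name, status in usages
        if status == "ambiguous synonym" and counts[name] < 2
    ]


usages = [
    ("u1", "Abelmoschus moschatus subsp. tuberosus", "ambiguous synonym"),
    ("u2", "Examplea alba", "ambiguous synonym"),
    ("u3", "Examplea alba", "ambiguous synonym"),
]
print(single_copy_ambiguous(usages))  # ['u1']
```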

yroskov commented 2 years ago

Yes, ambiguous synonym always requires at least 2 copies of a name.

If a single name has the ambiguous synonym status in CoL, there are two possible causes:

(1) An old decision from a previous version incorrectly remains in CLB after the duplicated name has been resolved in the GSD. Abelmoschus moschatus subsp. tuberosus is exactly this case: http://www.catalogueoflife.org/annual-checklist/2019/search/all/key/Abelmoschus+moschatus+tuberosus/fossil/1/match/1

Solution: It would be nice if CLB reported cases of "ambiguous synonym" with a single name as broken decisions. (We already have a report on broken decisions in Tasks, but we need a separate report for these cases.)

(2) The decision was applied to the whole dataset, whereas CoL takes only part of a bigger dataset (ITIS, WCVP).

Solution: It would be nice if CLB generated Tasks reports inside the project only for the sectors included in CoL, not for the entire GSD.
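Case (2) can be illustrated with a sketch, assuming hypothetical `(name, sector_key)` records from a full source dataset: a name duplicated across the whole dataset may have only one copy once counting is restricted to the sectors actually included in the project, which is why a report scoped to project sectors would surface these decisions. Sector keys here are placeholders.

```python
from collections import Counter


def copies_in_project(usages, project_sectors):
    """Count copies of each name, looking only at usages in the given sectors.

    usages: iterable of (name, sector_key) pairs from the full source dataset.
    project_sectors: set of sector keys actually copied into the project.
    """
    return Counter(name for name, sector in usages if sector in project_sectors)


# Hypothetical source dataset: the duplicate lives in a sector the project skips.
usages = [("Examplea alba", "S1"), ("Examplea alba", "S2")]
print(copies_in_project(usages, {"S1", "S2"})["Examplea alba"])  # 2
print(copies_in_project(usages, {"S1"})["Examplea alba"])        # 1
```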

mdoering commented 2 years ago

In that case, let's stick with what we have and provide better reports on the project, as you say. To spot outdated decisions on single names: https://github.com/CatalogueOfLife/checklistbank/issues/1333

Decisions based on the entire source dataset, of which only parts are copied into the project, should already show up through the outdated decisions check above. To work on the project as a whole, you can already use the duplicate tool and the project-wide tasks. They are a bit slow because the project is large, but they work fine, and their results seem to need some attention: https://www.checklistbank.org/catalogue/3/tasks

mdoering commented 2 years ago

Quite a few overlaps between sectors in accepted genera & species

yroskov commented 2 years ago

So far, the most interesting issue is a true overlap, which I introduced with the recent update of WCVP.

@mdoering, it's strange: there is an obvious duplication of species lists in two plant genera, Hydrocotyle and Trachymene, in the species list, but these genera do not appear in the list of duplicated genera. Why?