CatalogueOfLife / backend

Complete backend of COL ChecklistBank
Apache License 2.0

Interpretation of Nomenclatural Status in CLB #1181

Open yroskov opened 1 year ago

yroskov commented 1 year ago

Something is wrong with the interpretation of Nomenclatural Status in the CoL. For example, Allocapnia curiosa Frison, 1942 (like thousands of other valid names) is interpreted as "potentially valid", whereas it is expected to be "valid" (https://www.checklistbank.org/catalogue/3/dataset/1065/taxon/89137).

image

Source data as exported from TaxonWorks:

image

yroskov commented 1 year ago

Comment from @gdower: CLB should translate NOMEN_0000224 to NULL.

mdoering commented 1 year ago

NOMEN 224 stands for ICZN valid. "potentially valid" is the zoological label of ACCEPTABLE which is the "best" nomenclatural status entry in our enumeration and equal to legitimate names in botany:

  /**
   * Botany: Names that are validly published and legitimate
   * Zoology: Available name and potentially valid, i.e. not otherwise invalid
   * for any other objective reason, such as being a junior homonym.
   */
  ACCEPTABLE(ESTABLISHED, "nomen legitimum", null, "potentially valid"),

The problem with the zoological valid name status is that it depends on subjective taxonomic opinion. A valid name can only be chosen in the light of the genus placement. Hence we only mark these names as potentially valid:

https://en.wikipedia.org/wiki/Valid_name_(zoology)#Subjectively_invalid_names

I agree this is unfortunate. But I fail to see how we can treat this better when separating names and taxa. I sometimes feel this whole distinction just causes problems.

I guess we could add a higher valid entry and apply that to all potentially valid names which are also accepted, i.e. are linked to a Taxon not Synonym. Translating NOMEN_0000224 to NULL is just hiding problems.
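The idea of a higher entry could be sketched roughly as below. This is a minimal illustration, not ChecklistBank code: the class `StatusPromotion`, the entry `VALID_OR_CORRECT`, the enum `UsageKind` and the method `promote` are all hypothetical names, and the status enumeration is reduced to the entries needed here.

```java
// Sketch only: all names are hypothetical and not taken from the ChecklistBank code base.
final class StatusPromotion {

  /** Reduced stand-in for the nomenclatural status enumeration. */
  enum NomStatus { ESTABLISHED, ACCEPTABLE, VALID_OR_CORRECT }

  /** Whether the name usage is an accepted taxon or a synonym. */
  enum UsageKind { TAXON, SYNONYM }

  /**
   * Promotes ACCEPTABLE ("potentially valid") to the hypothetical higher entry
   * only when the name is used for an accepted taxon. Every other input,
   * including null (no status), is returned unchanged.
   */
  static NomStatus promote(NomStatus status, UsageKind usage) {
    if (status == NomStatus.ACCEPTABLE && usage == UsageKind.TAXON) {
      return NomStatus.VALID_OR_CORRECT;
    }
    return status;
  }
}
```

With this shape the promotion happens at the usage level, so the name itself keeps its purely nomenclatural status when it is linked to a Synonym.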

It all boils down to the difference between codes which do / do not require new combinations to be published.

yroskov commented 1 year ago

Thank you, @mdoering.

@proceps, could you please have a look at the issue? What is your opinion as an ICZN commissioner and TW developer?

mdoering commented 1 year ago

@proceps I wonder how TW deals with this. AFAIK you also treat names independently of the taxonomy. How can you then assign the valid status to them?

proceps commented 1 year ago

I do not know where the status comes from. TW does not have any statuses on Allocapnia curiosa, so it must have been introduced during migration to CoL. @gdower should be able to comment on this. Anyway, it looks like NOMEN_0000224 is properly assigned to the name. What I do not understand is why "Valid" (NOMEN_0000224) in TW is translated to "potentially valid" in CoL. "Valid" in zoology is the equivalent of "Correct name" in botany. "Available" in zoology is the equivalent of "Validly published name" in botany. Maybe the difference in the use of the term 'valid' between zoology and botany is the source of confusion.

In TW, valid status is calculated:

  1. The name is valid by default.
  2. The name is invalid if it has a synonym relationship.
  3. The name is unavailable if it has an unavailable nomenclatural status (e.g. nomen nudum).
  4. If the name has a synonym status but is reinstated as valid, we can specifically assign the "valid" status to the name, and this will override the synonym (the synonym stays in the DB for historical reference).
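The four rules above can be written down as a small decision function. This is only a sketch and not TaxonWorks code: the class `TwValidity`, the enum `Validity` and the boolean parameters are hypothetical, and the order in which rules 3 and 4 are checked is an assumption, since the list does not say what happens when a name is both unavailable and explicitly marked valid.

```java
// Sketch only: hypothetical names, not the TaxonWorks implementation.
final class TwValidity {

  enum Validity { VALID, INVALID, UNAVAILABLE }

  /**
   * Applies the four rules from the list above.
   * Assumption: an unavailable status (rule 3) is checked before an
   * explicit "valid" assignment (rule 4).
   */
  static Validity compute(boolean hasSynonymRelationship,
                          boolean hasUnavailableStatus,
                          boolean explicitlyValid) {
    if (hasUnavailableStatus) {
      return Validity.UNAVAILABLE;   // rule 3
    }
    if (explicitlyValid) {
      return Validity.VALID;         // rule 4: overrides a synonym relationship
    }
    if (hasSynonymRelationship) {
      return Validity.INVALID;       // rule 2
    }
    return Validity.VALID;           // rule 1: valid by default
  }
}
```

Under these assumptions a name with no relationships and no statuses, such as the Allocapnia curiosa record discussed here, comes out as valid purely through the default in rule 1.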

gdower commented 1 year ago

The potentially valid status comes from the ChecklistBank importer, which interprets NOMEN terms as simplified/unified ChecklistBank statuses. I suggested changing the interpretation of NOMEN_0000224 to NULL because it doesn't quite match any of the simplified ChecklistBank nomenclatural statuses and it is confusing taxonomic experts. In my opinion, the ChecklistBank importer should just take the verbatim NOMEN statuses (e.g., ICZN valid for NOMEN_0000224) as the name statuses because they are higher resolution data than the simplified statuses of ChecklistBank.

What is the purpose of having these simplified name statuses? Who are the simplified name statuses intended for? The general public probably does not need nomenclatural statuses and experts probably want the actual nomenclatural statuses, so it seems like the simplified statuses might not meet the needs of most of our users. If unified name statuses are needed for building the backbone, then could there be a separate interpreted column that attempts to unify statuses and a more detailed column that gives the actual nomenclatural status? That might also provide a way to improve the interpretations over time based on feedback from our community. We could also meet the needs of nomenclators by providing the full nomenclatural statuses. From my perspective, it's always difficult trying to map name statuses into 9 simplified categories that don't quite semantically fit the data. I prefer working with NOMEN because I can just set the name status to what the name status is actually supposed to be, but the NOMEN statuses are being simplified by ChecklistBank to the point of confusing taxonomic experts.
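The two-column idea could look roughly like the sketch below: the verbatim NOMEN term is stored untouched and the unified status sits next to it. Everything here is hypothetical (the class `DualStatus`, its fields and the factory method `of`); the only mapping included is the one discussed in this thread, where NOMEN_0000224 is currently interpreted as ACCEPTABLE.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Sketch only: hypothetical names, not the ChecklistBank data model.
final class DualStatus {

  /** Reduced stand-in for the unified status enumeration. */
  enum NomStatus { ESTABLISHED, ACCEPTABLE }

  // Contains only the mapping mentioned in this thread; a real table would be much larger.
  private static final Map<String, NomStatus> MAPPING;
  static {
    Map<String, NomStatus> m = new HashMap<>();
    m.put("NOMEN_0000224", NomStatus.ACCEPTABLE);
    MAPPING = Collections.unmodifiableMap(m);
  }

  /** Status exactly as supplied by the source, e.g. "NOMEN_0000224". */
  final String verbatim;
  /** Unified status, or null when no mapping is known. */
  final NomStatus interpreted;

  private DualStatus(String verbatim, NomStatus interpreted) {
    this.verbatim = verbatim;
    this.interpreted = interpreted;
  }

  /** Keeps the verbatim value and looks up the interpretation next to it. */
  static DualStatus of(String verbatim) {
    return new DualStatus(verbatim, MAPPING.get(verbatim));
  }
}
```

Because the verbatim value is never overwritten, the mapping table can be revised later without touching the source data, which is the feedback loop suggested above.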

mdoering commented 1 year ago

This needs a longer conversation, but I agree it is confusing as it is and I never was very happy with the current state.

On the other hand I still like the goal of achieving a unified broad view on the status of a name that works across codes. Some are rather alike (available + validly published). But valid + correct are terms that only apply to a name in the context of a given taxonomic opinion (i.e. accepted/synonym). Would IPNI say a name is correct, or ZooBank that it is valid? They can only assert that names are potentially valid, i.e. that all objective nomenclatural rules have been passed. But since we mostly use names as part of name usages I suppose we could just add a new entry for valid/correct names and use it if the usage is an accepted taxon. Or indeed switch to a much more fine-grained vocabulary like NOMEN that separates between codes. I don't think I would want to reuse NOMEN values exactly, but we could at least make sure all name status values map 1:1. NOMEN 0000388 for nothotaxon I am not so sure about.

proceps commented 1 year ago

IPNI and ZooBank are nomenclators; they only assess whether the name is validly published / available. CoL, GBIF and all data providers work with taxonomy. They take synonymy into account to point each string to the correct / valid taxon name. Potentially valid is not a good term: nomenclators only say that the name is published according to the rules of nomenclature and could be used for taxonomy. NOMEN identifiers are definitely not intended as nomenclatural statuses, but they are used as references to verbal values. I know that many people have attempted to unify the Codes, but I am not completely convinced it is possible. An "Accepted" name should be good enough to unify both ICZN and ICN (valid and correct). In the checklists, we do not have that many names without a taxonomic status (valid or synonym in the case of ICZN), probably only nomina dubia, which are usually a tiny fraction of any given checklist submitted to CoL.

mdoering commented 1 year ago

I agree. If we had not separated names from taxa in the data model this would not have happened. Potentially valid is at least in use: https://en.wikipedia.org/wiki/Valid_name_(zoology)#Subjectively_invalid_names

For now I would simply add a new value for valid/correct names and make sure we use that for all accepted taxa. Although important it has great potential for breaking things, so I will keep that for 2023 I am afraid.

mdoering commented 1 year ago

As an interim solution we might simply change the label "potentially valid" to "valid"?

yroskov commented 1 year ago

we might simply change the label "potentially valid" to "valid"?

It will be the correct status for accepted names in the CoL.

In the long term, nomenclatural statuses should be Code-specific until a unified "BioCode" is introduced by the commissioners.

mdoering commented 1 year ago

This is still an open and important issue. Reading the comments again I am not sure the main problem is clear. We have a separate status for a nomenclatural (binomial) name (nomStatus) and the taxonomic use of that name (taxStatus). That distinction also existed in the previous COL ACEF model (GSDNameStatus & Sp2000NameStatus) and is also in DwC (dwc:nomenclaturalStatus & dwc:taxonomicStatus).

Calling a name valid in the zoological sense only makes sense as a taxonomic status. It is not a nomenclatural value. So supplying it as the name status in ColDP cannot yield valid again; instead the interpreter picks the next best status from nomenclature. Here botany and zoology differ:

image

In botanical nomenclature, on top of availability ("validly published") there is also the idea of an illegitimate name, which includes homonyms. Fully determining a homonym requires the genus placement, so it involves some taxonomic decision. As the botanical code makes genus placements / combinations a nomenclaturally governed act, this is a proper nomenclatural name status. For zoology, secondary homonyms depend on the taxonomy and thus are not an objective, universal name status. Nevertheless there are also nomenclatural rules on top of availability in zoology that can be checked universally for the name, regardless of its taxonomy, e.g. primary homonyms, objective synonyms, suppressed names. Passing these additional zoological rules corresponds more or less to the legitimate status in botany, and some people have called this "potentially valid" in zoology. It is between available and valid, but still a purely objective, nomenclatural status.

I don't know what the best solution here is. I guess we could ignore the legitimate/potentially valid status for zoology completely and just make all those names available? That sounds like the best approach to me if you all believe "potentially valid" is too confusing.
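A rough sketch of that last option, under stated assumptions: the class `ZoologyCollapse`, the enum `NomCode` and the method `collapse` are hypothetical, and ESTABLISHED is assumed here to be the entry that carries the "available" label for zoology, which the enumeration excerpt earlier in this thread suggests but does not confirm.

```java
// Sketch only: hypothetical names; ESTABLISHED standing for "available" is an assumption.
final class ZoologyCollapse {

  enum NomCode { BOTANICAL, ZOOLOGICAL }

  /** Reduced stand-in for the nomenclatural status enumeration. */
  enum NomStatus { ESTABLISHED, ACCEPTABLE }

  /**
   * For zoological names, drops the "potentially valid" level and reports
   * the name as merely available. Botanical names keep the legitimate status.
   */
  static NomStatus collapse(NomStatus status, NomCode code) {
    if (code == NomCode.ZOOLOGICAL && status == NomStatus.ACCEPTABLE) {
      return NomStatus.ESTABLISHED;
    }
    return status;
  }
}
```

The botanical branch is left untouched because legitimacy is a proper nomenclatural status there, as described above.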

yroskov commented 1 year ago

ignore the legitimate/potentially valid status for zoology completely and just make all those names available?

As for me, it is the best solution. All nomenclatural statuses should appear in the CoL in compliance with Code regulations, as (we assume) they are present in the GSD.