SpeciesFileGroup / taxonworks_doc

TaxonWorks (https://taxonworks.org) documentation.
https://docs.taxonworks.org

Define SOP/best practices for OTUs and subsequent combinations #18

Open mjy opened 4 years ago

mjy commented 4 years ago

OTUs, Taxon Names, and History

Our objective is to address this issue: if an OTU label was used to pick an OTU and relate it to some data, then when a Protonym moves, the label shown for the OTU changes. The new, updated label may not look like the historical use. What should happen? This is the key issue: what are the steps to resolve these cases?

Basic principles and summary

Background on the TaxonName classification

Side note: How are OTU data summarized?

TaxonNames and OTUs

A TaxonName and an OTU (1:1)

One TaxonName, many OTUs

Many TaxonNames, Many OTUs

!! Which OTU should I use?

We might see:

Citing data linked to OTUs

OTUs and new subsequent combinations

When should I create a new OTU?

What happens when I use the "wrong" OTU?

proceps commented 4 years ago

The best we can do is to provide 3 different OTU labels:

  1. Full label - "Tibicin tibicin (Linnaeus, 1758) as Cicada tibicen Linnaeus, 1758", where the first name is the valid name for the taxon as of today and the second is the original name of the protonym or the combination
  2. Short label - valid name
  3. Short label - name in use (original combination or subsequent combination).

So when you pick an OTU from the list, you should be able to see the full name format. In a determination, it is probably important to see #3, the name in use.

On the OTU/Taxon page, it will be #2, the valid name.

Interactive key, GBIF export etc. - probably #2 as well.

mjy commented 4 years ago

Are you assuming there is only 1 OTU per Protonym?

proceps commented 4 years ago

Not necessarily. But if there are multiple OTUs, they should be differentiated by :name (TaxonName: name). That pattern should stay the same. So the full format would be "Tibicin tibicin (Linnaeus, 1758) as Cicada tibicen Linnaeus, 1758: OTU1", where OTU1 is the name of the OTU.
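The three label formats and the ":name" suffix can be illustrated with a minimal Python sketch. The function name `otu_labels` and its parameters are hypothetical, not part of TaxonWorks, and appending the OTU name only to the full label is an assumption based on the one example given here.

```python
def otu_labels(valid_name, name_in_use, otu_name=None):
    """Build the three proposed labels for an OTU (hypothetical helper).

    valid_name  -- the valid name for the taxon as of today
    name_in_use -- the original or subsequent combination the OTU is attached to
    otu_name    -- optional OTU name, used to tell apart several OTUs on one name
    """
    full = f"{valid_name} as {name_in_use}"
    if otu_name:
        # Assumption: the OTU name is appended after a colon, as in the example above.
        full = f"{full}: {otu_name}"
    return {"full": full, "valid": valid_name, "in_use": name_in_use}


labels = otu_labels(
    "Tibicin tibicin (Linnaeus, 1758)",
    "Cicada tibicen Linnaeus, 1758",
    otu_name="OTU1",
)
print(labels["full"])
```

A picker could show `labels["full"]`, a determination `labels["in_use"]`, and the OTU/Taxon page `labels["valid"]`.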

mjy commented 4 years ago

OK- I like the concept of name then and now- that's great, it helps to clarify the concept.

But that's the easy part- you haven't addressed what to do, and when, about creating OTUs and updating the name that they point to. See the subsequent combination problem.

  if OTU is tied to Aus bus (Jones), a protonym, whose parent is Aus
   and that OTU is used in asserted distributions, determinations, etc.
      and bus (Jones) changes parent to Cus
         then 1) The OTU needs to be updated to point to a new Combination (Aus bus (Jones)), which never existed before because it was the current placement.

Yes or no?

proceps commented 4 years ago

Working on it. But in short, the Protonym Aus bus (Jones) will have an OTU "Aus bus (Jones) as Bus bus Jones" (attached to the protonym, meaning the original spelling "Bus bus"), another OTU "Aus bus (Jones) as Cus bus (Jones)" (attached to a subsequent Combination), and another OTU "Aus bus (Jones) as Aus bus (Jones)" (attached to another Combination). If the determination is linked to the combination, the last portion of the OTU name (the name in use) will never change. The first portion will represent the current name. An OTU should never jump from one TaxonName to another.
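To make the "first portion changes, last portion stays" behaviour concrete, here is a small Python sketch. The `Protonym` and `Otu` classes are hypothetical stand-ins, not the TaxonWorks models, and the genus "Dus" is invented purely to show a later move. The sketch also assumes the author goes in parentheses whenever the genus differs from the original one.

```python
from dataclasses import dataclass


@dataclass
class Protonym:
    epithet: str
    author: str
    original_genus: str
    current_genus: str

    def cited_as(self, genus):
        # Parenthesize the author when the genus is not the original one.
        author = self.author if genus == self.original_genus else f"({self.author})"
        return f"{genus} {self.epithet} {author}"

    def valid_name(self):
        return self.cited_as(self.current_genus)


@dataclass
class Otu:
    protonym: Protonym
    genus_in_use: str  # fixed when the OTU is created; never follows the protonym

    def label(self):
        return f"{self.protonym.valid_name()} as {self.protonym.cited_as(self.genus_in_use)}"


bus = Protonym("bus", "Jones", original_genus="Bus", current_genus="Aus")
otus = [Otu(bus, genus) for genus in ("Bus", "Cus", "Aus")]
before = [otu.label() for otu in otus]

bus.current_genus = "Dus"  # hypothetical later move of the protonym
after = [otu.label() for otu in otus]

for old, new in zip(before, after):
    print(f"{old}  ->  {new}")
```

Only the part before "as" changes after the move; each OTU stays attached to the name it was created for.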

mjy commented 4 years ago

Side note- pedantically, OTUs will change taxon name when they were linked in error.

Are you saying that OTUs should never be linked to Protonyms except if they are referencing the name in original combination? Our data are not imported that way, many many are linked to the Protonym in current classification.

proceps commented 4 years ago

Exactly. If it is important to preserve the determination string, the OTU should be linked to a combination. Unfortunately, when migrating data, in most cases it is impossible to make the distinction. The distribution data are just linked to the protonym (in 3i, for example, all subsequent combinations are disregarded). There is no reliable way to say whether they are linked to the original combination or a subsequent one. But when we are talking about new data, this distinction should be clear. In the INHS database, we have a determination as a taxon_name (we do not have any combinations there). The taxon_name was always freely changeable. It was never intended to preserve the original determination name as it was spelt. The determination string is only stored as a determination label. When sending data to the OTU/Taxon page, it does not really matter: everything is collected under the valid name with its current taxonomic position.

mjy commented 4 years ago

So then we are in agreement. We need functionality to create the subsequent combinations and update OTUs to point to new taxon names (Combinations) when they are updated and we know they should be (of course, when we don't know, there is nothing we can do). We must do this to preserve asserted values.

This issue facilitates this: https://github.com/SpeciesFileGroup/taxonworks/issues/1449.

proceps commented 4 years ago

Well, you have to know what name the previous determination was linked to (was it the original combination or the current one?). In the case of the INHS collection, you do not know. It does not have any combinations (neither original nor subsequent). Even if you assume that OTUs are always linked to the current placement, it is much safer to run a script that checks whether a combination identical to the current placement exists, creates this combination if it does not, and moves the data from the protonym to the combination (unless the current and original combinations are the same). Otherwise, once you start moving things manually, you lose track of what was moved where and what was not. The interface in #1449 will be very confusing for people who originally linked their data to the protonym's original combination or to combinations. So if it is implemented, I would like to have a project preference so that it is completely hidden in my project and the data are not moved around by mistake.
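The script described above could look roughly like the following Python sketch. It works on a plain in-memory structure, not on the TaxonWorks database; `ProtonymData` and `move_to_current_combination` are hypothetical names, and it assumes that everything linked to the protonym refers to the current placement.

```python
from dataclasses import dataclass, field


@dataclass
class ProtonymData:
    original: str  # original combination
    current: str   # current placement
    linked: list = field(default_factory=list)        # data linked directly to the protonym
    combinations: dict = field(default_factory=dict)  # combination -> data linked to it


def move_to_current_combination(p):
    """Move protonym-linked data to the combination matching the current placement.

    Returns the list of moved items so there is a record of what was moved.
    """
    if p.current == p.original:
        # Current and original combinations are the same: nothing to move.
        return []
    # Reuse the combination if it exists, otherwise create it.
    target = p.combinations.setdefault(p.current, [])
    moved = list(p.linked)
    target.extend(moved)
    p.linked.clear()
    return moved


bus = ProtonymData(
    original="Bus bus Jones",
    current="Aus bus (Jones)",
    linked=["asserted distribution 1", "determination 7"],
)
log = move_to_current_combination(bus)
print(log)
print(bus.combinations)
```

Running it a second time moves nothing, since the protonym no longer has data linked directly to it.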

mjy commented 4 years ago

Of course the curator must make a decision based on their understanding of their data, that's the point. The understanding is that I'm moving this, and that some of my past records must move. Knowing which ones to move is up to the curator.

More importantly, we must provide instructions for what to do moving forward. Clearly, if people are curating from the literature, we need to change the interfaces (if you don't like #1449) so that curators cannot pick names that don't reflect the Combinations (capital C) that they are observing. Letting them link to the current classification is, in your view, not OK for new, understood data. This requires a major amount of work and thought. We need to come up with the best practices now.

I am arguing that it more or less has to be OK to let them do this (change the reference for an OTU from a Protonym to a Combination), and to prompt them for updates when the concept evolves (as it is based on new combinations). OTUs give us the flexibility to keep content with them and let the names change, recording their history.