ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum
Apache License 2.0

A method for distinguishing homonyms #4794

Open Jegelewicz opened 2 years ago

Jegelewicz commented 2 years ago

A WoRMS aphiaID is a distinguishing feature. And the WoRMS classification includes an LSID based on the aphiaID.

WoRMS distinguishes between homonyms by concatenating the taxon name and the author and giving each a unique aphiaID. If we had a similar capability, each collection could select the correct name without disturbing another collection.


Originally posted by @sharpphyl in https://github.com/ArctosDB/arctos/issues/4784#issuecomment-1169942433
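
To make the WoRMS approach described above concrete, here is a minimal Python sketch. It is not WoRMS or Arctos code. It assumes the LSID is the prefix `urn:lsid:marinespecies.org:taxname:` followed by the aphiaID (the form quoted later in this thread) and that the display name is simply name plus author. `NameRecord`, `find_homonyms`, the pairing of "Ihering, 1909" with 994705, and the whole second record are illustrative assumptions.

```python
from dataclasses import dataclass

# Assumed LSID shape, taken from the example quoted in this thread.
LSID_PREFIX = "urn:lsid:marinespecies.org:taxname:"


@dataclass(frozen=True)
class NameRecord:
    name: str      # bare taxon name, e.g. "Cepolidae"
    author: str    # authorship string
    aphia_id: int  # identifier that tells homonyms apart

    @property
    def display_name(self) -> str:
        # Name and author concatenated, as WoRMS is described as doing.
        return f"{self.name} {self.author}"

    @property
    def lsid(self) -> str:
        return f"{LSID_PREFIX}{self.aphia_id}"


def find_homonyms(records):
    """Return bare names that map to more than one distinct aphia_id."""
    by_name = {}
    for record in records:
        by_name.setdefault(record.name, []).append(record)
    return {
        name: group
        for name, group in by_name.items()
        if len({r.aphia_id for r in group}) > 1
    }


records = [
    # Name, author and ID as they appear in this thread (pairing assumed).
    NameRecord("Cepolidae", "Ihering, 1909", 994705),
    # Hypothetical placeholder for the other homonym; not real data.
    NameRecord("Cepolidae", "Otherauthor, 1800", 0),
]

homonyms = find_homonyms(records)
```

With records like these, `homonyms` contains the bare name "Cepolidae" with two entries, and each entry still has its own `display_name` and `lsid`, so two collections could pick different ones without touching each other.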

Jegelewicz commented 2 years ago

I know we have worked hard to keep the taxonomic names list "clean" but it clearly is not at this point. Taxonomy is not clean!

I think what @sharpphyl proposes makes perfect sense, but I'd like to explore a few other ways we might accomplish this.

Could this be accomplished with taxon concepts somehow? But concepts are not associated with classifications, which seems odd? Or maybe I just don't get it.

Jegelewicz commented 2 years ago

What about using others' identifiers? This is something we are going to need to embrace eventually. Just like agents who have ORCiD or Wikidata identifiers, taxon NAMES plus an lsid, WoRMS urn, etc. should be more "valuable" and the addition of a unique identifier to a name could allow for multiple versions of the name.

Sure it makes things complex, but taxonomy IS complex! When someone wants to identify something as "Cepolidae" they would also need to add the appropriate identifier to get the classification that is meaningful to their object. This is really no different than the concepts above, BUT it does allow for better use of community curated taxonomy outside of Arctos.

I'm just throwing out ideas here - this is going to be a continual issue if we just keep hoping it will go away and doing nothing about it. Taxonomy sources are not the final answer to this as a collection could easily hold both fish and snails using "Cepolidae"

dustymc commented 2 years ago

but it clearly is not at this point.

Elaborate? (Perhaps by way of https://github.com/ArctosDB/arctos/issues/new?assignees=&labels=Quarantine&template=taxon-name-quarantine-request.md&title=Quarantine+Taxon+Name+-+)

what @sharpphyl proposes

Also needs elaboration; I'm not sure what's being proposed.

But concepts are not associated with classifications

I think classifications are essentially shortcuts to publications, or bodies of them. https://arctos.database.museum/name/Nematomystes%20rodentophilus#Arctos says "this is a nematode, and good luck figuring out what we might mean by that" (which turns out to be enough precision for lots of purposes). https://arctos.database.museum/name/Nematomystes%20rodentophilus#concept_84 says "this is what we mean, and that can't be said in a fancy list." In other words, there might be hundreds of taxon concepts that involve both Nematomystes rodentophilus and nematodes, yet are all different from each other. (Or I believe occasionally use different methods to arrive at the same destination, but try not to think about that too much!) I'm not sure it would have been WRONG to model https://arctos.database.museum/name/Nematomystes%20rodentophilus#concept_84 as a child of https://arctos.database.museum/name/Nematomystes%20rodentophilus#Arctos, but we didn't and I don't think it's really necessary. (It would absolutely result in many more classifications, more complication for no obvious benefit.)

NAMES plus an lsid, WoRMS urn, etc.

That sounds like free text with complications. "Sorex5"....

but taxonomy IS complex!

Sorta, but mostly only when there's been some refusal to acknowledge the reality, or the distinctions between taxa and other stuff have been lost (what's being proposed here, I think).

need to add the appropriate identifier to get the classification that is meaningful to their object. Taxonomy sources are not the final answer to this as a collection could easily hold both fish and snails using "Cepolidae"

You're not wrong, but this also seems like one of those situations where we could spend forever trying to find a solution to a theoretical problem that we can't actually solve, probably won't actually ever encounter, and have readily-available workarounds for if we do. One of those things is by definition wrong; the ICZN folks will work it out eventually, and A {string} IDs offer a 'solution' while we're waiting for the actual system to fix the one thing it exists to prevent.

this is going to be a continual issue

Also elaborate please - what is this issue that we're ignoring?

sharpphyl commented 2 years ago

what @sharpphyl proposes Also needs elaboration; I'm not sure what's being proposed.

I'm just looking around for ideas others use to distinguish homonyms. Since our primary source is WoRMS (via Arctos), it makes sense that we consider their system, which is a combination of aphiaID and LSID that together control the name and classification. Additionally, they add the author to the taxon name.

We can deal with homonyms for consolidators by adding the LSID field to catalog records (or can we @Jegelewicz?), but it doesn't address the need for different structures and rules for some taxon names. Do any of these ideas have any place in this discussion?

  1. Cepolidae Ihering, 1909 (expand the taxon name definition to include the author)
  2. Cepolidae (urn:lsid:marinespecies.org:taxname:994705) (add lsid as part of the "name")
  3. Cepolidae (WoRMS AphiaID 994705) (add aphiaID)
  4. Cepolidae (https://www.marinespecies.org/aphia.php?p=taxdetails&id=994705) (add WoRMS url)
  5. Cepolidae linked to a Concept. There is a Concept attached to Cepolidae but it just reconfirms the Ihering classification.
  6. Cepolidae - put both classifications in Arctos and prompt user to select the correct one for their collection. Currently both classifications are in Arctos (which means right now no classification is added to a catalog record). Only one is in WoRMS (via Arctos).
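
The identifier-bearing variants above all carry the same number, which a short Python sketch can show. This is only an illustration: the regular expressions are assumptions based on the strings in this comment, not a WoRMS specification, and `extract_aphia_id` is a hypothetical helper.

```python
import re
from typing import Optional

# Patterns inferred from the example strings in this thread (assumptions).
_PATTERNS = [
    re.compile(r"urn:lsid:marinespecies\.org:taxname:(\d+)"),
    re.compile(r"AphiaID\s+(\d+)", re.IGNORECASE),
    re.compile(r"marinespecies\.org/aphia\.php\?p=taxdetails&id=(\d+)"),
]


def extract_aphia_id(text: str) -> Optional[int]:
    """Return the aphiaID embedded in a decorated name, or None if absent."""
    for pattern in _PATTERNS:
        match = pattern.search(text)
        if match:
            return int(match.group(1))
    return None


candidates = [
    "Cepolidae Ihering, 1909",
    "Cepolidae (urn:lsid:marinespecies.org:taxname:994705)",
    "Cepolidae (WoRMS AphiaID 994705)",
    "Cepolidae (https://www.marinespecies.org/aphia.php?p=taxdetails&id=994705)",
]

ids = [extract_aphia_id(c) for c in candidates]
```

Here `ids` is `[None, 994705, 994705, 994705]`: the LSID, aphiaID and URL forms are interchangeable for a machine, while the author-only form carries no resolvable identifier and would need a lookup.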

Jegelewicz commented 2 years ago

See also - agents with the same name.... :-)

Jegelewicz commented 2 years ago

Given that Agent Committee is wrestling with people who have the same name and Taxonomy Committee is wrestling (has been wrestling since I've been associated with Arctos) with taxa that have the same name - it seems like we really need to figure this out!

For background see

1803

1840

1852

2007

There are others, but just reading the above will take half a day. The thing is, we never resolve this - it just kinda dies away for a bit and then pops back up. I think the fact that @sharpphyl is unable to publish some data to OBIS because of missing LSIDs related to homonyms in WoRMS should be our wake-up call to figure this out.

Taxonomy Committee discussed this for a long time today, but all of the solutions we have ever proposed have been tossed aside. Who has a fresh idea?

As it stands, we have this:

  1. Pick classifications instead of names
  2. Relax the rules for names to allow for inclusion of author text (Could requiring an identifier, such as an lsid or wikidata Q number when adding a parens or period or other currently disallowed character or capitalization discourage mass creation of nonsense? Could we more tightly control name creation?)
  3. Continue creating a new taxonomy source for every set of possible homonyms (this does not address the fact that WoRMS includes homonyms...)
  4. Use Taxon Concepts (But I don't think this actually does the job...can someone show me how it would address homonyms?)

Have I missed something that has been previously discussed?
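
Option 2 could be sketched roughly like this in Python: a name that breaks the strict rules is accepted only when it arrives with an identifier. Everything here is an assumption for illustration. `STRICT_NAME` is a stand-in, not the actual Arctos name rule, the LSID and Wikidata patterns only check general shape, and `may_create_name` is a hypothetical function.

```python
import re
from typing import Optional

# Stand-in for the "clean name" rule (assumption, not the Arctos rule):
# one capitalized word, optionally followed by lowercase epithets.
STRICT_NAME = re.compile(r"^[A-Z][a-z]+(?: [a-z][a-z-]*)*$")

# Shape checks only; they do not verify that the identifier resolves.
LSID = re.compile(r"^urn:lsid:[^:\s]+:[^:\s]+:[^:\s]+(?::[^:\s]+)?$", re.IGNORECASE)
WIKIDATA_Q = re.compile(r"^Q[1-9]\d*$")


def has_identifier(identifier: Optional[str]) -> bool:
    """True if the value looks like an LSID or a Wikidata Q number."""
    if not identifier:
        return False
    return bool(LSID.match(identifier) or WIKIDATA_Q.match(identifier))


def may_create_name(name: str, identifier: Optional[str] = None) -> bool:
    """Strict names always pass; relaxed names need an identifier."""
    if STRICT_NAME.match(name):
        return True
    return has_identifier(identifier)


plain_ok = may_create_name("Cepolidae")
author_without_id = may_create_name("Cepolidae Ihering, 1909")
author_with_lsid = may_create_name(
    "Cepolidae Ihering, 1909",
    "urn:lsid:marinespecies.org:taxname:994705",
)
junk = may_create_name("Sorex5")
```

Under these assumed rules the plain name and the author-bearing name with an LSID are accepted, while the author-bearing name without an identifier and a free-text string like "Sorex5" are rejected, which is the "discourage mass creation of nonsense" idea.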

Jegelewicz commented 2 years ago

FWIW, current method for disambiguating agents includes adding a parenthetical remark.

dustymc commented 2 years ago

pops back up

AFAIK that has always been entirely theoretical - we haven't solved anything because there's no actual problem.

Use Taxon Concepts (But I don't think this actually does the job...can someone show me how it would address homonyms?)



```
arctosprod@arctos>> \d identification
                             Table "core.identification"
          Column           |          Type           | Collation | Nullable | Default 
---------------------------+-------------------------+-----------+----------+---------
 identification_id         | integer                 |           | not null | 
...
 taxon_concept_id          | integer                 |           |          | 
....
Foreign-key constraints:
...
    "identification_taxon_concept_id_fkey" FOREIGN KEY (taxon_concept_id) REFERENCES taxon_concept(taxon_concept_id)
```

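
One way to read the table description above is that two identifications can share a name string and still point at different concepts. The sketch below tries that with an in-memory SQLite database standing in for PostgreSQL. Only `identification_id`, `taxon_concept_id` and the foreign key come from the output above; the `taxon_name`, `concept_label` and `scientific_name` columns and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript(
    """
    CREATE TABLE taxon_concept (
        taxon_concept_id INTEGER PRIMARY KEY,
        taxon_name       TEXT NOT NULL,  -- hypothetical column
        concept_label    TEXT NOT NULL   -- hypothetical column
    );
    CREATE TABLE identification (
        identification_id INTEGER PRIMARY KEY,
        scientific_name   TEXT NOT NULL,  -- hypothetical column
        taxon_concept_id  INTEGER REFERENCES taxon_concept(taxon_concept_id)
    );
    """
)

# Two concepts that share one name string (made-up rows).
conn.executemany(
    "INSERT INTO taxon_concept VALUES (?, ?, ?)",
    [(1, "Cepolidae", "concept A"), (2, "Cepolidae", "concept B")],
)
# Two identifications with the same name but different concepts.
conn.executemany(
    "INSERT INTO identification VALUES (?, ?, ?)",
    [(10, "Cepolidae", 1), (11, "Cepolidae", 2)],
)

rows = conn.execute(
    """
    SELECT i.identification_id, i.scientific_name, c.concept_label
    FROM identification i
    JOIN taxon_concept c USING (taxon_concept_id)
    ORDER BY i.identification_id
    """
).fetchall()

# The foreign key rejects a reference to a concept that does not exist.
try:
    conn.execute("INSERT INTO identification VALUES (12, 'Cepolidae', 999)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```

In this toy data both identifications read "Cepolidae", yet the join returns a different concept for each, so the distinction lives in `taxon_concept_id` rather than in the name string.
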
Jegelewicz commented 1 year ago

I'm closing this - nobody believes homonyms are an issue.

campmlc commented 1 year ago

Based on discussion with CETAF publishing group and soon to be published joint best practices, this will be necessary:

Relax the rules for names to allow for inclusion of author text (Could requiring an identifier, such as an lsid or wikidata Q number when adding a parens or period or other currently disallowed character or capitalization discourage mass creation of nonsense? Could we more tightly control name creation?)

Jegelewicz commented 1 year ago

We also discussed today in taxonomy committee and we agree that the "create a new source" method would solve this BUT we also think we need to be able to reliably use author text #3609 and taxon identifiers #4776 to make it robust.

Jegelewicz commented 8 months ago

Closing after discussion in taxonomy committee today.

Jegelewicz commented 1 month ago

Reopening - here is a real-life example of a homonym causing an issue.

https://github.com/ArctosDB/data-migration/issues/1979#issuecomment-2143634494

dustymc commented 1 month ago

I am flagging this as a mystery; I can't understand what you're asking for.

If you want to allow author-bearing strings as taxon names, this would become technically problematic. I know that's "traditional" but tradition (unlike Arctos) doesn't deal with data objects; we have a way to be unambiguous without eg cutting ourselves off from GlobalNames.

If you want to include identifiers in classifications, rock on, awesome.

If you want to allow author-bearing strings as identifications, that's been possible forever, sounds kinda-potentially reasonable to me, maybe there could even be some sort of magic if someone wants to provide details and request it.

If you want to use taxon concepts, those are supported by Arctos and have been for a while.

There's a VERY long discussion about picking names instead of classifications somewhere, that's pretty clearly more than anyone can bear as part of a normal cataloging process, and picking taxon concepts leads to about the same place.

MAYBE only being able to have one homonym in a collection would influence the 'where do I catalog this?' part of https://github.com/ArctosDB/data-migration/issues/1979, but as far as I can tell that's all just speculation at this point. If a single collection truly can't avoid cataloging https://arctos.database.museum/name/Acanthocephala#Arctos and https://arctos.database.museum/name/Acanthocephala#WorldFloraOnline in the same collection, they can easily disambiguate through several of the above methods.
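
As a rough Python illustration of disambiguating by taxonomy source: each collection lists the sources it prefers, and the first source that has the name wins. The two URLs are the ones linked above; the lookup table, the `pick_classification` function and the collection preference lists are hypothetical and do not describe how Arctos stores source preferences.

```python
from typing import Optional, Sequence

# (name, source) -> classification link; URLs are the two cited above.
CLASSIFICATIONS = {
    ("Acanthocephala", "Arctos"):
        "https://arctos.database.museum/name/Acanthocephala#Arctos",
    ("Acanthocephala", "WorldFloraOnline"):
        "https://arctos.database.museum/name/Acanthocephala#WorldFloraOnline",
}


def pick_classification(name: str, preferred_sources: Sequence[str]) -> Optional[str]:
    """Return the classification from the first preferred source that has the name."""
    for source in preferred_sources:
        url = CLASSIFICATIONS.get((name, source))
        if url is not None:
            return url
    return None


# Hypothetical preference lists for two collections.
parasite_collection = ["Arctos", "WorldFloraOnline"]
herbarium = ["WorldFloraOnline", "Arctos"]

parasite_pick = pick_classification("Acanthocephala", parasite_collection)
herbarium_pick = pick_classification("Acanthocephala", herbarium)
```

Each collection gets exactly one classification per name this way, which also shows the limit being discussed: a single collection holding both homonyms would need one of the other methods.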