ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

Fixing Taxon Name mispellings #2645

Closed. campmlc closed this issue 2 years ago.

campmlc commented 4 years ago

The following taxa have misspelled names in Arctos:

  • Hypsugo alashanicus -> should be Hypsugo alaschanicus
  • Plecotus koslovi -> should be Plecotus kozlovi

As far as I can tell, these are errors generated from within Arctos. External resources and publications have the correctly spelled names. Some of them (NCBI) have picked up our misspellings, citing us as the source. We need a clear process for making these corrections rather than allowing them to be perpetuated as additional variants from within Arctos. There doesn't currently seem to be a way to flag misspellings in Arctos, only to add spelling variants. I think we need to revisit this.

campmlc commented 4 years ago

Also, do we have a means of bulk-updating identifications and bulk-deleting the existing ones, e.g., to add a correctly spelled taxon name and remove the incorrectly spelled name?

Jegelewicz commented 4 years ago

Search for everything identified with the incorrect spelling, then manage Identification in search results?

Jegelewicz commented 4 years ago

Once all uses of the incorrect name are removed, delete the classification, then delete the name.

campmlc commented 4 years ago

Manage identification in search results allows you to add an ID, but not remove existing? I can add the updated name, but until I can delete the incorrect identifications in bulk, the name will remain associated with the specimens and I cannot delete it. What am I missing? This whole process also involves a huge number of steps, each of which is prone to error, and it takes an inordinate amount of time. Not conducive to folks fixing errors as they see them.
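
Why the name cannot be deleted while identifications still point at it comes down to referential integrity. The sketch below illustrates this with Python's built-in sqlite3 module, assuming a drastically simplified, hypothetical pair of tables that borrow two names (taxon_name, identification_taxonomy) from the query later in this thread. It is not Arctos's actual schema or its PostgreSQL constraint setup.

```python
import sqlite3


def build_demo_db():
    """Hypothetical, heavily simplified schema with one misspelled name in use."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE taxon_name (
            taxon_name_id   INTEGER PRIMARY KEY,
            scientific_name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE identification_taxonomy (
            identification_id INTEGER NOT NULL,
            taxon_name_id     INTEGER NOT NULL REFERENCES taxon_name (taxon_name_id)
        );
        INSERT INTO taxon_name VALUES (1, 'Hypsugo alashanicus');
        INSERT INTO identification_taxonomy VALUES (100, 1);
    """)
    return conn


def try_delete_name(conn, name):
    """Return True if the name was deleted, False if an identification still uses it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("DELETE FROM taxon_name WHERE scientific_name = ?", (name,))
        return True
    except sqlite3.IntegrityError:  # "FOREIGN KEY constraint failed"
        return False


if __name__ == "__main__":
    conn = build_demo_db()
    print(try_delete_name(conn, "Hypsugo alashanicus"))  # False: still in use
    conn.execute("DELETE FROM identification_taxonomy WHERE taxon_name_id = 1")
    conn.commit()
    print(try_delete_name(conn, "Hypsugo alashanicus"))  # True
```

Deleting the name fails until the identification row referencing it is gone, which is the ordering described above: remove all uses first, then remove the name.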

dustymc commented 4 years ago

Not conducive to folks fixing errors as they see them.

This type of error should never happen. If your analysis is correct, we introduced an invalid authority and then someone USED it! I'm not enthusiastic about adding UI for dealing with things that should never happen; I'd rather invest in figuring out how to make our authorities authoritative, and as always Lam and I can help fix data.

That said, I don't believe your analysis is correct in this case. A quick search reveals that http://dx.doi.org/10.1016/j.quaint.2016.09.061 contains "Hypsugo alashanicus" - the name, however "wrong," is useful for people coming from that publication, and therefore is NOT something that should be deleted from Arctos.

Manage identification in search results allows you to add an ID, but not remove existing?

Correct, and I see no safe way to change that. A "debulkloader" is a possible approach.
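
A "debulkloader" is only floated as an idea here. One hypothetical shape for it is an all-or-nothing batch delete: validate every listed identification ID before changing anything, then delete them all in one transaction. The sketch assumes a hypothetical `identification` table keyed by `identification_id` and uses sqlite3 purely for illustration; the function name and return values are invented.

```python
import sqlite3


def debulkload(conn, identification_ids):
    """Hypothetical 'debulkloader': delete every listed identification, or none.

    Returns (True, []) on success, or (False, missing_ids) if any ID is unknown.
    """
    ids = list(dict.fromkeys(identification_ids))  # de-duplicate, keep input order
    missing = [
        i for i in ids
        if conn.execute(
            "SELECT 1 FROM identification WHERE identification_id = ?", (i,)
        ).fetchone() is None
    ]
    if missing:
        return False, missing  # validate everything before changing anything
    with conn:  # one transaction: commit on success, roll back on any error
        conn.executemany(
            "DELETE FROM identification WHERE identification_id = ?",
            [(i,) for i in ids],
        )
    return True, []
```

Validating up front means a typo in the input list leaves the data untouched, which seems like the minimum for a tool that removes records in bulk.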

dustymc commented 4 years ago

Plecotus koslovi is used in Ectoparasites of Bats in Mongolia, Part 2 (Ischnopsyllidae, Nycteribiidae, Cimicidae and Acari) and should also not be deleted.

Jegelewicz commented 4 years ago

Let's make sure these names have the appropriate relationships. I also generally put the correctly spelled name in "preferred name" in the classification metadata of the incorrectly spelled name.

Jegelewicz commented 4 years ago

There is only one specimen with Hypsugo alashanicus -> should be Hypsugo alaschanicus

easy enough to fix?

campmlc commented 4 years ago

To make sure I understand, the steps would be:

  1. Clone the incorrect name and classification as a new name and classification.
  2. In this case there is no classification, so something from an external source would need to be cloned into the new name.
  3. Add any relationships (in this case I would not want to perpetuate the misspelling, but should we add potential alternate spelling?).
  4. Add the new ID to the specimen record.
  5. Delete the old ID from the specimen record.
  6. Delete the classification on the old ID, if there is one.
  7. Delete the name?

campmlc commented 4 years ago

So how do we prevent not only students but CMs from using the wrong spellings? We are not all experts in all taxa. Potential alternate spelling doesn't highlight which one is the correct form to be used?

dustymc commented 4 years ago

  1. Somehow create the name; there are many ways to do so.
  2. Somehow get a classification if desired; again, there are many ways.
  3. A failure to add relationships is much more likely to perpetuate errors. Given one thing, many will click it; given a pair, they may investigate. As @Jegelewicz mentioned, preferred name exists to reinforce the path to the "correct" name.
  4. Yep.
  5. Sure, if you're sure it's irrelevant (e.g., not a specimen stealthily cited in one of those publications).
  6. Please don't.
  7. PLEASE!!! don't. Doing so would make it much more difficult for a user coming from publications that use it.

http://handbook.arctosdb.org/documentation/taxonomy.html
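
The annotated steps can be summarized as a small in-memory sketch in which the corrected name is added, the old identification is kept (or dropped only on request), and steps 6 and 7 are deliberately never performed. The `Specimen` and `Identification` classes, the `accepted` flag, and `correct_identification` are hypothetical and do not mirror Arctos's data model; step 3 (relationships) is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Identification:
    scientific_name: str
    accepted: bool = True


@dataclass
class Specimen:
    guid: str
    identifications: list = field(default_factory=list)


def correct_identification(specimen, known_names, wrong, right, drop_old=False):
    """Re-identify a specimen from `wrong` to `right` (hypothetical helper).

    Steps 1-2: make sure the corrected name exists (classification optional).
    Step 4: add the new identification and make it the accepted one.
    Step 5: drop the old identification only if explicitly requested.
    Steps 6-7 (deleting the old classification and name) are never done:
    `known_names` only ever grows.
    """
    known_names.add(right)
    for ident in specimen.identifications:
        ident.accepted = False
    specimen.identifications.append(Identification(right))
    if drop_old:
        specimen.identifications = [
            i for i in specimen.identifications if i.scientific_name != wrong
        ]
    return specimen
```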

dustymc commented 4 years ago

Potential alternate spelling doesn't highlight which one is the correct form to be used?

Correct, and "correct" pretty commonly flip-flops back and forth over time. http://arctos.database.museum/info/ctDocumentation.cfm?table=CTTAXON_TERM&field=preferred_name exists for this purpose.
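
As a rough illustration of how a preferred_name in a misspelled name's classification metadata points users toward another spelling, here is a sketch assuming a hypothetical dict from taxon name to its preferred_name value. Because "correct" can flip-flop, the lookup follows the chain with a cycle guard instead of assuming it ends.

```python
def resolve_preferred(name, preferred, max_hops=10):
    """Follow preferred_name links starting at `name` (hypothetical structure).

    `preferred` maps a taxon name to its preferred_name value. Returns the list
    of names visited; stops at a name with no preferred_name, at a cycle, or
    after `max_hops` links.
    """
    path = [name]
    seen = {name}
    while name in preferred and len(path) <= max_hops:
        name = preferred[name]
        if name in seen:
            break  # e.g. A -> B -> A: the "correct" spelling flip-flopped
        path.append(name)
        seen.add(name)
    return path
```

For example, `resolve_preferred("Callipepela", {"Callipepela": "Callipepla"})` returns `["Callipepela", "Callipepla"]`, while a Myodes/Clethrionomys pair pointing at each other stops after one hop instead of looping.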

campmlc commented 4 years ago

It would help if we had some way (even just text color!) to indicate which is the preferred name in the taxon name search results.

dustymc commented 4 years ago

[Screenshot, 2020-05-06]

That was disabled because it was melting Oracle; it seems fine in PG.

campmlc commented 4 years ago

That's good to hear. For the Plecotus auritus example, which one would I choose? It seems both invalid in Arctos Relationships and valid in Arctos? Note the difficulty under the current scenario of choosing the correct Plecotus koslovi/kozlovi

dustymc commented 4 years ago

which one would I choose?

Probably depends on which taxonomic faction you're closest to at the moment....

both invalid in Arctos Relationships

That should probably be somehow excluded from the options; it's all machine-maintained and contains about everything.

difficulty under the current scenario

All Arctos can do is reflect the reality of taxonomy (when we have the data, which is about never). There's no polish that can be put on a couple hundred years of entirely arbitrary and often self-conflicting literature which will make it all make sense. The two most obvious solutions here would be:

  1. Build your own map outside the (alleged!) formalities of taxonomy. Add a preferred_name of "UseThatOne (MSB:Mamm)" and your students don't have to figure it out.
  2. Make it easy to ignore the parts of taxonomy that you don't care about; maintain your own classification that contains only names that you want to use.
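
Option 1 amounts to a per-collection lookup over preferred_name values. A minimal sketch, assuming values follow the informal "Name (Collection)" pattern used in the example above (a convention suggested in this thread, not an enforced Arctos format), and that a value qualified with the current collection should win over an unqualified one:

```python
import re

# "Some Name (GUID:Prefix)"; anything without a trailing "(...)" is unqualified.
_QUALIFIED = re.compile(r"^(?P<name>.+?)\s*\((?P<collection>[^()]+)\)$")


def preferred_for_collection(preferred_values, collection):
    """Pick the preferred name for `collection` from preferred_name values.

    A value qualified with this collection wins; otherwise the first
    unqualified value is returned; None if nothing applies.
    """
    fallback = None
    for value in preferred_values:
        m = _QUALIFIED.match(value.strip())
        if m:
            if m.group("collection").strip() == collection:
                return m.group("name")
        elif fallback is None:
            fallback = value.strip()
    return fallback
```

With `["Clethrionomys (Some Other Collection)", "Myodes (MSB:Mamm)"]`, an MSB:Mamm data entry form would get "Myodes" and the other collection "Clethrionomys", so disagreement between collections does not have to be resolved globally.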

Jegelewicz commented 4 years ago

See https://github.com/ArctosDB/arctos/issues/2125#issuecomment-626927031

Jegelewicz commented 4 years ago

See Callipepela and Callipepla

dustymc commented 4 years ago

See Callipepela and Callipepla

Huh?

dustymc commented 4 years ago

https://arctos.database.museum/name/Buteo%20jamaicensis%20abeiticola seems to be an example of a taxon that should not be in Arctos; Google finds only a dead link to NCBI. Looks like a simple misspelling which hasn't much escaped Arctos.

Jegelewicz commented 4 years ago

See Callipepela and Callipepla

Huh?

An example of a misspelling with a relationship to the correct spelling.

dustymc commented 4 years ago

Do you have evidence it's a misspelling? They're both used in literature going back ~100 years, which is about the limit of my detective ability.

campmlc commented 4 years ago

This emphasizes that we need a way to clearly indicate a preference of name/spelling choice globally in Arctos or by collection, something that students can understand.

dustymc commented 4 years ago

clearly indicate a preference of name/spelling choice

http://arctos.database.museum/info/ctDocumentation.cfm?table=CTTAXON_TERM&field=preferred_name

anna-chinn commented 4 years ago

Taxonomy Committee 2020-05-20: The consensus from the meeting was that we should delete misspelled names that exist only in Arctos (i.e. not published anywhere else/don't pass the Google test) to avoid confusion on data entry. For misspellings that are published elsewhere, we maintain them in Arctos for the sake of specimen discovery.

For post-PG, an improved flagging system might help us find and rectify errors. When a taxon name is flagged as a possible alternate spelling, provided we have a way to record which taxon name is the correct one, can we set up a bad data notification/report that goes out to all collections that use the taxon? That way collections can update identifications or switch to A{string}. Then, we can delete the misspelled taxon name.
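
The proposed notification could start from a per-collection tally of identifications that use a flagged name, much like the SQL query later in this thread. A hypothetical sketch, assuming rows of (guid_prefix, scientific_name) have already been fetched and that a mapping from each flagged spelling to the preferred one is known; the function name and message wording are invented:

```python
from collections import defaultdict


def build_notifications(rows, corrections):
    """Group identifications that use flagged names by collection (hypothetical).

    rows: iterable of (guid_prefix, scientific_name), one per identification.
    corrections: dict mapping a flagged spelling to the spelling to use instead.
    Returns {guid_prefix: [message, ...]}, one message per flagged name.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for guid_prefix, name in rows:
        if name in corrections:
            counts[guid_prefix][name] += 1
    return {
        prefix: [
            f"{n} identification(s) use '{name}'; consider '{corrections[name]}'"
            for name, n in sorted(per_name.items())
        ]
        for prefix, per_name in sorted(counts.items())
    }
```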

Also, can we grey out metadata boxes for invalid/misspelled taxa that come up in search forms so that students/volunteers have a better idea of which taxon name they should be using?

[Screenshot, 2020-05-20]

dustymc commented 4 years ago

delete misspelled names that exist only in Arctos

Agreed

improved flagging system

I tentatively suggest using https://github.com/ArctosDB/arctos/issues/2499 as a "hard flag" ONLY for names that are clearly misspelled, not valid in some other context, but still useful for discovery - it should be a bit more permanent than anything in classifications, but still easy to recover if that bad spelling of a lizard turns out to be a good spelling of a diatom. https://github.com/ArctosDB/arctos/issues/2125#issuecomment-627475558

set up a bad data notification/report

Yes.

Then, we can delete

I think that will be very rare, but yes if the name is demonstrably not published anywhere it could be deleted rather than flagged.
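
The committee consensus and this reply reduce to a small decision rule. In this sketch both inputs are assumptions someone has to establish by hand (how many identifications still use the name, and whether the spelling is published outside Arctos), and the function name and labels are hypothetical:

```python
def disposition(in_use: int, published_elsewhere: bool) -> str:
    """Suggest what to do with a misspelled name (hypothetical helper).

    in_use: number of identifications that still use the name.
    published_elsewhere: True if the spelling appears outside Arctos.
    """
    if in_use > 0:
        # Later comments treat un-using as a prerequisite for both outcomes.
        return "re-identify first"
    if published_elsewhere:
        return "flag"  # keep it for discovery from the publications that use it
    return "delete"  # the spelling exists only in Arctos
```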

I think much more common will be the https://github.com/ArctosDB/arctos/issues/2645#issuecomment-627671539 situation, where apparently everybody's been using both ~forever.

grey out metadata boxes

Hopefully PG will provide the processing power to get at those data, and we can do WHATEVER with the UI (as long as we're not stopping some collection from doing whatever they need to do).

Jegelewicz commented 4 years ago

I made some changes to Callipepela and Callipepla. I think we need to make the "potential alternate spelling" option more explicit somehow, e.g., "this term is the misspelled version of that one." The current process is just too convoluted: create a relationship that doesn't really say much (possible alternative spelling), mark one name as valid and the other as invalid, then add the valid name as the "preferred name" in the metadata of the misspelled name. On top of that, the preferred name itself can be misspelled, because that field is not tied to taxon names....

[Two screenshots]
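
One way to close the gap described above (preferred_name is free text, so it can itself be misspelled) would be to check each value against existing taxon names before saving. This is a hypothetical check, not current Arctos behavior; it ignores a trailing "(Collection)" qualifier so that per-collection preferences like those suggested earlier in the thread still validate.

```python
import re

# Trailing " (Collection)" note, if any, is not part of the name being checked.
_QUALIFIER = re.compile(r"\s*\([^()]*\)$")


def check_preferred_name(preferred, existing_names):
    """Return problems found in a free-text preferred_name value (hypothetical).

    existing_names: set of taxon names known to the system. An empty list
    means the value refers to an existing name.
    """
    name = _QUALIFIER.sub("", preferred.strip())
    if not name:
        return ["empty preferred_name"]
    if name not in existing_names:
        return [f"'{name}' is not an existing taxon name"]
    return []
```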

campmlc commented 4 years ago

I agree with using some version of "this term is the misspelled version of that one." Also, can we find a way to flag Arctos misspellings to avoid them being published to NCBI or back to Global Names? It is crazy that we can make a mistake and then be the global "authority" for the mistake...

dustymc commented 4 years ago

more explicit somehow?

Yeah, I'm up for ideas.

possible alternative spelling

See http://arctos.database.museum/info/ctDocumentation.cfm?table=CTTAXON_RELATION - those 100% (assuming everybody reads the docs...) come from my scripts. If you know more, we may need more taxon relationships. There's some huge conversation that led to one term somewhere....

flag Arctos misspellings to avoid them being published to NCBI or back to Global Names?

https://github.com/ArctosDB/arctos/issues/2645#issuecomment-631764798

sharpphyl commented 4 years ago

@Jegelewicz et al. re: TAXON_RELATIONSHIP descriptions

Currently it reads:

TAXON_RELATIONSHIP | Documentation
potential alternate spelling | Possible spelling variation detected by automation.
synonym of | Each of two or more names of the same rank used to denote the same taxonomic taxon. - ICZN

suggested modifications to start the discussion...

potential alternate spelling | Possible spelling variation detected by automation and retained to aid users in finding specimens. May be a misspelling. Check Taxon Status in preferred classification for validity.

synonym of | Each of two or more names of the same rank used to denote the same taxonomic entity. Only one synonym should have a taxon status of valid.

We still need to delete misspellings that only exist in Arctos and foul the works for everyone and we need a system to get curators to correct their misspelled identifications so we can do the deletion.

Jegelewicz commented 4 years ago

Do only ICZN entries have synonyms??

Oh heck no! The plant people are just as bad...it is just that the definition came from ICZN.

dustymc commented 4 years ago

Sorting out those definitions should be prioritized; I'm seeing some concerning errors in the logs.

We should be using "synonym" to mean "same thing, different name." The Code implications should be avoided, not added; we mere mortals cannot make that determination, that takes a taxonomist, or perhaps a bunch of them, and sometimes decades.

We had a bunch of Code terms, they did not get used properly, there was agreement to consolidate under our own terminology. If you are really determining that a name is a synonym in the meaning of some Code, then I suggest adding eg "ICZN Synonym" or similar as a CT term.

"potential alternate spelling" should not be used by humans, unless you're really not considering anything except the structure of the name itself.

DerekSikes commented 4 years ago

"we mere mortals cannot make that determination, that takes a taxonomist, or perhaps a bunch of them, and sometimes decades."

But there are tons of invalid synonyms that have been invalid synonyms for a century or longer... if we didn't list such names as invalid synonyms then we'd be asserting a new taxonomic relationship that disagrees with all prior usage.

Or maybe I misunderstand something?

Derek

--

+++++++++++++++++++++++++++++++++++ Derek S. Sikes, Curator of Insects Professor of Entomology University of Alaska Museum 1962 Yukon Drive Fairbanks, AK 99775-6960

dssikes@alaska.edu

phone: 907-474-6278 FAX: 907-474-5469 he/him/his University of Alaska Museum - search 400,276 digitized arthropod records http://arctos.database.museum/uam_ento_all http://www.uaf.edu/museum/collections/ento/ +++++++++++++++++++++++++++++++++++

Interested in Alaskan Entomology? Join the Alaska Entomological Society and / or sign up for the email listserv "Alaska Entomological Network" at http://www.akentsoc.org/contact_us

dustymc commented 4 years ago

century

I probably won't know that, and won't be able to back it up or use accurate terminology if I did - and I've created most of our relationships.

potential alternate spelling ==> computers think these have something to do with each other; take that for what it's worth. These are typically https://arctos.database.museum/name/Microtus%20longicaudus%20alticola / https://arctos.database.museum/name/Microtus%20longicaudus%20alticolus type things. I'm just checking for common spelling variations; one could easily be completely different, or a virus so there's no conflict, or WHATEVER.

synonym of ==> people think these have something to do with each other. That's better than machines, but still proceed with caution. These are https://arctos.database.museum/name/Myodes%20gapperi / https://arctos.database.museum/name/Clethrionomys%20gapperi sorts of things, where someone knows there's some debate but doesn't know why.

I'm not arguing that those are ideal, just that they're all I'm likely to know. If you can do better, please do - I'd be thrilled to do whatever I can to support getting more precise data (and part of that will be keeping it from getting mixed up in the imprecise data).

campmlc commented 4 years ago

Why not a relationship of "misspelling of"? I think we should defer to the taxonomists in our midst and let them sort this out. It is their mess, after all, and they are the professionals trained to deal with it. We certainly don't want to make things any more difficult than they are by limiting the options for clarifying problems.

DerekSikes commented 4 years ago

We already have a relationship 'potential alternate spelling of' so I don't think we need 'misspelling of'. Simpler is better.

-D

sharpphyl commented 4 years ago

Having spent a lot of time lately on taxonomic updates, I agree that our two relationships are often insufficient. Even with synonyms, you often have to open multiple taxon pages to know which one is valid (if the taxon status field has been completed). With "potential alternate spelling," again it doesn't say which name is valid or invalid, so you may have to search beyond Arctos. "valid synonym of", "invalid synonym of", "valid alternate spelling of", and "invalid alternate spelling of" would be helpful for me.

dustymc commented 4 years ago

to know which one is valid

That is not the purpose, and it is why "preferred_name" exists.

Clethrionomys and Myodes seem to flip-flop back and forth every decade or so, and not everyone gets convinced of the winner at the same time, if at all, for example. "Full data" would be dozens of relationships backed by dozens of publications, and still wouldn't tell you what your friendly local curator prefers. Adding all of your suggested vocabulary (or anything else) wouldn't help at all. "Myodes (My Collection)" in preferred name serves that purpose, and leaves plenty of room for disagreement via "Clethrionomys (Some Other Collection)" in preferred_name.

Please see https://arctos.database.museum/info/ctDocumentation.cfm?table=CTTAXON_RELATION - "potential alternate spelling" should come only from machines, and they're not going to have any opinions on validity any time soon.

Note also that taxon_status, even if backed by metadata, does not fulfill this purpose either. "We like Cleth (some author, some date)" isn't likely to be very useful to your data entry folks.
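
To make "should come only from machines" concrete: a machine check of this kind looks only at the structure of names. The sketch below is a hypothetical stand-in for such a script (it is not the one Arctos uses): it strips a few Latin endings from each word, then sums the edit distance between corresponding words. The endings list and the one-edit threshold are arbitrary assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca from a
                           cur[j - 1] + 1,             # insert cb into a
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


# Hypothetical interchangeable endings, checked longest first.
ENDINGS = ("ii", "us", "um", "a", "i")


def stem(word: str) -> str:
    """Strip one ending, keeping at least three characters of the word."""
    for end in ENDINGS:
        if word.endswith(end) and len(word) > len(end) + 2:
            return word[: -len(end)]
    return word


def potential_alternate_spelling(a: str, b: str, max_edits: int = 1) -> bool:
    """True if two different names differ only slightly once endings are stripped."""
    wa, wb = a.lower().split(), b.lower().split()
    if a == b or len(wa) != len(wb):
        return False
    return sum(edit_distance(stem(x), stem(y)) for x, y in zip(wa, wb)) <= max_edits
```

It pairs Microtus longicaudus alticola / alticolus and Plecotus koslovi / kozlovi, but it has no way to say which spelling is valid, and it will also pair unrelated names that merely look alike, which is why such a relationship carries no opinion on validity.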

campmlc commented 4 years ago

Yes, but we need to be able to distinguish between two legitimate names that may or may not be considered valid by a particular taxonomist or collection (Clethrionomys and Myodes) and a misspelling that should not be used. I agree that "potential alternate spelling" does not solve this problem. The other alternative - delete the misspelled names from Arctos - keeps getting shot down, and cannot readily be done once confused students/collections have used them. We don't want to perpetuate the misuse. What is the solution? It is obvious we need one, and the current alternatives do not work.

dustymc commented 4 years ago

Quarantine is a lossless alternative to deletion. They're functionally identical from specimen's perspective, but quarantined names can do stuff from other angles (lead to specimens from the publications which created the alleged misspellings, for example).

Un-using - for quarantine or deletion, they're functionally identical here too - is largely a social problem, albeit a social problem that's confounded by a system which isn't always capable of providing clear guidance. (That is another problem which quarantine addresses and deletion confounds by avoidance.)

We have an elegant and robust solution to the problem you describe, it just needs used.
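
The distinction drawn here between quarantine and deletion can be modeled in a few lines: a quarantined name is still found by search (for example, by someone arriving from a publication that used it) but is refused for new identifications, whereas a deleted name is gone. A minimal sketch; `NameRegistry` and its methods are hypothetical, usage counts are passed in for simplicity, and this is not how Arctos implements quarantine.

```python
class NameRegistry:
    """Toy model of taxon names with quarantine (hypothetical)."""

    def __init__(self, names):
        self.names = set(names)
        self.quarantined = set()

    @staticmethod
    def _require_unused(name, uses):
        # Quarantine and deletion share the same social prerequisite: un-using.
        if uses:
            raise ValueError(f"{name!r} is still used by {uses} identification(s)")

    def quarantine(self, name, uses=0):
        self._require_unused(name, uses)
        self.quarantined.add(name)

    def delete(self, name, uses=0):
        self._require_unused(name, uses)
        self.names.discard(name)
        self.quarantined.discard(name)

    def search(self, name):
        """Discovery: quarantined names are still found; deleted ones are not."""
        return name in self.names

    def usable_for_identification(self, name):
        return name in self.names and name not in self.quarantined
```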

Jegelewicz commented 4 years ago

We have an elegant and robust solution to the problem you describe, it just needs used.

Sadly, it will probably only be rarely used because of social issues.

Also, we need a way for HUMANS to designate a misspelling (I can tell you they are doing it already using the thing that @dustymc thinks is only meant for computers).

Part of this issue stems from messy taxonomy, which we CANNOT fix, and part from messy collections data, which we can fix if we are willing to work at it. Additionally, the separation of names from their "preferred alternative" (stored in classification metadata) is a problem, since choices of names for identification come from the giant list of names available to everyone, everywhere. There is no meaningful way for a collection to prefer one name over another, so there is a high probability that Callipepela and Callipepla will be used interchangeably within a collection as long as they are both "Linnean" names. That is likely to be FOREVER, given the hurdle of completely eliminating all use of one of them and quarantining it. But maybe not:

[Two screenshots]

Anyone want to tackle those 20 Callipepela and quarantine it?

AND BTW

delete the misspelled names from Arctos

IS an option if you find they are not in use. However, it may have unintended consequences over time (lower discoverability if someone happens to search on the misspelling) and it can always be added back by the next person to bulkload taxon names. Quarantine is better because it keeps the name from being used in identifications while still providing for discoverability if someone has used it in an A {string} identification or it is spelled that way in a publication.

dustymc commented 4 years ago

need a way for HUMANS to designate a misspelling

That is precisely what "synonym" was intended for - not sure how we got the definition we were trying to avoid. There's a giant issue somewhere....

tackle those 20 Callipepela

Sure, I just need something like "official" go-ahead from the involved collections.


```sql
-- Count identifications per collection whose taxon name starts with "Callipepela ".
-- The space in the LIKE pattern means names below genus match, not the bare genus.
select
  collection.guid_prefix,
  taxon_name.scientific_name,
  count(*) c
from
  collection
  inner join cataloged_item on collection.collection_id = cataloged_item.collection_id
  inner join identification on cataloged_item.collection_object_id = identification.collection_object_id
  inner join identification_taxonomy on identification.identification_id = identification_taxonomy.identification_id
  inner join taxon_name on identification_taxonomy.taxon_name_id = taxon_name.taxon_name_id
where
  taxon_name.scientific_name like 'Callipepela %'
group by
  collection.guid_prefix,
  taxon_name.scientific_name
;
```

```
 guid_prefix |   scientific_name    | c
-------------+----------------------+---
 DGR:Bird    | Callipepela gambeli  | 5
 DGR:Bird    | Callipepela gambelii | 1
 DGR:Bird    | Callipepela squamata | 6
 MSB:Bird    | Callipepela gambeli  | 4
(4 rows)
```

Merger could probably be fleshed out into a "nominate for quarantine" form. That could have some sort of sign-off component (e.g., someone with manage_collection from every involved collection gets emails until they click a box, or something). That would not, however, deal with unresponsive collections, which I think can only be addressed by The Community. Perhaps doing so will find a global solution, so I think that needs to be discussed before we contemplate tools very seriously.
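As a rough illustration of the sign-off idea, a nomination could track which involved collections have responded. This is a hypothetical sketch, not an existing Arctos feature. `QuarantineNomination`, `pending()` and `approved()` are invented names, and requiring a reason for disagreement is an assumption borrowed from the discussion. Collections still in `pending()` are the ones that would keep receiving emails. Nothing here solves the unresponsive-collection problem, which stays with The Community.

```python
from dataclasses import dataclass, field


@dataclass
class QuarantineNomination:
    """Hypothetical sign-off tracker for a name nominated for quarantine."""
    name: str
    involved: set[str]  # guid_prefix of every collection using the name
    agreed: set[str] = field(default_factory=set)
    disagreed: dict[str, str] = field(default_factory=dict)  # guid_prefix -> reason

    def _check(self, collection: str) -> None:
        if collection not in self.involved:
            raise ValueError(f"{collection} does not use {self.name}")

    def agree(self, collection: str) -> None:
        self._check(collection)
        self.agreed.add(collection)
        self.disagreed.pop(collection, None)

    def disagree(self, collection: str, reason: str) -> None:
        self._check(collection)
        if not reason.strip():
            raise ValueError("a reason is required to disagree")
        self.disagreed[collection] = reason
        self.agreed.discard(collection)

    def pending(self) -> set[str]:
        # Collections that have not responded; these would keep getting reminder emails.
        return self.involved - self.agreed - set(self.disagreed)

    def approved(self) -> bool:
        # Quarantine proceeds only when every involved collection has agreed.
        return self.agreed == self.involved
```

For example, a nomination of "Callipepela gambeli" would involve DGR:Bird and MSB:Bird, and it would stay unapproved until both agree.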

Jegelewicz commented 4 years ago

Merger could probably be fleshed out into a "nominate for quarantine" form. That could have some sort of sign-off component (eg, someone with manage_collection from every involved collection gets emails until they click a box, or something).

I like this idea as it helps to automate the process a bit (I don't have to go find emails for all collections involved).

"Sign-off" could lead to something like:

Disagree with quarantine - must provide a reason.

OR

Agree to quarantine - which sets in motion removal from the collection's identifications by

Deletion of offending identifications AND

Identifying with the accepted spelling, with whatever was in the ID before recorded in the ID remark as "verbatim identification = XXXXX"

OR

Creating an identification of A {string}, where A = accepted spelling and {string} = whatever was in the ID before

Once all use of the offending spelling is eliminated, quarantine the name (automatically would be great, but a notification to whoever nominated it so they can quarantine it could work too).
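The two rewrite options in the "Agree" branch could be sketched as follows. This is a hypothetical sketch over a simplified record. The `Identification` fields and `rewrite_identification` are assumptions modeled on the wording in the comment and do not reflect the Arctos schema. The caller is assumed to delete the old identification and store the returned one in its place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identification:
    """Simplified, hypothetical identification record (not the Arctos schema)."""
    taxon_name: str        # linked name, "A" in the formula
    formula: str = "A"     # "A" or "A {string}"
    string_part: str = ""  # the "{string}" part when formula == "A {string}"
    remarks: str = ""


def rewrite_identification(old: Identification, misspelled: str, accepted: str,
                           use_string_formula: bool) -> Identification:
    """Build the replacement for an identification that uses `misspelled`."""
    if old.taxon_name != misspelled:
        raise ValueError(f"identification does not use {misspelled!r}")
    # "Whatever was in the ID before": the string part if there was one, else the name.
    previous = old.string_part if old.formula == "A {string}" else old.taxon_name
    if use_string_formula:
        # Option 2: A = accepted spelling, {string} = previous value.
        return Identification(accepted, "A {string}", previous, old.remarks)
    # Option 1: accepted spelling, previous value preserved in the remark.
    note = f"verbatim identification = {previous}"
    remarks = f"{old.remarks}; {note}" if old.remarks else note
    return Identification(accepted, "A", "", remarks)
```

Either option keeps the verbatim value somewhere searchable, which is the point of not simply overwriting the name.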

Then we need a way to document this in association with the name so that everyone knows why it is or isn't quarantined.

dustymc commented 4 years ago

Yeah, I think it could work, but I still think we need to discuss first what happens if someone's unresponsive.

If I merge, I'll almost certainly just swap out the taxon_name_id and id_formula; identification.scientific_name would not change.
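Here is a rough illustration of that kind of merge. It uses Python's built-in `sqlite3` against a cut-down, hypothetical copy of tables named in the query above (`taxon_name`, `identification`, `identification_taxonomy`). The `id_formula` column is taken from the comment rather than from the real schema, and `merge_name` is an invented helper. Repointing `taxon_name_id` moves the link to the accepted name, while `identification.scientific_name` keeps its original string. The new formula value is whatever the merge calls for; it is only a parameter here.

```python
import sqlite3


def merge_name(conn: sqlite3.Connection, bad_id: int, good_id: int, new_formula: str) -> int:
    """Repoint identifications from bad_id to good_id and return the rows repointed.

    Hypothetical schema; identification.scientific_name is deliberately left alone.
    """
    with conn:  # one transaction: commit on success, roll back on error
        # Update the formula first, while the rows can still be found by bad_id.
        conn.execute(
            """UPDATE identification SET id_formula = ?
               WHERE identification_id IN (
                 SELECT identification_id FROM identification_taxonomy
                 WHERE taxon_name_id = ?)""",
            (new_formula, bad_id),
        )
        cur = conn.execute(
            "UPDATE identification_taxonomy SET taxon_name_id = ? WHERE taxon_name_id = ?",
            (good_id, bad_id),
        )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE taxon_name (taxon_name_id INTEGER PRIMARY KEY, scientific_name TEXT);
    CREATE TABLE identification (identification_id INTEGER PRIMARY KEY,
                                 scientific_name TEXT, id_formula TEXT);
    CREATE TABLE identification_taxonomy (identification_id INTEGER, taxon_name_id INTEGER);
    INSERT INTO taxon_name VALUES (1, 'Callipepela squamata'), (2, 'Callipepla squamata');
    INSERT INTO identification VALUES (10, 'Callipepela squamata', 'A');
    INSERT INTO identification_taxonomy VALUES (10, 1);
    """
)
```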

I think the form, if we go there, would be implemented as a full solution - change IDs, create relationships, add any dialog to relationship - uh, authority I guess, since we don't have remarks - and quarantine the name. Anything else opens up the possibility of someone restarting the process by using the name.
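The "full solution" point is essentially about atomicity. If the ID changes, the relationship and the quarantine all commit together or not at all, the name is never left usable halfway through. The following is a minimal `sqlite3` sketch of that, assuming a hypothetical schema. `quarantine_misspelling` is an invented helper. The `taxon_relations` table (with the dialog kept in `authority`, since there are no remarks), the `'synonym of'` label and the `name_type = 'quarantine'` value are all assumptions standing in for whatever Arctos actually uses.

```python
import sqlite3

SCHEMA = """
CREATE TABLE taxon_name (taxon_name_id INTEGER PRIMARY KEY, scientific_name TEXT UNIQUE,
                         name_type TEXT NOT NULL DEFAULT 'Linnean');
CREATE TABLE identification_taxonomy (identification_id INTEGER, taxon_name_id INTEGER);
CREATE TABLE taxon_relations (taxon_name_id INTEGER, related_taxon_name_id INTEGER,
                              relationship TEXT, authority TEXT);
"""


def quarantine_misspelling(conn: sqlite3.Connection, bad_id: int, good_id: int,
                           authority: str) -> None:
    """Hypothetical all-or-nothing merge: repoint IDs, relate the names, quarantine."""
    with conn:  # any exception below rolls back every step
        conn.execute(
            "UPDATE identification_taxonomy SET taxon_name_id = ? WHERE taxon_name_id = ?",
            (good_id, bad_id),
        )
        # No remarks field, so the dialog goes in the relationship's authority.
        conn.execute(
            "INSERT INTO taxon_relations VALUES (?, ?, 'synonym of', ?)",
            (bad_id, good_id, authority),
        )
        cur = conn.execute(
            "UPDATE taxon_name SET name_type = 'quarantine' WHERE taxon_name_id = ?",
            (bad_id,),
        )
        if cur.rowcount != 1:
            raise ValueError(f"taxon_name_id {bad_id} not found")
```

If the last step fails, the earlier updates are rolled back too, so no identification ends up repointed against a name that was never quarantined.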

sharpphyl commented 4 years ago

As @Jegelewicz mentioned, preferred name exists to reinforce the path to the "correct" name.

This is a good feature that I often use, but it's free-form and doesn't actually check the taxon names table to verify that I didn't add a name that isn't in it. Can that be done the same way it's done for synonyms?

dustymc commented 4 years ago

Free-form is very much a feature, not a bug; that thing should accept "use THIS for ABC Collection" and "XYZ collection use THAT, but only if collected north of the Blaugh River after 1976" and the other 54 variants which might be in play at any time.

You could enter HTML (I might build some kind of helper form), or I could probably process some sort of markdown, where the [Aleocharinae] in "use [Aleocharinae] for ABC Collection" gets turned into https://arctos.database.museum/name/Aleocharinae, or something of the sort, if that's useful enough?
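The bracket-to-link idea could be as small as a regular-expression substitution. The following is a sketch that assumes the `https://arctos.database.museum/name/<name>` URL pattern from the comment, with `link_names` as a hypothetical helper. It URL-encodes the name so that binomials like [Callipepla squamata] also work. It also returns any bracketed names missing from a supplied set of known names, so a form could warn about them (sharpphyl's concern) without rejecting the free text. Text outside the brackets is passed through unchanged, so hand-entered HTML still works.

```python
import re
from html import escape
from urllib.parse import quote

NAME_URL = "https://arctos.database.museum/name/"
BRACKETED = re.compile(r"\[([^\[\]]+)\]")


def link_names(text: str, known: set[str]) -> tuple[str, list[str]]:
    """Turn each [Name] into an HTML link and report names not found in `known`."""
    unknown: list[str] = []

    def to_link(match: re.Match) -> str:
        name = match.group(1).strip()
        if name not in known:
            unknown.append(name)  # warn only; free-form text is still accepted
        return f'<a href="{NAME_URL}{quote(name)}">{escape(name)}</a>'

    return BRACKETED.sub(to_link, text), unknown
```

For example, `link_names("use [Aleocharinae] for ABC Collection", {"Aleocharinae"})` links the name and reports nothing unknown.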

sharpphyl commented 4 years ago

@Jegelewicz Can we add this question to the next Taxonomy Committee meeting?

WoRMS (via Arctos) automatically adds the preferred name if a taxon name is invalid. I'm not sure how many people are manually adding the preferred name to other classifications and whether it's worth it to have Dusty build the HTML he describes above. The preferred name is helpful when there are lots of synonyms but if it's not being completed very often, it's probably not worth the time with everything else that's going on.

dustymc commented 2 years ago

The solution is to file a Taxon Name Quarantine Request Issue, but https://github.com/ArctosDB/arctos/issues/4204 may change the format and/or functionality of that issue.