ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum
What to do with unused, misspelled taxa? #2122

Closed acdoll closed 5 years ago

acdoll commented 5 years ago

I stumbled upon this: http://arctos.database.museum/name/Pelecanus%20erythrothynchus, which is, as far as I can tell, simply a misspelling of Pelecanus erythrorhynchos (American White Pelican). This classification was created by 'unknown' last September and has the source_authority: UCMP. A quick Google search on "Pelecanus erythrothynchus" yields the CalPhotos UCMP specimen record, where the correct spelling is clearly written on the handwritten card (https://ucmpdb.berkeley.edu/cgi/ucmp_query2?&spec_id=V129739&one=T).

On the Edit non-classification page (http://arctos.database.museum/editTaxonomy.cfm?action=editnoclass&taxon_name_id=3165313) there is the option to relate to other taxa, but the only sensible option for the relationship is 'potential alternate spelling'. Is this supposed to include clear misspellings (typographical errors) or just previously/currently accepted alternate spellings? Should we have an option like "likely misspelling of"? In this case there are no specimens tied to the taxon, so it would be great if it just went away, but I understand that in other cases we would need to keep the known misspelled classifications in the list.

Issue Documentation is http://handbook.arctosdb.org/how_to/How-to-Use-Issues-in-Arctos.html

Table Code Tables are http://arctos.database.museum/info/ctDocumentation.cfm

Value Proposed new value

Definition Clear, complete, non-collection-type-specific definition of the new value.

Collection type If the code table includes a "Collection" column. Ex: Mamm, Herp, ES

Attribute data type free-text, categorical, number+units

Attribute value For categorical attributes, code table controlling value

Attribute units For number+units attributes, code table controlling units

Part tissue flag For new parts, is the part a tissue?

Other ID BaseURL For OtherIDs, URL with which to prepend value (resolvable identifiers only)

Context Describe why this new value is necessary and existing values are not.

Priority Please assign a priority-label.

campmlc commented 5 years ago

In the case where the misspelling is an internal Arctos data entry error that does not reflect the specimen tag, I would recommend deleting the name before it gets used somewhere. I think the relationships should be used for published taxon name misspellings.

Jegelewicz commented 5 years ago

the only sensible option for the relationship is 'potential alternate spelling'. Is this supposed to include clear misspellings (typographical errors) or just previously/currently accepted alternate spellings?

And you would be correct to use that! Yes, this option is meant for all alternate spellings. There is really only ONE proper spelling for each name, so it doesn't really matter how the alternate spelling came to be here. There are published misspellings...

Should we have an option like "likely misspelling of"?

I don't think so. Normalization! Dusty would say that would be just another way of saying "potential alternate spelling".

In this case there are no specimens tied to the taxon, so it would be great if it just went away, but I understand that in other cases we would need to keep the known misspelled classifications in the list.

That seems reasonable! Deleted.

Jegelewicz commented 5 years ago

@acdoll if you want to discuss the "potential alternate spelling" thing, please add this to the Taxonomy project, otherwise you can close the issue if you feel it's all resolved.

dustymc commented 5 years ago

created by 'unknown' last September

I think that's when we added the created-by structure.

potential alternate spelling

https://arctos.database.museum/info/ctDocumentation.cfm?table=CTTAXON_RELATION

Possible spelling variation detected by automation.

"Automation" is the key word there. I think the functional definition of "synonym" is "Possible spelling variation detected by people" (and maybe we should fix that). If you have more specific data (specific Code-terms backed by publications and asserted by experts, etc.), I'm happy to discuss more terms. If it's just "these probably refer to the same sorts of critters" then I think what we have is sufficient.
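For illustration only, here is a rough sketch of what "spelling variation detected by automation" could look like. It is not how Arctos actually does it: the function name, the 0.9 threshold, and the short list of known names are all assumptions made up for this example, and the similarity measure is simply Python's standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def potential_alternate_spellings(name, known_names, threshold=0.9):
    """Return (candidate, score) pairs whose similarity to `name` meets the threshold.

    Hypothetical helper: the threshold is an arbitrary illustrative value,
    not an Arctos setting.
    """
    matches = []
    for candidate in known_names:
        if candidate == name:
            continue  # an identical name is not an alternate spelling
        # ratio() is 2 * (matching characters) / (total characters in both strings)
        score = SequenceMatcher(None, name.lower(), candidate.lower()).ratio()
        if score >= threshold:
            matches.append((candidate, score))
    # Most similar candidates first
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Illustrative list of names, not pulled from the Arctos taxonomy tables
known = [
    "Pelecanus erythrorhynchos",
    "Pelecanus occidentalis",
    "Pelecanus onocrotalus",
]
hits = potential_alternate_spellings("Pelecanus erythrothynchus", known)
for candidate, score in hits:
    print(f"{candidate}: {score:.2f}")
```

A check like this can only flag candidates. Whether a flagged pair is a typo, a published misspelling, or a genuinely different name still needs a person to look at it, which is the machine-detected versus people-asserted distinction discussed here.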

Is this supposed to include ...

http://handbook.arctosdb.org/documentation/taxonomy.html

It should include anything that might help a user find what they're looking for. Certainly anything that's been in a "taxonomic publication," but also stuff that's just plain wrong but in common usage.

really only ONE proper spelling for each name

Maybe, but Arctos isn't a taxonomic authority - and if we were we still wouldn't have the ability to correct the existing literature.

Normalization

YES! Multiple ways of saying the same thing is sorta always evil. I do think there's a distinction between machine-detected and people-asserted so I'd claim our two current values are different, even though they're functionally identical.

I think this one can probably be deleted, although https://www.jstor.org/stable/24723814?seq=1#metadata_info_tab_contents probably counts as "scientific literature."

acdoll commented 5 years ago

Thanks for the comments on this. My suggestion of "likely misspelling of" stemmed from the 'automation' term in the TAXON_RELATION code table. That definition seemed to indicate that we want to know the difference between an algorithm picking a taxon name relationship and an actual human evaluating the relationship. I tend to reserve the "synonym" term for names that have been used in publications that could be cited, not just typos from some other website. I also saw that JSTOR publication and would certainly count it as scientific literature, but if you look at the actual text you will see that it is just a bad print/scan of the paper version, in which Google (or whatever indexed this publication) misinterpreted a smudged 'r' as a 't'.

acdoll commented 5 years ago

image

mkoo commented 5 years ago

Fixed