ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

taxon_status code table cleanup #2926

Closed: Jegelewicz closed this issue 2 years ago

Jegelewicz commented 4 years ago

We have not clearly defined the purpose of https://arctos.database.museum/info/ctDocumentation.cfm?table=cttaxon_status

Before I move on to suggestions for deleting/modifying terms, I'd like to get an agreement on what this code table and related taxon classification metadata attribute are intended to accomplish.

I suggest this definition

Taxon status is available for use as an attribute of taxon classifications as metadata. The taxon status table holds the controlled vocabulary terms available for use in this attribute. These terms are used to indicate the fitness of a name for use in identifications.

sharpphyl commented 4 years ago

I looked at ITIS and WoRMS etc. for some ideas.

Taxon status is classification metadata for a taxon name. It is selected from a controlled vocabulary. Preferably, the status is the taxonomic judgment of published literature and indicates its fitness for use in specimen identifications.

dustymc commented 4 years ago

extant/extinct have never made any sense to me, it's confounding taxa and taxon concepts and detectability and probably some other stuff.

MAYBE "ichnotaxon" is useful, but we need to pick either it or the issue that's requesting ichnospecies etc., and either way it has nothing to do with "fitness for use".

It's not clear to me what purpose nomen dubium , nomen nudum , nomen oblitum could have in a specimen database.

species for count - what?!?

So that leaves valid/invalid, which are only "[don't] use this" and could be replaced by preferred_name, which is the same information plus what's actually needed.

Can we just drop this?

Jegelewicz commented 3 years ago

See also #2591 and #2594

Jegelewicz commented 3 years ago

Can we just drop this?

Nope.

This table is meant to help those doing data entry select appropriate names or know when to ask questions. Also it may help collection managers determine when they are continuing to use outdated names. So the table definition suggested is:

The taxon status table holds the controlled vocabulary terms available for use as taxon status in classification metadata. The terms included should convey the taxonomic judgment of published literature and indicate the fitness for use of a name in specimen identifications. These terms are meant to help those doing data entry or review of collection taxonomy select appropriate names or know when to ask questions.

We want the terms in this table to match what taxonomists use, and the best case we have for that right now is WoRMS. @sharpphyl would like WoRMS (via Arctos) to reflect the actual data in WoRMS; adding the terms as requested in #2591 will make that happen. Once that is done, we will move extant/extinct (see #2594), which will leave the following:

ichnotaxon - I am probably going to suggest a new taxonomy source for ichnotaxa and that all classifications in that source use the "ichno" term types which will automatically demonstrate that they are ichnotaxa, so we shouldn't need this term at that point anyway.

species for count - I don't know who requested this or who uses it, but we need to find some other way for them to do whatever they are doing with it. I say it needs to be removed from the table.

DELETE - I don't think we need this either. Any reason we do?

I believe this will leave us with the WoRMS terminology (except that we are weirdly using valid instead of accepted and invalid instead of unaccepted - can we change that?).

dustymc commented 3 years ago

this will leave us with the WoRMS

https://github.com/ArctosDB/arctos/issues/2591#issuecomment-730028593

Sounds like an argument for drop to me....

Jegelewicz commented 3 years ago

For reference, WoRMS status list

sharpphyl commented 3 years ago


Someone in our committee ages ago (Derek?) advised against using "accepted" and "unaccepted" because of its meaning for plants or some other group of names. Does anyone else remember exactly why we didn't go with the WoRMS terms?

Jegelewicz commented 3 years ago

As I have been working on the TPT taxonomy, I think we should be using DarwinCore as a guide and as such, I suggest changing the taxon_status table to include terms as suggested in DwC:

taxon_status = dwc:taxonomicStatus

The status of the use of the scientificName as a label for a taxon. Requires taxonomic opinion to define the scope of a taxon. Rules of priority then are used to define the taxonomic status of the nomenclature contained in that scope, combined with the experts opinion. It must be linked to a specific taxonomic reference that defines the concept.

Bold is by me and is a project for another day.

controlled vocabulary from https://www.iczn.org/the-code/the-international-code-of-zoological-nomenclature/the-code-online/ and https://www.iapt-taxon.org/nomen/pages/main/glossary.html (Thanks to the plant people for providing a linkable glossary...)

I suggest that some terms currently in taxon_status be moved to a new code table:

nomenclatural_status = dwc:nomenclaturalStatus

The status related to the original publication of the name and its conformance to the relevant rules of nomenclature. It is based essentially on an algorithm according to the business rules of the applicable code. It requires no taxonomic opinion.

controlled vocabulary from https://www.iczn.org/the-code/the-international-code-of-zoological-nomenclature/the-code-online/ and https://www.iapt-taxon.org/nomen/pages/main/glossary.html

other terms currently in taxon status - these don't fit in either of the DarwinCore terms, but I think they are useful for Arctos users? Where should we put them? All definitions come from https://www.iczn.org/the-code/the-international-code-of-zoological-nomenclature/the-code-online/ glossary

This is a taxon_term that we currently use, which might end up being more useful in the future and might benefit from the DarwinCore definition:

preferred name = dwc:acceptedNameUsage

The full name, with authorship and date information if known, of the currently valid (zoological) or accepted (botanical) taxon.

Ideally this would link to the accepted name, but just putting in the accepted name (as we do now) is probably OK. We might consider adding the following:

preferred name ID = dwc:acceptedNameUsageID

An identifier for the name usage (documented meaning of the name according to a source) of the currently valid (zoological) or accepted (botanical) taxon.

This could provide the link to the classification of the accepted name? But maybe isn't necessary. Just throwing it out here...

Sorry for the long-winded comment. I am sure this is not perfect and maybe needs to be parsed into separate issues, but I wanted to put it down somewhere while it is in my mind.

sharpphyl commented 3 years ago

Taxon status table:

extant - Of a taxon: having living representatives.
extinct - Of a taxon: having no living representatives. fossil-taxon.
ichnotaxon - A taxon based on the fossilized work of an organism, including fossilized trails, tracks or burrows (trace fossils) made by an animal.

Would it make any sense to add "trace fossil specimen" to the catalog item type since "fossil specimen" and "specimen" are already there? Would we need to have these three taxon types if the catalog item type conveys the same information? Would GBIF know what to do with it?

[Screenshot, 2021-04-26]

If #3579 defines the genus/species as ichno taxa, that could align with "trace fossil" as the catalog item type and avoid need for another table.

campmlc commented 3 years ago

Sounds reasonable?


Jegelewicz commented 3 years ago

Probably, but it may need its own issue - this one is out of control.

dustymc commented 3 years ago

https://dwc.tdwg.org/terms/#dwc:basisOfRecord - adding values to https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcataloged_item_type without changing the mapping and rebuilding the DWC cache will result in SOMETHING weird - probably GBIF rejecting the record (or dataset, IDK).
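A minimal sketch of why an unmapped value causes trouble, assuming the DwC export translates each cataloged_item_type value through a lookup table into dwc:basisOfRecord. The mapping below is hypothetical: only "specimen" and "fossil specimen" come from this thread, PreservedSpecimen and FossilSpecimen are standard basisOfRecord values, and the real Arctos export may handle a miss differently (e.g. by emitting an empty or default value rather than failing loudly).

```python
# Hypothetical cataloged_item_type -> dwc:basisOfRecord mapping;
# the real Arctos mapping lives in its DwC export/cache code.
BASIS_OF_RECORD = {
    "specimen": "PreservedSpecimen",
    "fossil specimen": "FossilSpecimen",
}


def basis_of_record(cataloged_item_type: str) -> str:
    """Map an Arctos item type to dwc:basisOfRecord, refusing unknown types."""
    try:
        return BASIS_OF_RECORD[cataloged_item_type]
    except KeyError:
        # A new code table value with no mapping would otherwise reach GBIF
        # as something it does not recognize.
        raise ValueError(
            f"no dwc:basisOfRecord mapping for {cataloged_item_type!r}; "
            "update the mapping before adding the code table value"
        ) from None
```

Failing at export time is one design choice; the point is that a value like "trace fossil specimen" has to be added to the mapping and the cache rebuilt, not just added to the code table.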

My only "objection" is extinct etc., but even that's not really an objection - I don't think we have the attention span or data resolution to pull that off in a useful fashion, but as long as that's documented and it's useful for some curatorial reason then whatever.

It requires no taxonomic opinion.

That's a bit optimistic, careers are built around those! https://github.com/ArctosDB/arctos/issues/3548 has been cooking since at least 1958.

Big-picture, this is definitely a better organization, I just wonder what benefit we would get out of the work required to create the data or deal with the complexity of having more authorities? How much of this conversation is a legacy of trying to make one(ish) giant blurb of taxonomy work for everyone? I think/hope we're moving towards things like https://github.com/ArctosDB/arctos/issues/3544, where these things are likely to be much less important (eg, just don't put the things you want to avoid in "your" classification, no need to add them and then flag them so you can avoid ever seeing them again). That leads to https://github.com/ArctosDB/arctos/issues/3525 - eg can we find a way to usefully integrate whatever comes in from things like WoRMS (however it comes in) with catalog records, or do we still need to "translate"?

Maybe we should only proceed when/if there's some compelling reason to do so, when having those data would DO SOMETHING for us.

From the technical side of things, controlling "local" taxon terms while also accepting whatever comes from GlobalNames needs a revisit - I think it's hard-coded, if we're going to fire up a bunch of new CTs then we need some more complexity (maybe something like we do for various Attributes) to hook them in.

Jegelewicz commented 2 years ago

WoRMS Terms

Possible statuses for a name:

Accepted: the used name is accepted in the present literature

Unaccepted: The used name is NOT accepted in the present literature

Nomen nudum: a name that does not comply with the name requirements of the codes, such as lack of a description or diagnoses or reference to a description or diagnosis or a type specimen is lacking for publications after 1999

Alternate representation: to link species that are represented twice: once with and once without subgenus. Alternate representation can also be used for a species and its nominal subspecies (note: you can only add a subspecies if the species is present in the database). See example in the box below

Nomen dubium: a name of uncertain application, because it is not possible to establish the taxon to which it should be referred. A good example is the "Ascothoracida" genus Laocoon. There is a debate whether this is based on a parasite or on a detached piece of the host. It is clearly a dubious name

Temporary name: to create higher rank taxa to accommodate child taxa for which the classification is not sorted yet

Taxon inquirendum: an incompletely defined taxon that requires further characterization, it is impossible to identify the taxon

Interim unpublished: an as yet unavailable name (until in a print issue) which has been published online only, in a work that does not show evidence of ZooBank registration (ICZN Article 8.5)

Jegelewicz commented 2 years ago

Discussion in code table committee: do we really need so much control? Could we replace all of this with some sort of free-text taxon attributes that would let people use whatever they want to indicate validity or usefulness?

For things like WoRMS (via Arctos) this means we could pull in and use whatever the heck they have instead of translating from their terms to ours.

Dusty also suggested removing all these terms except nomenclatural_code, display_name, scientific_name, and aphiaid because they are the only ones that DO anything:

Term Has Code Table?
aphiaid no
author_text no
chemical_formula no
dana_number_8 no
display_name no
heys_cim no
infraspecific_author no
managed_by no
nickel-strunz_10 no
nomenclatural_code yes
nomenclature_4.0_identifier no
preferred_name no
remark no
scientific_name no
source_authority no
taxon_status yes

I think they could be useful, but does anyone actually use them?

If we left this table as is, but just removed the constraints of the taxon_status code table for taxon_status would that make that term less useful?

Jegelewicz commented 2 years ago

Added to January AWG issues agenda.

dustymc commented 2 years ago

removing all these terms except

Clarification: I'm asking if we can relax control, not remove any existing terms or values.

Background: All "classification stuff" is in one table and I have to accept everything that comes from GlobalNames, so the checks are all done by trigger. If the source is in https://arctos.database.museum/info/ctDocumentation.cfm?table=cttaxonomy_source, then I only allow terms from https://arctos.database.museum/info/ctDocumentation.cfm?table=cttaxon_term, and if the term type is in my controlled list (also in the trigger), I check whether the value is in the appropriate code table (e.g. https://arctos.database.museum/info/ctDocumentation.cfm?table=cttaxon_term#nomenclatural_code requires a value from https://arctos.database.museum/info/ctDocumentation.cfm?table=ctnomenclatural_code).
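The trigger logic described above can be sketched roughly as follows. This is a minimal Python illustration, not the actual trigger: the contents of LOCAL_SOURCES, ALLOWED_TERMS and VALUE_CODE_TABLES are hypothetical stand-ins for cttaxonomy_source, cttaxon_term and the per-term code tables, and "Catalogue of Life" is just an example of a non-local source.

```python
# Hypothetical stand-ins for the Arctos code tables; real contents differ.
LOCAL_SOURCES = {"Arctos", "WoRMS (via Arctos)"}  # ~ cttaxonomy_source
ALLOWED_TERMS = {                                 # ~ cttaxon_term
    "kingdom", "phylum", "genus", "species",
    "nomenclatural_code", "taxon_status", "preferred_name",
}
VALUE_CODE_TABLES = {                             # term -> its code table values
    "nomenclatural_code": {"ICZN", "ICBN"},
    "taxon_status": {"valid", "invalid", "extant", "extinct"},
}


def check_classification_term(source: str, term_type: str, value: str) -> None:
    """Raise ValueError if a (source, term, value) row would be rejected."""
    if source not in LOCAL_SOURCES:
        return  # non-local (e.g. GlobalNames) data is accepted unchecked
    if term_type not in ALLOWED_TERMS:
        raise ValueError(f"term {term_type!r} not allowed for source {source!r}")
    allowed_values = VALUE_CODE_TABLES.get(term_type)
    if allowed_values is not None and value not in allowed_values:
        raise ValueError(f"{value!r} is not an allowed {term_type}")
```

Under these assumptions, check_classification_term("WoRMS (via Arctos)", "taxon_status", "accepted") raises, which is why WoRMS values currently have to be translated (e.g. accepted to valid) before they can be stored.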

Only local sources are allowed in identifications (https://github.com/ArctosDB/arctos/issues/3311), so I have to "translate" things that come from e.g. WoRMS (destined for https://arctos.database.museum/info/ctDocumentation.cfm?table=cttaxonomy_source#worms__via_arctos_). I'd rather not; that requires maintenance, information may be lost in the translation, and the result differs between Arctos and WoRMS, which is probably confusing to users.

Option Zero: Change no functionality, maybe do stuff requested above.

Option One: Relax the value controls. Continue translating status to taxon_status but just accept accepted from WoRMS instead of translating it to valid. This will keep the types of things consistent, but provide less control over values.

Option Two: Relax term and value controls. Just take 'status=valid' from WoRMS and use it. This may result in less-predictable data from locally-managed sources, which would matter if searching on specific types was important, but I don't think we have the data to support that no matter how we structure it - some arbitrary smattering of taxa have things like extinct/extant, it's useful when the name has been found, but it's not very useful in finding because we don't have consistent data.

I think I like Option Two. That would allow someone to download some relevant checklist-or-whatever, make a new Source, upload, and use with their untranslated terminology with very little work, in addition to things like making the WoRMS pull smoother and more as expected. It would provide more possibility for locally-managed data to get weird terms and values, but that can be mitigated to some extent (eg by suggesting existing values) and I don't think that has much of a functional implication in these data anyway. Hopefully we're all moving towards things like importing checklists rather than managing locally anyway.
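To make the difference between the three options concrete, here is a minimal sketch of what each does to a single WoRMS status value. The TERM_MAP and VALUE_MAP tables are hypothetical illustrations based on the valid/accepted and invalid/unaccepted pairs mentioned earlier in this thread, not the actual WoRMS (via Arctos) loader.

```python
# Hypothetical translation tables for WoRMS -> Arctos.
TERM_MAP = {"status": "taxon_status"}
VALUE_MAP = {"accepted": "valid", "unaccepted": "invalid"}


def option_zero(term: str, value: str) -> tuple[str, str]:
    """Current behavior: translate both the term and the status value."""
    local_term = TERM_MAP.get(term, term)
    if local_term == "taxon_status":
        value = VALUE_MAP.get(value, value)
    return local_term, value


def option_one(term: str, value: str) -> tuple[str, str]:
    """Relax value control: translate the term, keep the WoRMS value."""
    return TERM_MAP.get(term, term), value


def option_two(term: str, value: str) -> tuple[str, str]:
    """Relax term and value control: store what WoRMS sends, untranslated."""
    return term, value
```

Option Two drops both lookup tables entirely, which is where the maintenance savings and the "same as on WoRMS" behavior come from.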

Jegelewicz commented 2 years ago


@dustymc thanks for the clarification and I agree with the above.

Jegelewicz commented 2 years ago

Go with option two!

mkoo commented 2 years ago

AWG loves Option 2!

Jegelewicz commented 2 years ago

@dustymc what do we need to do from our end?

dustymc commented 2 years ago

Next release will have relaxed constraints, and the edit classification UI has been much simplified to accommodate.

Adding other kinds of suggestions involves changing code, but should be pretty simple to do if necessary.

The above code tables should not be connected to anything at this point - used terms can be removed or changed without consequence (beyond the suggestions changing).

Testing greatly appreciated.

http://test.arctos.database.museum/name/Conus%20abbas%20abbas#ArctosCulture is my poor tortured test mule, if that's somehow useful.

Jegelewicz commented 2 years ago

I like this

[image]

BUT when creating a classification from scratch - http://test.arctos.database.museum/editTaxonomy.cfm?action=editClassification&classification_id=65CDE689-8FDC-466A-ACB4B0F3A29ED41A&taxon_name_id=12117355

Nomenclatural code did not offer up the code table options

Taxon Status did not offer up the code table options

Jegelewicz commented 2 years ago

Adding other kinds of suggestions involves changing code, but should be pretty simple to do if necessary.

Could we offer suggestions for terms already in use in the field? So if I type "A" into kingdom, "Animalia" would be a choice? Asking too much?

Then to ask even more, offer choices limited by previous fields in the classification. So, I have Animalia in kingdom, then choices for Phylum would pick only phylum names associated with "Animalia" BUT still let me type whatever if I in fact am using a new Phylum?
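A minimal sketch of the suggestion behavior requested above; this is not something Arctos implements. It assumes existing classifications are available as hypothetical rank-to-value dictionaries, keeps only those that agree with the ranks already filled in, and returns prefix matches as advisory suggestions, so a new value can still be typed freely.

```python
from __future__ import annotations

from typing import Iterable


def suggest(classifications: Iterable[dict[str, str]], rank: str, prefix: str,
            context: dict[str, str] | None = None) -> list[str]:
    """Return existing values for `rank` that start with `prefix`.

    `context` holds ranks the user has already entered (e.g. kingdom);
    only classifications matching all of them contribute suggestions.
    """
    context = context or {}
    found: set[str] = set()
    for classification in classifications:
        # Skip classifications that disagree with what was already entered.
        if any(classification.get(r) != v for r, v in context.items()):
            continue
        value = classification.get(rank)
        if value and value.lower().startswith(prefix.lower()):
            found.add(value)
    return sorted(found)


existing = [
    {"kingdom": "Animalia", "phylum": "Chordata"},
    {"kingdom": "Animalia", "phylum": "Mollusca"},
    {"kingdom": "Plantae", "phylum": "Tracheophyta"},
]
print(suggest(existing, "kingdom", "A"))                         # ['Animalia']
print(suggest(existing, "phylum", "", {"kingdom": "Animalia"}))  # ['Chordata', 'Mollusca']
```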

Jegelewicz commented 2 years ago

Finally, how about limiting the offered terms in the form based upon the name type? This way when creating a classification for a Linnean name I wouldn't be automatically offered all of the mineral and cultural stuff (but I could add them if I want for the weird oar things)?

dustymc commented 2 years ago

did not offer up

[Screenshots, 2022-01-27]

Show me, please.

"Animalia" would be a choice? Asking too much?

I don't think we have the resources for that.

offered terms in the form based upon the name type

That's probably possible, but would take some different structure of the authority data.

We should have a focused conversation (not here) before deciding to invest too much in this thing. I can't see much reason for anyone to use this form for any purpose - yet they do, and IDK if that's me not understanding what these data are used for, or users not understanding just how inconsistent this thing leaves data, or ?????