ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

names from worms #1826

Closed dustymc closed 5 years ago

dustymc commented 5 years ago

I uploaded the WoRMS dump to Arctos and pulled off names which won't fit under our rules. Attached are 211009 unique names which (based only on format) seem like they might be real names and which are not already in Arctos.

PLEASE have a quick scroll-through and let me know if you find anything that does not look like taxonomy (uncertainty markers, identification data, etc.).

Once that's done I'll load them and figure out how to get classification data pulled in. (I'll probably just set up a service - the data in the download are a bit limited.)

temp_worms_might_be_valid.csv.zip https://github.com/ArctosDB/arctos/files/2626762/temp_worms_might_be_valid.csv.zip

DerekSikes commented 5 years ago

I found this in there:

Aërope

and these two names have single quotes on them:

'Gammarus' 'Gammarus' heteroclitus

there's quite a few varieties, like this:

Achnanthes biasolettiana var. genuina

and lots of 'forms' like this

Achnanthes biasolettiana f. minuta

lots of subspecies with 'subsp' indicated:

Acrosphaera spinosa subsp. flammabunda

some subspecies without 'subsp' indicated:

Platysympus typicus mediterraneus

I'm not sure but I suspect the forms and varieties are nomenclaturally equivalent (ICZN) to subspecies since they are not infrasubspecific (below subspecies) and the 3rd name of any trinomial is always(?) a subspecies.

this one has a '2'

Acipenserid herpesvirus 2

Other than the above it seems ok. I suppose we'll need to decide if we want to strip out all the 'f.' 'var.' and 'subsp.' and just make those into simple trinomials with the 3rd name in our rank= subspecies
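A minimal sketch of what stripping the markers could look like, assuming only the three markers quoted above ('subsp.', 'var.', 'f.') and a simple whitespace split. The function names and the marker table are hypothetical, not anything that exists in Arctos.

```python
# Rank markers seen in the names quoted above. This table is an assumption
# for illustration and is not a complete list of infraspecific ranks.
RANK_MARKERS = {"subsp.": "subspecies", "var.": "variety", "f.": "form"}


def _is_epithet(word):
    """True for a plain lowercase alphabetic word (rejects '2', '12-radiata')."""
    return word.isalpha() and word.islower()


def split_infraspecific(name):
    """Return (genus, species, rank, epithet), or None if the name does not fit.

    rank is None for a bare trinomial with no marker.
    """
    parts = name.split()
    if len(parts) == 4 and parts[2] in RANK_MARKERS:
        genus, species, marker, epithet = parts
        if _is_epithet(species) and _is_epithet(epithet):
            return genus, species, RANK_MARKERS[marker], epithet
    if len(parts) == 3:
        genus, species, epithet = parts
        if _is_epithet(species) and _is_epithet(epithet):
            return genus, species, None, epithet
    return None


def as_simple_trinomial(name):
    """Drop the rank marker, leaving 'Genus species epithet'."""
    parsed = split_infraspecific(name)
    if parsed is None:
        return name  # leave anything unrecognised untouched
    genus, species, _rank, epithet = parsed
    return f"{genus} {species} {epithet}"


print(as_simple_trinomial("Acrosphaera spinosa subsp. flammabunda"))
```

Returning the rank alongside the epithet keeps the original marker available in case flattening everything to subspecies turns out to lose information.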

-Derek

Jegelewicz commented 5 years ago

I suppose we'll need to decide if we want to strip out all the 'f.' 'var.' and 'subsp.' and just make those into simple trinomials with the 3rd name in our rank= subspecies

Those are probably plants that should go into Arctos Plants....

Any way you can easily pick out plants?

DerekSikes commented 5 years ago

I doubt they are plants - doesn't worms only include animal names?

I've seen plenty of uses of f. and var. in my own beetles from the 1800s. I treated them as subspecies unless there was evidence they were infrasubspecific (like ab. for aberration)

-Derek

Jegelewicz commented 5 years ago

I checked a couple of those with var. and they are algae.

Jegelewicz commented 5 years ago

Achnanthes brevipes var. brevipes http://www.algaebase.org/search/species/detail/?species_id=36938

WoRMS is marine SPECIES, plants are included!

DerekSikes commented 5 years ago

well, it will be hard to filter those non-animals out without a higher classification associated with each name.

why not leave them in?

-D

Jegelewicz commented 5 years ago

It's OK by me, but the plant people might like to have them too.

dustymc commented 5 years ago

Aërope

Excel?

[Screenshot: 2018-11-28, 4:13:02 PM]

'Gammarus'

Thanks - my character-checker was having a bad hair day, should be rejecting those now.

I checked a few of the names with infraspecific ranks, they seem to be ICBN (Chromista and such).

3rd name of any trinomial is always(?) a subspecies.

Na, plant-people (and I think everyone else that uses ICBN??) have a whole herd of infraspecific ranks.

Acipenserid herpesvirus 2

Virus taxonomy is weird but easy - they have a list, that's on it. https://talk.ictvonline.org/taxonomy/p/taxonomy-history?taxnode_id=20181404

Turns out there's a lot more virus (and bacteria?) taxonomy that's been rejected by my taxonomy "rules" - not sure if I should relax the rules or just reject them for now? I think I'm leaning towards reject, just because the list bypasses most of the problems associated with "normal" taxonomy - thoughts?

pick out plants

Maybe - but I can't tell which ones are "noncompliant," and I'm not sure how far ICBN extends - I know it's well beyond plants.

Shall I load (minus the 'quote things') and then see what I can do with nomenclatural code and the webservice?

sharpphyl commented 5 years ago

I know nothing about viruses so numbers are probably appropriate but a search for just "2" turned up a lot and there are more for other numbers.

Bürgeriella Luz24likevirus C2likevirus Vibrio phage fs2 P2likevirus Salmonid herpesvirus 2 Dyozetapapillomavirus 1 Bacillus phage phi29 C2-like viruses Phi29-like viruses P22-like viruses Phieco32-like viruses Asterolampra dallasiana f. 12-radiata

Then this Didymozoidae 'juveniles'

I think you, Derek and Teresa caught everything else that I see.

dustymc commented 5 years ago

Bürgeriella

Also Excel? It's Bürgeriella in everything I'm looking at.

Asterolampra dallasiana f. 12-radiata

That is Chromista, and it's why I'm reluctant to relax the rules to accommodate viruses until there's a need to do so - I can't tell the things they do from garbage (without plugging into The List, which minimally will take some work). I think best case it's a "noncompliant" name, more realistically it's somewhere between a "working name" and a cat wandering across someone's keyboard. The current check (after yesterday's tune-up) rejects it

UAM@ARCTOS> select scientificName ,isValidTaxonName(scientificName) from temp_worms_might_be_valid where scientificName like '%Asterolampra dallasiana f. 12-radiata%';

SCIENTIFICNAME
------------------------------------------------------------------------------------------------------------------------
ISVALIDTAXONNAME(SCIENTIFICNAME)
------------------------------------------------------------------------------------------------------------------------
Asterolampra dallasiana f. 12-radiata
Invalid characters.

and anything else with an integer.
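For readers following along, here is a rough sketch of a character-only check that behaves like the result shown above. It is a hypothetical stand-in covering just the two cases discussed in this thread (digits and quote marks); the real rules live in isValidTaxonName.sql and are not reproduced here.

```python
import re

# Characters rejected by this sketch: ASCII digits and straight quotes.
# This is an assumption for illustration, NOT the contents of
# isValidTaxonName.sql.
REJECTED = re.compile(r"[0-9'\"]")


def check_name_format(name):
    """Return 'valid' or a short rejection message, judging characters only."""
    if REJECTED.search(name):
        return "Invalid characters."
    return "valid"


for candidate in (
    "Asterolampra dallasiana f. 12-radiata",
    "Salmonid herpesvirus 2",
    "Achnanthes biasolettiana f. minuta",
):
    print(candidate, "->", check_name_format(candidate))
```

A check like this cannot tell a legitimate virus name from garbage, which is the trade-off being discussed: it rejects both.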

Shall I proceed?

Jegelewicz commented 5 years ago

Turns out there's a lot more virus (and bacteria?) taxonomy that's been rejected by my taxonomy "rules" - not sure if I should relax the rules or just reject them for now? I think I'm leaning towards reject, just because the list bypasses most of the problems associated with "normal" taxonomy - thoughts?

Reject for now, we can deal with viruses when a virus collection thinks about coming in to Arctos. Plants are bad enough...

Yes, Excel is the problem on the diacritics, but it will end up being an ongoing problem unless we have an aka or just use the plain spelling. see #1827 or are these being rejected?

Otherwise, I say proceed.

dustymc commented 5 years ago

The problem is character set conversion, not particular characters. Anyone talking to Arctos in any "language" other than UTF must deal with converting to and from UTF to avoid this.
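A small illustration of that conversion problem, assuming the garbling comes from UTF-8 bytes being read as a single-byte charset (Latin-1 here; the actual charset Excel picks may differ by platform). The helper names are made up for this example.

```python
def misdecode(text, wrong_encoding="latin-1"):
    """Encode as UTF-8, then decode the bytes with the wrong charset."""
    return text.encode("utf-8").decode(wrong_encoding)


def repair(garbled, wrong_encoding="latin-1"):
    """Undo the mistake. Only works if no bytes were dropped along the way."""
    return garbled.encode(wrong_encoding).decode("utf-8")


name = "A\u00ebrope"          # "Aërope", written with an escape for portability
garbled = misdecode(name)
print(garbled)                # Aërope
print(repair(garbled))        # Aërope
```

The stored name is unchanged throughout; only the reader's assumption about the byte encoding differs, which is why the fix belongs on the client side.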

Valid taxon names are defined by https://github.com/ArctosDB/DDL/blob/master/functions/isValidTaxonName.sql. The hybrid/multiplication symbol is fine.

Jegelewicz commented 5 years ago

Valid taxon names are defined by https://github.com/ArctosDB/DDL/blob/master/functions/isValidTaxonName.sql. The hybrid/multiplication symbol is fine.

It LOOKS fine, but it isn't readily searchable. I don't know how to make the multiplication sign when typing in a search....

campmlc commented 5 years ago

me neither. Can we have a box to select for "hybrid yes/no"?

dustymc commented 5 years ago

This is not the only place you might find characters which may not be on your keyboard.

× is not the only character in taxon names which might not be on your keyboard.

×Name is not the only way to indicate ICBN-compliant hybrids (and who knows what happens outside of that context).

I do not think anything short of investing in linguistic indexes would make this more approachable.

sharpphyl commented 5 years ago

I see no reason to not proceed.

sharpphyl commented 5 years ago

Once that's done I'll load them and figure out how to get classification data pulled in. (I'll probably just set up a service - the data in the download are a bit limited.)

@dustymc does that mean that WoRMS doesn't supply the complete classification? I noticed that on their webservice. If so, it would be good if you can set up a service to get the entire record. Thx. Any chance we can get taxon status and taxon relationships too?

dustymc commented 5 years ago

doesn't supply the complete classification

Not in the download; as far as I know it's all available from the webservice.

I can't do anything with relationships until https://github.com/ArctosDB/arctos/issues/1136 is resolved.

I'm getting some taxon_status values. I doubt I'm anticipating everything; translations would be very useful.

What is the plan forward from here? If this is to be a "local" classification (eg, something a collection intends to prefer) then I have a slightly different path than if it's more like the things we pull from GlobalNames. Do you intend to prefer/use this, or just use it to occasionally copy data over to "Arctos"?

sharpphyl commented 5 years ago

For our collection, it would be ideal to have all the WoRMS data just overwrite existing "Arctos" taxa and add new taxa within "Arctos." If WoRMS is a separate classification that I copy over to "Arctos" then it's not really much different than what I do today from the WoRMS website. What I need is for all the WoRMS taxa to be available to our volunteers during data entry without me having to copy from anywhere into "Arctos" multiple times each day.

But to add WoRMS taxa to "Arctos," we need everyone who uses those taxa to agree to it and be prepared for unanticipated taxonomic changes that could make it difficult for them to find their specimens. If one day all your Cymatium species disappear from Ranellidae and reappear in Cymatiidae and you aren't aware of it, you may not be happy with the WoRMS overwrite.

How do we secure that approval? If we do all WoRMS taxa (not just the Mollusca that are my primary interest), how many collections do we impact? I think we need the committee to weigh in on this and somehow secure consensus before we take this approach. We might want to start with one or two phyla (or a smaller group) to see the impact, although that could make a lot of work for you, Dusty.

The WoRMS taxa alone aren't sufficient for our entire collection today as they do not always include legacy taxa (often invalid but needed historically) and while WoRMS adds fresh water and terrestrial species every day, they don't have everything we need yet. In time, WoRMS might be sufficient for everything we do except legacy taxa.

If we keep the WoRMS taxa as a separate "local" classification, are we limited to just WoRMS or can we access (and add to) "Arctos" as well? And if we did select WoRMS as our preferred source, what would happen to existing identifications in "Arctos" but not in WoRMS? As long as we can dip into "Arctos" when needed during data entry or for updating identifications, we could keep WoRMS as a separate and preferred "local" classification and let each collection decide whether or not to use the WoRMS data. That would probably be the least complicated approach as far as its impact on multiple collections.

As for the taxon relationships and taxon status values in #1136, I'll leave that to those who understand taxonomy much better. We can work with whatever is taxonomically best.

dustymc commented 5 years ago

overwrite existing "Arctos"

You'd need to coordinate that with everyone using the "Arctos" source, which doesn't seem terribly likely and is not really something I'd be anxious to support (or try to explain to new collections).

not really much different

I think the worst-case scenario here would be a click ("Clone classification") instead of opening two windows and copying back and forth between maybe hundreds of "fields" which don't necessarily share labels on each. Seems pretty different to me....

Here's my recommendation:

You'll always have the option of copying classifications across classification sources, the hierarchical editor provides a pathway to do that in bulk, and it's not hard to SQL (although it can be next to impossible to FIND stuff in SQL due to our cruddy data, much of which comes from https://github.com/ArctosDB/arctos/issues/1698). If you need a record and there are data in "Arctos" you can just copy it over, if you decide to start cataloging grasshoppers we can pull those data (by family or whatever) over too.

All names are available to anyone. There's some filtering capability on the data entry screen. If you use a name that doesn't have data in "your" classification then FLAT won't get populated, which will mess with the ranked specimen search fields ("family" and such), specimenresults and download, and probably what your labels talk to. It won't stop you from entering data, and adding data to "your" classification source will fix all of that stuff for all specimens. The data in other classifications (including those from GlobalNames) remain available for search by "any taxon."

The end result would be a hybrid classification where the WoRMS data comes from WoRMS (I probably can't stop you from changing local data, but it'll get periodically replaced) and anything that's not in WoRMS (or stuff without an AphiaID - you could choose to ignore WoRMS for specific records) works like any other "local" classification source.
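To make the hybrid idea concrete, here is a sketch under stated assumptions: records that carry an AphiaID get their data replaced on each refresh, and records without one stay static. The class, the `refresh` function, and the `worms_by_id` mapping are all hypothetical, and the AphiaID value is a placeholder rather than a real identifier.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LocalClassification:
    name: str
    terms: dict = field(default_factory=dict)  # e.g. {"family": "Ranellidae"}
    aphia_id: Optional[int] = None             # None: not tracked by WoRMS


def refresh(local_records, worms_by_id):
    """Overwrite terms for records with a known AphiaID; skip everything else.

    worms_by_id maps AphiaID -> freshly translated terms (hypothetical input).
    Returns the names whose data were replaced.
    """
    replaced = []
    for rec in local_records:
        if rec.aphia_id is not None and rec.aphia_id in worms_by_id:
            rec.terms = dict(worms_by_id[rec.aphia_id])  # local edits are lost
            replaced.append(rec.name)
    return replaced


records = [
    LocalClassification("Cymatium", {"family": "Ranellidae"}, aphia_id=1),  # placeholder ID
    LocalClassification("Some legacy name", {"family": "Ranellidae"}),
]
print(refresh(records, {1: {"family": "Cymatiidae"}}))
```

Under this scheme, removing the AphiaID from a record is how a collection would opt that record out of the refresh cycle.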

campmlc commented 5 years ago

So if I understand correctly, this will be an alternative to Arctos and Arctos Plants, and a collection would have to prefer this to use it? So we would no longer be in the shared Arctos taxonomy if we switch to a local WORMS copy?

On Mon, Dec 3, 2018 at 9:55 AM dustymc notifications@github.com wrote:

Here's my recommendation:

  • pull WoRMS into a "local" classification, which means getting the stuff we want as taxa terms and translating from WoRMS lingo to that. That's going to be somewhat dependent on our terminology; continuing to avoid things like relationships is going to limit what we can pull from WoRMS (or anyone else).
  • #1828 https://github.com/ArctosDB/arctos/issues/1828, so subsequent refreshes from WoRMS are by "concept" rather than namestring

  • set up some sort of auto-refresh schedule. I'll know more about what's possible after the initial pull.
  • copy anything from the "Arctos" classification that you want (stuff you're using, families from which you're using a name, whatever) into the new one. These would be static data until someone manually adds an AphiaID, at which point it would enter the refresh cycle. From another angle, when WoRMS gets new data you can just create a potentially-bare classification containing the ID and let Arctos worry about the rest. (And we can get their download and add the IDs that way or something - this doesn't have to be a person stumbling over individual records.)
  • switch your collection to prefer the new one
  • maybe remove some stuff from the Arctos classification, but that can be another discussion

dustymc commented 5 years ago

alternative to Arctos and Arctos Plants

Perhaps. If we do make it "local" (add to http://arctos.database.museum/info/ctDocumentation.cfm?table=CTTAXONOMY_SOURCE) then yes. If we DO NOT make it "local" it'll be more like the stuff from GlobalNames - it's available for search and to copy to "local" classifications and such, but not directly linked to specimens.

collection would have to prefer this to use it

That's one option - it also gets the "clone" button pulling from translated WoRMS data, so I think this is useful even if nobody wants to directly use it.

local WORMS copy?

Yes, but that terminology makes me think "static" - this would be constantly updated (on a schedule to be determined - monthly, or 5 seconds after WoRMS publishes their change log, or whatever our combined resources can support).

sharpphyl commented 5 years ago

Then I agree that we should make WoRMS a stand-alone, new "local" classification. In the future, we'll add any new taxa we need that WoRMS doesn't contain and clone anything currently in Arctos over to WoRMS if needed. Unless I'm missing something, our collection will make WoRMS our preferred source.

sharpphyl commented 5 years ago

To transfer taxon classifications in Arctos but not in WoRMS, would I have to use the HC Editor and bulkload them or is there an easier way to get them into the new WoRMS classification? I realize they would be static and not routinely updated.

DerekSikes commented 5 years ago

Phyllis,

Have you used the clone classification into existing name tool? It's a link above all the classifications that are found when you search taxonomy for a name. The name needs to already be in Arctos but a couple of clicks with this tool will save lots of copy-paste time.

-Derek

dustymc commented 5 years ago

To transfer taxon classifications in Arctos but not in WoRMS

That's what I was attempting to get at with

You'll always have the option of copying classifications across classification sources, the hierarchical editor provides a pathway to do that in bulk, and it's not hard to SQL (although it can be next to impossible to FIND stuff in SQL due to our cruddy data, much of which comes from #1698). If you need a record and there are data in "Arctos" you can just copy it over, if you decide to start cataloging grasshoppers we can pull those data (by family or whatever) over too.

You can do that in the hierarchical editor, you can do it a few ways for a single record, or I can help. My vague plan is to transfer anything you're using that doesn't end up with WoRMS data from Arctos before switching.

sharpphyl commented 5 years ago

First, yes, Derek, I have used the "clone classification into existing name" tool but always one-at-a-time and I'm thinking about moving larger blocks of taxa. Am I missing any other way to use that tool?

But if Dusty can "transfer anything we're using that doesn't end up with WoRMS data from Arctos before switching" into the WoRMS data, then we should be all set. I'll just need to learn some new techniques to copy classifications "across classification sources in bulk" in the hierarchical editor as an adjunct to what's transferred from Arctos.

sharpphyl commented 5 years ago

Just to clarify, with the approach you're taking, can you get taxon relationships from WoRMS and add them to the upload? We can always change their terminology to ours if we're uncomfortable using their taxon status and taxon relationship terms. It would be helpful to have their relationships built in as there are too many to manually add them. Thx.

dustymc commented 5 years ago

relationships

Can you give me an example from WoRMS and how you'd translate it to use http://arctos.database.museum/info/ctDocumentation.cfm?table=CTTAXON_RELATION?

I read http://www.marinespecies.org/rest/AphiaRecordByAphiaID/302594 as "Cepola indica is some sort of non-preferred variant of Acanthocepola indica." I suppose I could do "synonym of" (my "has something to do with each other" relationship) to and from each but we know more than that.

Translating is also going to add substantially to the maintenance - when something changes in the code table, we'll have to change the translations as well. I'd REALLY like to get our relationship terminology resolved before writing more code that uses it. https://github.com/ArctosDB/arctos/issues/1136

campmlc commented 5 years ago

In the example you give, the WoRMS entry seems pretty clear. "Cepola indica" with authority "Day, 1888" has status "unaccepted"; the valid name is "Acanthocepola indica" with valid authority "(Day, 1888)". Acanthocepola indica would have the relationship value "accepted synonym of" "Cepola indica", which we need to create based on these data. And then we need to add "unaccepted synonym of" to our code table to accommodate both reciprocal relationships. We need to do this anyway, regardless of WoRMS.
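As a sketch of how that reading could be automated, the function below summarises a WoRMS-style record. The key names (`scientificname`, `status`, `valid_name`, and so on) are what the REST response appears to use, but treat them as assumptions; the sample values are copied from this thread rather than fetched from the service.

```python
def describe_record(record):
    """Summarise a WoRMS-style record as (name, status, preferred_name).

    preferred_name is None when the record is already the valid name or
    when no different valid name is given.
    """
    name = record.get("scientificname")
    status = record.get("status")
    valid = record.get("valid_name")
    preferred = valid if valid and valid != name else None
    return name, status, preferred


# Sample values as described in this thread, not a live API response.
sample = {
    "scientificname": "Cepola indica",
    "authority": "Day, 1888",
    "status": "unaccepted",
    "valid_name": "Acanthocepola indica",
    "valid_authority": "(Day, 1888)",
}

name, status, preferred = describe_record(sample)
if preferred is not None:
    print(f"{name} ({status}) points to {preferred}")
```

Which relationship term gets attached to that pair is the open question; the record itself only says one name is unaccepted and names the valid one.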

And yes, as part of all this we need to clean up this table: https://arctos.database.museum/info/ctDocumentation.cfm?table=CTTAXON_RELATION&field=undefined. Here is a google spreadsheet I have started: https://docs.google.com/spreadsheets/d/1nf9tE57PNn9TCFb5EoBiSNSRc-NKcwc5FQkkMhbLV04/edit?usp=sharing

dustymc commented 5 years ago

My understanding is that the word "synonym" has very specific meaning in the Codes, and I don't think there's enough information on the WoRMS page to make that determination.

I still like my spreadsheet, https://docs.google.com/spreadsheets/d/1S9tRAtgJQjCTTYKanxdTTSBouWj3CAZTrBdBh6HwUnE/edit#gid=894404859, linked from https://github.com/ArctosDB/arctos/issues/1136. It's the same idea as "[un]accepted synonym of" but avoids any Code implications.

Jegelewicz commented 5 years ago

I don't agree with creating our own version of relationships. We should use what the taxonomic community uses.

@campmlc I'm not sure I understand your Google sheet. What do the three columns mean?

campmlc commented 5 years ago

I agree with using terms in common usage rather than inventing our own. I don't like replacing "synonym of" with "variant of". I don't think we need to include all the different vague and/or obscure terms that are currently in the code table, however. An "accepted" or "unaccepted" synonym of would suffice for a lot of these.

In my google sheet, the first column contains the suggested new values for Taxon Relationship; the second column is the new definition; the third column the current set of relationship terms, and the fourth column is the current definition of those terms from https://arctos.database.museum/Admin/CodeTableEditor.cfm?action=editNoCollectionCode&tbl=CTTAXON_RELATION .

dustymc commented 5 years ago

I've been led to believe that using Code terminology in ways that differ from the Code definitions is confusing; we should use it correctly, or we should avoid it altogether. I don't think we can use it properly, so I've suggested we avoid it.

There is some value in being able to auto-generate reciprocals; I think we should if that's possible.

Personally, I'd use three options:

1) "has something to do with"
2) "good version of"
3) "bad version of"

(1) would be self-reciprocating, (2) and (3) would be reciprocals of each other.
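A minimal sketch of how auto-generated reciprocals could work for the three options proposed above. The term strings are the placeholders from this comment, not values in CTTAXON_RELATION, and the `(name, relationship, related_name)` row shape and both function names are assumptions made for illustration only.

```python
# Hypothetical terms: the three placeholder options proposed in this comment.
RECIPROCAL = {
    "has something to do with": "has something to do with",  # self-reciprocating
    "good version of": "bad version of",
    "bad version of": "good version of",
}


def reciprocal_row(name, relationship, related_name):
    """Return the mirror-image row for one (name, relationship, related_name) row."""
    return (related_name, RECIPROCAL[relationship], name)


def with_reciprocals(rows):
    """Return the input rows followed by any reciprocal rows not already present."""
    seen = set(rows)
    out = list(rows)
    for row in rows:
        mirror = reciprocal_row(*row)
        if mirror not in seen:
            seen.add(mirror)
            out.append(mirror)
    return out
```

Any term without an entry in `RECIPROCAL` raises a `KeyError`, which is roughly the "reciprocating might require experts" case: the code can only mirror terms whose inverse someone has already defined.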

Adding more (eg, Code-compliant) terms to the list would not bother me, but I don't know that I'd use them much - I think they'd be more for the "experts." (And reciprocating might also require experts.)

I don't really care about the actual terminology, I'd just like the functionality - which I'd use immediately when eg, I pull from WoRMS.

@campmlc I don't understand where the data in your spreadsheet came from. Can a nomen nudum be in a relationship??

Here's current data in case it's useful.

UAM@ARCTOS> select TAXON_RELATIONSHIP || ' @ ' ||  count(*) from taxon_relations group by TAXON_RELATIONSHIP order by count(*);

TAXON_RELATIONSHIP||'@'||COUNT(*)
------------------------------------------------------------------------------------------------------------------------
superfluous renaming (illegitimate) @ 1
homonym (illegitimate) @ 2
parent of @ 2
pro parte @ 2
child of @ 5
misapplied @ 7
nomen oblitum @ 8
hybrid offspring of @ 9
nomen dubium @ 18
database artifact @ 35
unavailable, other @ 36
unavailable, suppressed by ruling @ 41
unjustified emendation @ 54
unnecessary replacement @ 64
homonym & junior synonym @ 66
unavailable, nomen nudum @ 176
senior synonym of @ 177
unavailable, incorrect orig. spelling @ 192
valid name for @ 396
orthographic variant (misspelling) @ 414
junior homonym @ 561
unavailable, database artifact @ 679
other, see comments @ 711
unavailable, literature misspelling @ 1695
subsequent name/combination @ 3827
accepted synonym of @ 4314
original name/combination @ 9903
synonym @ 21456
junior synonym of @ 29254
synonym of @ 217121

30 rows selected.

campmlc commented 5 years ago

I think we should care about the actual terminology, and it should be consistent with terminology in use outside of Arctos. We just don't need all the terms in all their variety - a reasonable subset would suffice. The primary designation should be that the term is accepted/unaccepted.

The table I pulled from is from the public view of Taxon Relation code table at https://arctos.database.museum/info/ctDocumentation.cfm?table=CTTAXON_RELATION

I had edited a couple things in the wrong place - it should be fixed now.

sharpphyl commented 5 years ago

Andy and I spoke with Frank Krell yesterday - http://iczn.org/content/dr-frank-t-krell. He is a commissioner with ICZN as well as our Curator of Entomology. He said that "Accepted synonym of" and "Senior synonym of" are the same. I, too, thought that we should add "unaccepted synonym of" as the reverse, but he said it should be "Junior synonym of", which is already in our table. I think we should definitely continue to use terms in common use rather than creating our own terms. (Andy, please say if I've misstated Frank's comments.)

"child of" and "parent of" would seem to be classification terms and are used that way in our Hierarchical editor, so it's confusing to have them as relationship terms too. I would suggest eliminating them. Similarly, "nomen dubium" is a taxon status and not a taxon relationship term. Otherwise, everything I've used so far is already here. I hate to see this holding up our other projects.

As for WoRMS, first, here are the Status options they use and that we will get with the data to upload - see http://www.marinespecies.org/aphia.php?p=manual#topic16:

- Accepted: the used name is accepted in the present literature
- Unaccepted: The used name is NOT accepted in the present literature
- Nomen nudum: a name that does not comply with the name requirements of the codes, such as lack of a description or diagnoses or reference to a description or diagnosis or a type specimen is lacking for publications after 1999
- Alternate representation: to link species that are represented twice: once with and once without subgenus. Alternate representation can also be used for a species and its nominal subspecies (note: you can only add a subspecies if the species is present in the database). See example in the box below
- Nomen dubium: a name of uncertain application, because it is not possible to establish the taxon to which it should be referred. A good example is the "Ascothoracida" genus Laocoon. There is a debate whether this is based on a parasite or on a detached piece of the host. It is clearly a dubious name
- Temporary name: to create higher rank taxa to accommodate child taxa for which the classification is not sorted yet
- Taxon inquirendum: an incompletely defined taxon that requires further characterization, it is impossible to identify the taxon
- Interim unpublished: an as yet unavailable name (until in a print issue) which has been published online only, in a work that does not show evidence of ZooBank registration (ICZN Article 8.5)

We will need to decide if we want to use Teresa's suggestion to modify our "valid" and "invalid" taxon status terms to be "valid/accepted" and "invalid/unaccepted" to reflect the WoRMS terminology. (I would support her suggestion.)

I did not find a WoRMS list of relationships. They mostly just link the "unaccepted" taxon to the "accepted" taxon as synonyms without further definition.

As an aside, the WoRMS link shows instructions for their editors including screen clips of each step which is always helpful. I started to create similar "how to" steps for taxonomy, but with all the changes we're making, Teresa and I agreed that I should delay until we stabilize our process. My screen clips were falling out of date very quickly, so I'll start over when we feel we're closer to a "permanent" solution to some of these questions.

dustymc commented 5 years ago

@sharpphyl that applies to one Code; we manage multiple, they are anything but homogeneous, and some of those terms (but not necessarily their meanings) span Codes.

Parentage relationships are from ICBN.

We need to accommodate various Codes in some capacity. I don't think eliminating inconvenient data is ever going to be OK.

I will need a target and translation for any of those terms you want imported.
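A rough sketch of what such a translation table might look like for the WoRMS status terms listed earlier in this thread. Only two targets are filled in, and those are the "valid/accepted" and "invalid/unaccepted" values that have merely been suggested here, not agreed. Everything else is deliberately left as `None`; the dictionary and function names are hypothetical.

```python
# WoRMS status -> Arctos taxon status.
# Assumption: only the two targets suggested in this thread are filled in.
# None means "no target agreed yet", not "drop the record".
WORMS_STATUS_TO_ARCTOS = {
    "accepted": "valid/accepted",
    "unaccepted": "invalid/unaccepted",
    "nomen nudum": None,
    "alternate representation": None,
    "nomen dubium": None,
    "temporary name": None,
    "taxon inquirendum": None,
    "interim unpublished": None,
}


def translate_status(worms_status):
    """Return the Arctos target for a WoRMS status, or None if undecided."""
    key = worms_status.strip().lower()
    if key not in WORMS_STATUS_TO_ARCTOS:
        raise KeyError(f"unknown WoRMS status: {worms_status!r}")
    return WORMS_STATUS_TO_ARCTOS[key]


def untranslated(statuses):
    """Return the distinct statuses that still need a target, in first-seen order."""
    pending = []
    for status in statuses:
        key = status.strip().lower()
        if translate_status(status) is None and key not in pending:
            pending.append(key)
    return pending
```

Keeping undecided terms in the table with an explicit `None` means an import can report what is still blocked instead of silently skipping it, and an unexpected status fails loudly.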

I'm not sure where to go from here. I had hoped to start the WoRMS import today, but I'm not thrilled with the idea of rewriting half that code when this stuff changes, and I don't think I can do anything with relationships until that's resolved.

Jegelewicz commented 5 years ago

I made some adjustments to Mariel's document; however, we cannot use any of the homonym relationships, as that would simply relate the name to itself. So those are out.

As for the rest, they boil down to the following:

synonyms - this includes junior, senior, accepted, unaccepted, original name combinations, and whatever other forms there may be of synonym. To simplify, perhaps we should go with the WoRMS model:

I did not find a WoRMS list of relationships. They mostly just link the "unaccepted" taxon to the "accepted" taxon as synonyms without further definition.

We can simply say they are synonyms. This would make it easy for Dusty to create reciprocal relationships. (and import WoRMS relationships?)

caveat - we would then also need to be explicit about valid/accepted and invalid/unaccepted terms

misspellings - this includes orthographic variants, unavailable misspelling, database artifacts, and any other term that has been used to indicate human error in data entry. IMO these should all be deleted from Arctos taxonomy and changed to string format in their individual identifications (assuming everyone wants to keep the misspellings), then anyone who likes can add the correctly spelled taxon as an identification if they want their specimens to be found. (We are adding garbage to GBIF and CoL when we provide these as if they are OK).

status in the wrong place - ichnotaxon, nomen dubium, nomen nudum, nomen oblitum, and nomen protectum really belong in TAXON_STATUS. We can have a discussion about that table elsewhere and whether or not all of these need to be recorded or if some of them are simply versions of valid/accepted, invalid/unaccepted.

parent/child - When I look at plants in The Plant List, it looks to me like these terms are used to indicate which family a genus belongs to. I don't know how or if they are meaningful to us. If someone can tell me they are necessary, I'm OK with them being a relationship.

hybrid offspring of - seems useful but I am not really certain it is something we NEED in Arctos. How does it facilitate search?

dustymc commented 5 years ago

synonyms

That is non-directional; we're losing data from eg, WoRMS. I don't care, my interest is in getting users to specimens, but I think some curators may - @DerekSikes @sharpphyl @ccicero ??
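To illustrate what "non-directional" costs, here is a small sketch, assuming a relationship is stored as a `(name, relationship, related_name)` row. The function name and the placeholder names are hypothetical; "junior synonym of" is one of the existing terms in the counts above.

```python
def collapse_to_synonym(name, relationship, related_name):
    """Replace any directional term with plain 'synonym of' and ignore name order."""
    return (frozenset((name, related_name)), "synonym of")


# Two rows that make opposite claims about which name is the junior one.
row_1 = ("name A", "junior synonym of", "name B")
row_2 = ("name B", "junior synonym of", "name A")

# After collapsing, the two claims can no longer be told apart.
same_after_collapse = collapse_to_synonym(*row_1) == collapse_to_synonym(*row_2)
```

The pair still leads a user from either name to the other, which is enough for finding specimens, but which side the source treated as accepted is gone unless it is recorded somewhere else.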

parent/child and hybrid offspring of are both related to stable hybrids. Again, I care to the extent those can help users find specimens (and I don't know if those do that or not).

I'm not seeing much of a chance for automated reciprocity in "misspellings," and one should NOT be a misspelling (unlike synonym, which can apply to all names in the cluster). Do we care?

simply versions of valid/accepted, invalid/unaccepted.

That makes sense to me. We have few actual use cases:

E.g., if a specimen is using a name it can hardly be a nomen nudum (I think, with basically no certainty...) - it's not clear to me what function that name could serve in a specimen-based system. That idea may or may not extend to other terms, the experts may or may not agree on "status," and that agreement may or may not be stable if it exists. I think we could simplify without giving up any functionality. (I also don't much care, I primarily just need some stability in the terminology so I can write code to it.)

misspellings...should all be deleted from Arctos taxonomy

Strongly disagree. Names get misspelled in the literature, those variants get distributed among the various taxonomy projects and used in more literature and etc. All but one of them is probably "wrong," but they're all about equally useful in getting users to specimens.

Those should probably not be used for identifications (and things not used for IDs aren't shared via DWC), but that's another matter. (And I know of one case with a misspelling in the type description, so even that's not an absolute.)