ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

worms refresh: test request #1841

Closed: dustymc closed this 5 years ago

dustymc commented 5 years ago

The "WoRMS (via Arctos)" classification is being built with data from their download. The download contains only basic classification information, so we'll need a webservice pull to update it. I'll automate that at some point, but (at least for now) I've added a manual refresh link.

After the initial seed from the download, http://arctos.database.museum/name/Phoca#WoRMSviaArctos looked like...

screen shot 2018-12-07 at 8 36 00 am

The "aphiaid" is displayed as a link which opens WoRMS (in a new tab). The "refresh" link next to it should do the grindy-gears thing and come back with another link.

screen shot 2018-12-07 at 8 36 57 am

Click, and...

screen shot 2018-12-07 at 8 47 26 am

The page should refresh with new WoRMS data.

This is now a "local" classification, so anything not in http://arctos.database.museum/info/ctDocumentation.cfm?table=CTTAXON_TERM will be ignored, and anything not in the tables which control values for terms (eg, http://arctos.database.museum/info/ctDocumentation.cfm?table=CTTAXON_STATUS for "taxon_status") will be ignored.

WoRMS has their own lingo, so I am attempting to "translate."

The WoRMS webservice doesn't contain nomenclatural_code so I'm guessing at that as well. They responded that I can derive it from Kingdom so I'm going with that, even though I don't really trust them....
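
A minimal sketch of what deriving nomenclatural_code from Kingdom could look like. The lookup table and the function name are illustrative assumptions, not the actual Arctos translation, and the code values would still need to match whatever the Arctos code table accepts. Kingdoms that can fall under more than one code are deliberately left unmapped, which is roughly why the derivation is hard to trust.

```python
# Hypothetical kingdom -> nomenclatural code lookup (NOT the real Arctos mapping).
KINGDOM_TO_CODE = {
    "Animalia": "ICZN",   # zoological code
    "Plantae": "ICN",     # algae, fungi, and plants
    "Fungi": "ICN",
    "Bacteria": "ICNP",   # prokaryotes
    "Archaea": "ICNP",
}


def guess_nomenclatural_code(kingdom):
    """Return a guessed code for a kingdom, or None when no safe guess exists."""
    if not kingdom:
        return None
    return KINGDOM_TO_CODE.get(kingdom.strip())


print(guess_nomenclatural_code("Animalia"))
print(guess_nomenclatural_code("Chromista"))  # unmapped here, so None
```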

Note that display_name is lost with refresh - it will rebuild itself (which might take a while right now - this update has things pretty busy).

The data come from two webservice calls:

http://www.marinespecies.org/rest/AphiaRecordByAphiaID/137010

and

http://www.marinespecies.org/rest/AphiaClassificationByAphiaID/137010

A list of their webservices is http://www.marinespecies.org/rest/

The number after the last slash is AphiaID, which is pulled from the term "aphiaid."
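
As a rough illustration, the two request URLs can be built from the value stored in the "aphiaid" term. The function name is made up for this sketch; only the two endpoint paths come from the comment above.

```python
WORMS_REST_BASE = "http://www.marinespecies.org/rest"


def worms_refresh_urls(aphiaid):
    """Build the record and classification URLs for one AphiaID.

    int() rejects anything that is not a plain number, so a malformed
    "aphiaid" term fails here instead of producing a bad request.
    """
    aphiaid = int(aphiaid)
    return (
        f"{WORMS_REST_BASE}/AphiaRecordByAphiaID/{aphiaid}",
        f"{WORMS_REST_BASE}/AphiaClassificationByAphiaID/{aphiaid}",
    )


record_url, classification_url = worms_refresh_urls("137010")
print(record_url)
print(classification_url)
```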

I can't do anything with relationships until we settle on terminology.

Is there anything else in WoRMS which we should be pulling to the local classification?

Is everything that we are pulling being mapped and translated properly?

About 100K names have been seeded with WoRMS data so far. https://arctos.database.museum/demo will pull a random thousand.

sharpphyl commented 5 years ago

I just tried a few, and after the refresh they look fine when I compare them to WoRMS (online). In fact, this cleans up the problem we were having with Subclass, Infraclass, Subterclass, etc. not coming through correctly. Here's what WoRMS (via Arctos) looks like:

screen shot 2018-12-07 at 12 09 18 pm

And here's what WoRMS itself looks like. Note that two classifications say "Subclass", which is only correct for one.

screen shot 2018-12-07 at 12 09 25 pm

However, after I click "refresh" and then "click to reload" I get this error message, but after I close the error message, the complete taxon has downloaded correctly.

screen shot 2018-12-07 at 12 03 50 pm

We even get "[unassigned] Caenogastropoda (order)" which we could never use before, so this is much more complete. I suppose we can't manage that in the Hierarchical tool, but we should never need to use that tool for these marine species again.

sharpphyl commented 5 years ago

Well, since you asked if anything else should be pulled into the local classification, here goes.

If something is "unaccepted" (our "invalid"), can you map the "accepted name" into "remarks"? For example, on Cypraea errones (AphiaID: 216765), the status would be "unaccepted" (or "invalid" - whichever we use) and the remark would be "Accepted Name: Erronea errones (Linnaeus, 1758)"
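
A sketch of how such a remark might be assembled from a record-shaped dictionary. The keys status, valid_name and valid_authority are assumed to match what AphiaRecordByAphiaID returns, and the sample values are typed in from the example above rather than fetched from WoRMS.

```python
def accepted_name_remark(record):
    """Return an "Accepted Name: ..." remark for unaccepted names, else None."""
    if record.get("status") != "unaccepted":
        return None
    valid_name = record.get("valid_name")
    if not valid_name:
        return None  # nothing to point at, so say nothing
    authority = record.get("valid_authority")
    full_name = f"{valid_name} {authority}" if authority else valid_name
    return f"Accepted Name: {full_name}"


# Hand-written sample shaped like a WoRMS record (assumed field names).
sample = {
    "AphiaID": 216765,
    "scientificname": "Cypraea errones",
    "status": "unaccepted",
    "valid_name": "Erronea errones",
    "valid_authority": "(Linnaeus, 1758)",
}
print(accepted_name_remark(sample))
```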

That will make it easier for us to create relationships. I guess, if we get the "relationship table" squared away, you'll be able to add relationships, but just in case that never happens, this would be an alternative.

dustymc commented 5 years ago

error message

Try a shift-reload (I'm developing in production - the code is probably cached) and let me know details about your environment if that doesn't work.

I'm ignoring term_type which aren't in the code table - I'm not getting their "domain" (or whatever they call it) "Biota" (which conveniently happens to also be a genus of plant-er-sumthin'...). I'm not creating names for identification-like "taxonomy" and WoRMS has a ton of that. Otherwise, I just take what they feed me, and there's a LOT of random-string stuff embedded in that (remarks, ID-stuff, the apparent lack of an ability to not say anything when there's nothing to say, etc.)
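
For illustration only, here is one way the nested classification response could be flattened and filtered. It assumes the response nests each rank under a "child" key with "rank" and "scientificname" fields, and it uses a small made-up set in place of the real CTTAXON_TERM code table. The sample data is hand-written, not a real WoRMS response.

```python
# Hypothetical stand-in for the CTTAXON_TERM code table.
ALLOWED_TERM_TYPES = {"kingdom", "phylum", "class", "subclass", "order", "family", "genus"}


def flatten_classification(node):
    """Walk the nested "child" chain and return (term_type, term) pairs in order.

    A list of pairs (not a dict) is used so stacked terms that repeat a rank
    are all kept.
    """
    terms = []
    while node:
        rank = node.get("rank")
        name = node.get("scientificname")
        if rank and name:
            terms.append((rank.lower(), name))
        node = node.get("child")
    return terms


def keep_known_terms(terms, allowed=ALLOWED_TERM_TYPES):
    """Drop any term whose type is not in the code table (e.g. "Biota")."""
    return [(rank, name) for rank, name in terms if rank in allowed]


sample = {
    "rank": "Superdomain", "scientificname": "Biota", "child": {
        "rank": "Kingdom", "scientificname": "Animalia", "child": {
            "rank": "Phylum", "scientificname": "Mollusca", "child": {
                "rank": "Class", "scientificname": "Gastropoda", "child": None}}}}

print(keep_known_terms(flatten_classification(sample)))
```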

They really like stacking terms without changing rank as well.

You can, and have always been able to, type anything you want in most any term in the single-record editor.

Yes, any of that will break the hierarchical editor, and this is precisely the reason that the core Arctos taxonomy model is not hierarchical (and also why I'm not overly keen on labeling it as such).

I suppose I can cram relationship-stuff into remarks if we can't resolve that in the next few days. Maybe we can just fix that in next week's AWG meeting??

sharpphyl commented 5 years ago

I still get the error but everything ultimately uploads perfectly and I'm sure the error message will eventually resolve itself.

Yes, AWG's meeting is fine for further discussion on the relationship terms as far as I'm concerned.

Jegelewicz commented 5 years ago

I suppose I can cram relationship-stuff into remarks if we can't resolve that in the next few days. Maybe we can just fix that in next week's AWG meeting??

Is it possible that the relationships could reside in the Non-Classification Terms table? Right now, they live outside the classification as part of the "name" data, but could they be part of the classification instead?

sharpphyl commented 5 years ago

Dusty, this looks absolutely fantastic. The whole classification appeared in all the taxa I checked.

I still got the error message on "refresh" but it doesn't look like I need to refresh anything. It's all there. I also see that I could choose this as my preferred source on our profile. Is this ready for action or are you still transferring non-WoRMS into this source? I don't want to jump the gun but you can see we're ready to roll!

dustymc commented 5 years ago

@sharpphyl I'm still picking up some stragglers - that should easily be done today. (Famous last words....)

Then I'll need to track down things that you use and WoRMS doesn't and move them over from the Arctos classification. That SHOULD be fast, but we'll see.

I'd prefer to wait to switch your collection until that's done just to avoid cache clashes. Shall I do that when I can?

The whole classification should only be there if it's all "major" ranks - the download just has family, order, etc. columns. The refresh should get everything (15 stacked terms all called "family" for some reason, superinfrasubgenus-type "intermediate" ranks, etc.), and I'll start automating that as soon as we get some remaining kinks worked out - hopefully this week. (Vague plan: wait until the AWG meeting so I can potentially deal with relationships, but I can also just set the refresh to run again, probably, unless WoRMS kicks me out or I melt the internet or something.... You call it.)

@Jegelewicz relationships are copied to the "Arctos Relationships" classification by triggers for search reasons.

I suspect you're suggesting we do something like GlobalNames and add some sort of "alternative name" non-"hierarchical" term(s) in which we'd eg store "Bitis" for the "Echidna" classification with family=Viperidae??

Relationships are truly between classifications - Echidna-the-mammal has nothing to do with rhino vipers - but doing that with keys would mean better protecting classification_id. That would involve a fair bit of mostly-internal shuffling, and giving up the ability to eg clone a classification to a local source and delete the original without losing relationship/common name data.

Dealing with taxon concepts would mean better protecting classification_id as well, so we may end up there even if not for relationships and/or common names.

All classifications (taxon concepts, mammals-vs.-snakes, etc.) would need to carry all relationship (and maybe common name) data. We don't do a good job of maintaining those data once per name, I see absolutely no hope of doing so for each classification.

I see these data primarily as helping users get to specimens, and as such I have no problem storing them in a not-so-correct manner - someone looking for puff adders will find eels every now and then (false positives), which is sort of always less-evil than the alternative of not finding some puff adders because of our weird and unpredictable administrative decisions (false negatives). People trying to assert a common name for only one of the 50 taxon concepts ("classifications") of Bitis arietans certainly have a different outlook on that.

In short, I think this may be a place where "correct" and "usable" clash. It certainly deserves its own Issue.

sharpphyl commented 5 years ago

I'd prefer to wait to switch your collection until that's done just to avoid cache clashes. Shall I do that when I can?

We periodically download a flat file of everything in Arctos and thought it might be wise to do that just before we switch to WoRMS. (I'm thinking I'll need it where genera have moved to a new family and I don't remember where they used to be - although I could probably check the Arctos source to find out.) We were planning to do the next download tomorrow if you're ready for the switch by then. So I'd like to confirm that we've done the download before you hit the switch if that makes sense.

The whole classification should only be there if it's all "major" ranks - the download just has family, order, etc. columns. The refresh should get everything (15 stacked terms all called "family" for some reason, superinfrasubgenus-type "intermediate" ranks, etc.), and I'll start automating that as soon as we get some remaining kinks worked out - hopefully this week. (Vague plan: wait until the AWG meeting so I can potentially deal with relationships, but I can also just set the refresh to run again, probably, unless WoRMS kicks me out or I melt the internet or something.... You call it.)

We can get by this week with just the major ranks. Also, no problem waiting until the AWG deals with relationships. In fact, that could wait for a future refresh and may end up being handled in a new way.

I suspect you're suggesting we do something like GlobalNames and add some sort of "alternative name" non-"hierarchical" term(s) in which we'd eg store "Bitis" for the "Echidna" classification with family=Viperidae??

I looked at GlobalNames for Echidna to see what this looked like, but didn't exactly find that, so I'm not sure. I just figured if we are never able to create precise relationships from webservices, then it would be helpful to just have text that tells us what the related taxon is to make it easier to add that to the record.

In short, I think this may be a place where "correct" and "usable" clash. It certainly deserves its own Issue.

I'll check to see what we already have in the queue and if nothing's a match for this, I'll start a new issue as it was just an aside here and should stand on its own.

dustymc commented 5 years ago

before you hit the switch

I'll coordinate with you.

Echidna

See http://arctos.database.museum/name/Echidna, most of which comes from GN.

https://github.com/ArctosDB/arctos/issues/735 is related.

The relationship points to a mammal (because the data originally came from a mammal collection - arbitrary administrative leftovers, same as usual). The 'Arctos' classification is an eel. The name has been used for a bunch of stuff. GlobalNames pulls the "related" classifications in - so you can find things with IDs linked to http://arctos.database.museum/name/Echidna by searching "Bitis" but you can't find things identified to http://arctos.database.museum/name/Bitis by searching "Echidna" - their relationships are one-way, so if you're coming from some out-of-date web page you're just not going to end up where you want to be. (It'd have to be REALLY out of date for this example, but things change commonly. It took Mammal Species of the World - a very commonly-used "authority" - 5 or 6 years to catch up on woodrats, where Arctos got it instantly because the person publishing on woodrat taxonomy was working in Arctos, for example.)

screen shot 2018-12-10 at 12 15 51 pm

In any case, the current model has ONE Echidna-->Tachyglossus relationship. That makes it easy to maintain, and it serves to find specimens going both ways, but it's not where those data truly belong. If we move to a taxon concept model, we might end up with 20 concepts under http://arctos.database.museum/name/Echidna#Arctos - one for each revision of a much-revised field guide perhaps (each of which might include or exclude stuff included or excluded in other concepts). Each of those holding common name and relationship data is "correct" but it's also about 20 times more data to maintain just to keep our current capabilities - if you want to add a common name you need to update 20 classifications instead of one "taxon."

sharpphyl commented 5 years ago

See http://arctos.database.museum/name/Echidna, most of which comes from GN.

screen shot 2018-12-10 at 12 15 51 pm

Very helpful, thanks.

If we move to a taxon concept model, we might end up with 20 concepts under http://arctos.database.museum/name/Echidna#Arctos - one for each revision of a much-revised field guide perhaps (each of which might include or exclude stuff included or excluded in other concepts). Each of those holding common name and relationship data is "correct" but it's also about 20 times more data to maintain just to keep our current capabilities - if you want to add a common name you need to update 20 classifications instead of one "taxon."

Not sure what a "taxon concept model" is but it sounds like we need it. Also see #735.

sharpphyl commented 5 years ago

Dusty, would it be possible to have these two Non-Classification Terms automatically populated for all the WoRMS (via Arctos) taxa? The first is SOURCE_AUTHORITY as "WoRMS (World Register of Marine Species) via Arctos". The second is TAXON_STATUS, which would populate as "valid" when WoRMS says it is "accepted" and "invalid" when WoRMS says it is "unaccepted". We are seeing that these are not populated when we refresh via the AphiaID, and it would be extremely helpful to have them populate instead of us entering them manually.
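
The requested status translation could be sketched roughly as below. Only the two pairs named in this comment are mapped; the function name is an assumption, and returning None for anything else mirrors the rule that values missing from the controlling code table are ignored.

```python
# Only the two translations requested above; everything else is left alone.
WORMS_TO_ARCTOS_STATUS = {
    "accepted": "valid",
    "unaccepted": "invalid",
}


def translate_taxon_status(worms_status):
    """Translate a WoRMS status into an Arctos taxon_status, or None to skip it."""
    if worms_status is None:
        return None
    return WORMS_TO_ARCTOS_STATUS.get(worms_status.strip().lower())


for status in ("accepted", "unaccepted", "some other status"):
    print(status, "->", translate_taxon_status(status))
```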

dustymc commented 5 years ago

TAXON_STATUS

Derp - it should be back.

SOURCE_AUTHORITY

"WoRMS (World Register of Marine Species) via Arctos" should be pretty unavoidable just by it being there. How about "citation" eg

MolluscaBase (2018). Conus Linnaeus, 1758. Accessed through: World Register of Marine Species at: http://www.marinespecies.org/aphia.php?p=taxdetails&id=137813 on 2018-12-12

from http://www.marinespecies.org/rest/AphiaRecordByAphiaID/137813 ?

I don't have much of an opinion on this, I'm happy to do whatever, but that seems a bit more useful and might make the WoRMS folks happy too.

Jegelewicz commented 5 years ago

SOURCE_AUTHORITY

"WoRMS (World Register of Marine Species) via Arctos" should be pretty unavoidable just by it being there. How about "citation" eg

MolluscaBase (2018). Conus Linnaeus, 1758. Accessed through: World Register of Marine Species at: http://www.marinespecies.org/aphia.php?p=taxdetails&id=137813 on 2018-12-12

from http://www.marinespecies.org/rest/AphiaRecordByAphiaID/137813 ?

I don't have much of an opinion on this, I'm happy to do whatever, but that seems a bit more useful and might make the WoRMS folks happy too.

I like putting the citation in source authority as it gives credit where it's due (not all WoRMS classifications reside in WoRMS, as the example you pulled demonstrates - it came from MolluscaBase).

BUT - this does mean that the TAXONOMIC_AUTHORITY code table is not doing anything....and maybe it shouldn't. #1814

dustymc commented 5 years ago

I got it from WoRMS, but I guess they're passing the credit on - so maybe we should too?

I have no strong opinions on taxonomic_authority - there's maybe some value in not spelling ITIS 700 ways, but a publication or URL or their little blurb does seem infinitely more useful than anything we could control.

Jegelewicz commented 5 years ago

a publication or URL or their little blurb does seem infinitely more useful than anything we could control

Absolutely.

sharpphyl commented 5 years ago

MolluscaBase (2018). Conus Linnaeus, 1758. Accessed through: World Register of Marine Species at: http://www.marinespecies.org/aphia.php?p=taxdetails&id=137813 on 2018-12-12

Definitely think this is excellent. Good to give WoRMS the credit due and a link for users back to the taxon. Thanks for suggesting it.

sharpphyl commented 5 years ago

Dusty, I just downloaded the entire database so if you're ready to switch us to the WoRMS (via Arctos) source, we're ready. We have found that all the taxon names you entered are great because we don't have to add a taxon in Arctos - BUT they have no classification as we're still in Arctos and not in WoRMS (via Arctos) so we don't get the family when we print labels and can't search for them, so I think we'd be better off with the WoRMS (via Arctos) source even if you're still refining it, but that's your call. I know I can make the change in our profile but I think it would be better if you did it when you think it's ready. All the relationships issues can be settled after we make the switch, I think.

Jegelewicz commented 5 years ago

I would like to chime in here with:

Why do we have one really good set of data in WoRMS via Arctos and one crappy one in Arctos? Who would protest if all the WoRMS stuff was just in Arctos? At some point can we just have good stuff in Arctos?

That's my whining for today and I don't expect it to happen immediately, but I do hope it happens eventually.....

sharpphyl commented 5 years ago

Why do we have one really good set of data in WoRMS via Arctos and one crappy one in Arctos? Who would protest if all the WoRMS stuff was just in Arctos? At some point can we just have good stuff in Arctos?

I think a lot of the collections managers would protest if suddenly all their records were updated per WoRMS. Instead of finding their Charonia in Ranellidae, they would be in Charoniidae (a new family). Lischkeia would be in Eucyclidae and not in Calliotropidae. I could list many, many similar changes that would happen with NO warning and they would not trigger a change in identification. Instead the classification for the identification would change and the history would be lost.

Since all we do is marine invertebrates, we stay on top of most changes, but if you're managing a multitude of collections, you probably want to stick with your legacy identifications and classifications until you consciously update them. If they're ready for the changes, then they could perhaps separate out their marine specimens and choose WoRMS as their preferred Source.

But I agree that we need to be clear that Arctos and Arctos Plants are static taxonomic tables and only updated (and cleaned up) by collection managers manually doing it. WoRMS is our first taxonomic source with auto updates but I hope it's not our last.

dustymc commented 5 years ago

Yea WoRMS data isn't that good (at least I've never seen a description of a species with two families, one on top of the other...) and some data in "Arctos" is very good (birds is probably best at the moment).

I can copy over bits-and-pieces on request. I think a global replace would be a major loss of data. Even if we assume WoRMS is perfect and we'd lose nothing, it would still introduce inconsistency and inconsistency is much more effective at hiding specimens than a missing subfamily or something.

There's some documentation on the nature of taxon source in http://arctos.database.museum/info/ctDocumentation.cfm?table=CTTAXONOMY_SOURCE.

We have (most of) a pathway to using external data now. If there's some webservice that talks to something with a resolvable identifier, it shouldn't be much problem to plug in to them.

@sharpphyl I'll move ahead with https://github.com/ArctosDB/arctos/issues/1844 (no problem updating the missing bits later if you want to) and then switch your collection over.

dustymc commented 5 years ago

SOURCE_AUTHORITY

screen shot 2018-12-12 at 9 25 49 am

Jegelewicz commented 5 years ago

OK, but if we find good classifications from WoRMS that we would like to have in Arctos (with maintenance by WoRMS), can they be updated if we use the aphiaID?

dustymc commented 5 years ago

Arctos (with maintenance by WoRMS), can they be updated if we use the aphiaID

Just to make the link the other way, that's now https://github.com/ArctosDB/arctos/issues/1855.

The refresh is now creating relationships where it can.

http://arctos.database.museum/name/Cepola%20indica

screen shot 2018-12-17 at 9 14 47 am

and

http://arctos.database.museum/name/Acanthocepola%20indica

screen shot 2018-12-17 at 9 15 29 am

Those (like all relationships) spawn the creation of...

screen shot 2018-12-17 at 9 18 46 am

"Any Taxon" matches anything on the page, so now specimens identified with Cepola indica are findable by Acanthocepola indica and anything else from the related classification.

1) Does that look correct? 2) Is there anything else which isn't being properly brought in with a click on....

screen shot 2018-12-17 at 9 16 46 am

I hope to get the automation doing its thing today; quick feedback would be most appreciated.

sharpphyl commented 5 years ago

I'm not a fish person, but everything looks good to me. The "refresh" worked inconsistently but maybe because you're loading lots of stuff.

I tested several invalid Cypraea. The relationship doesn't show up until you refresh WoRMS (via Arctos), but that's probably something that you're automating as I recall.

Love that you're able to get the relationships from WoRMS. Huge step forward for taxonomy and for search.

Anything else you want us to try to confirm it's working?

I'm sure you knew this would happen. If, during data entry, a data entry volunteer inputs a taxon that is invalid (per the collection's source) can you bring up the synonym and ask if they want to use it instead? If that would take too long to load, then maybe it's not a great idea. Let's see if the committee thinks it's a helpful idea.

dustymc commented 5 years ago

relationship

Yep, that'll come in with the refresh

working

Let me know if you see any problems, but I'm about to let the scripts go. I'll start with the names you're using in IDs.

data entry

What loads is at the discretion of whoever's approving the load - maybe you're using an old name for type material or something, and it's not the job of Arctos to be involved in that. I might be able to provide tools to detect name preferences, if I had something to base them off of.

Your volunteers should be seeing something like....

screen shot 2018-12-17 at 1 23 52 pm

You can add some sort of "Hey DMNS people use this one" value to taxon_status (or whatever we're calling that now), and I can mess with the display in whatever ways y'all want.

dustymc commented 5 years ago

Here are a few names which were just refreshed automagically and have relationships (not necessarily all from WoRMS).

SCIENTIFIC_NAME

Coenia curvicauda
Ochthera mantis
Chaetopsis fulvifrons
Tabanus quinquevittatus
Astarte vernicosa
Anodontia alba
Tresus nuttallii
Lottia instabilis
Lottia ochracea
Volutopsius stefanssoni
Eualus gaimardii gaimardii
Basilissa costulata
Anachis nigricans
Columbella aureomexicana
Pyrulofusus harpa
Fusinus luteopictus
Fusinus barbarensis
Huxleyia munita
Chama echinata
Chama sordida
Limaria hemphilli
Lucina chrysostoma
Lysmata californica
Crangon nigricauda
Crangon alaskensis
Metacrangon munita
Mesocrangon munitella
Nutricola lordi
Pyrulofusus melonis
Simomactra falcata
Chionista fluctifraga
Hyale frequens
Bembix
Sanguinolaria bertini
Dosinia concentrica
Callista maculata
Meta
Malaconothrus mollisetosus
Parataenia medusia
Diphyllobothrium ursi
Mitrella ligula
Myrakeena angelica
Paphies elongata
Lottia insessa
Tropidophora
Puffinus baroli
Columbella nitida
Voluta mercatoria
Chioneryx grus

sharpphyl commented 5 years ago

Great. Looks really good. I did need to refresh to see the relationships in the above posting, but everything that I checked looked good.

The only issue remaining is a training one (unless you can do the popup whenever the taxon is invalid). Our volunteers need to not enter the entire name so that they will see the various options and can choose the right one. And, of course, they don't know how many letters to enter and how many to leave out to get the options.

If, for example, they type in "Dosinia conso" (looking for whether or not Dosinia consobrina is valid), the field is automatically filled with the taxon name as there is nothing else that begins with "Dosinia conso." Since we ask that they always check WoRMS first, they should already know it's invalid, but I'm looking for a way to speed the data entry process and avoid always checking the WoRMS website now that it's a source in Arctos. If a popup for "invalid" doesn't work, could you always do a popup of the species they typed in to show if it's valid or not - even if they don't actually need to check which one to use? That would work as quickly as researching the WoRMS website.

We'll have a group of volunteers doing data entry tomorrow so we'll let you know if we find anything weird but it looks great to me.

dustymc commented 5 years ago

AHA!

So if you enter Dosinia consobrina (or enough of it that you get one result) the popup auto-selects and it just works (even if it wasn't supposed to).

Enter Dosinia and you get...

screen shot 2018-12-17 at 2 48 52 pm

... the stuff you need to get you where you belong.

That's all UI. Lots of things use the "if there's just one then just use it" behavior because people have asked for it. Getting rid of that makes things simpler for me - I can ditch an IF statement - and saves a (minuscule, but still...) amount of processing.
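
The "if there's just one then just use it" behavior amounts to something like the sketch below. The function, its flag, and the candidate list are hypothetical; this is not the Arctos UI code, only an illustration of the single IF in question.

```python
def pick_taxon(typed, names, auto_select_single=True):
    """Return (selected_name, popup_choices) for what the user typed.

    With auto_select_single on, a lone match is used directly and no popup
    is shown. With it off, every match (even a single one) goes to the popup.
    """
    prefix = typed.lower()
    matches = sorted(n for n in names if n.lower().startswith(prefix))
    if auto_select_single and len(matches) == 1:
        return matches[0], []
    return None, matches


# Hypothetical candidate list for the demo.
names = ["Dosinia consobrina", "Dosinia concentrica"]

print(pick_taxon("Dosinia conso", names))         # one match: auto-selected
print(pick_taxon("Dosinia", names))               # two matches: popup
print(pick_taxon("Dosinia conso", names, False))  # popup even for one match
```

Dropping the IF corresponds to always calling it with auto_select_single=False, which is the "always show the popup" option discussed above.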

Issue please. I'll need something to point at when the "why do I get popups when there's only one thing!?" complaints start rolling in....

sharpphyl commented 5 years ago

I hadn't thought of that. I suspect most collections do not want a popup for every identification in every data entry record. I wouldn't if most of our taxa were valid.

Let me work with our volunteers and see if training can take care of this. If they already know that a species is valid, then the popup is unnecessary. If it's a big deal then I'll open a new issue to see what others think.

dustymc commented 5 years ago

Status update:

Every name in WoRMS that looks like taxonomy (949168 names) should now be in Arctos, and I think all of them have a WoRMS (via Arctos) classification at this point. The scripts to refresh (or create if necessary) classifications are still running. I've throttled them heavily in the hopes that they won't be disruptive, but it should still be done by January when I'll take this back up.

I now have access to an "everything" download from WoRMS at a TACC IP, so I'm able to create/refresh (including relationships) from "local" data.

The next step is to figure out how to refresh data which have changed in WoRMS. I'll probably just use their webservices for that, but the download should be available if needed.

sharpphyl commented 5 years ago

That all sounds excellent. We won't be databasing again until January, so the timing works well. Yes, finding changes and new taxa will be important to keeping the data current. Hope that isn't a big hassle. Everything worked well this week. We didn't find any errors or significant omissions and, of huge value to us, we were able to database without adding a single new taxon.

sharpphyl commented 5 years ago

I'll add a question here unless you'd rather move it to a new issue. The WoRMS (via Arctos) source is working very well for us. We no longer have to enter new taxa each day and the volunteers are learning how to use the taxonomy module to validate names.

Today I created a hemihomonym and would like to confirm how the WoRMS refresh will handle it. We have several specimens of species of the genus Abbottella Henderson & Bartsch, 1920 (AphiaID 932847) which is a hemihomonym with Abbottella Hollenberg, 1967 (AphiaID 369307). Our upload from WoRMS selected the Plantae version so I added the Animalia name today.

When we refresh WoRMS (via Arctos) can both AphiaIDs be updated or only one? If only one, can we direct the refresh to one AphiaID? The plant name is currently not used. I'm not actually concerned about updates to this particular taxon name, but in how the refresh process will work. If this has already been resolved in a GitHub issue just direct me over to it.

screen shot 2019-01-26 at 10 35 25 am

dustymc commented 5 years ago

No, I think this is a good place - assuming someone's going to write documentation.

I think the scripts are fine with that situation - they can be anyway - but you certainly shouldn't be.

Anything cataloged under that name by a collection preferring that source will be ambiguous, and you would need to jump through some sort of hoops to make it less so. Anyone using the specimen would need to notice your hoop-jumping, and so on. Arctos is a specimen database - there's no real reason we should try to capture any more taxonomy than we must, especially when we know where to get it if we need it. This is a voluntary excursion into the one place where our current taxonomy model falls apart, and I don't see any reason to go there.

https://github.com/ArctosDB/arctos/issues/1852 would of course "fix" this (by making you be explicit every time you pick a taxon), but that's a completely ridiculous approach to get at the precision of the data in WoRMS.

I would 100% delete the thing your collection doesn't intend to use.

sharpphyl commented 5 years ago

I'll go ahead and delete the taxon that we don't intend to use, but if a plant collection wants to use WoRMS (via Arctos) we'll probably have to come up with a different solution. Thanks.

dustymc commented 5 years ago

delete the taxon

Classification!

I can't imagine a reason for a plant collection to want to use that classification - we'd just fire up a new source that uses the same scripts for them. The only really impossible "same name, different critters" situation would be if some collection held material that they HAD to use the homonym for - someone holding the holotypes of both Echidna-the-eel and Echidna-the-viper or something. (And I don't think anyone's ever designated a holotype without also providing a subgeneric name, so even that would necessarily involve more precision than my favorite example holds.)

sharpphyl commented 5 years ago

Overall, WoRMS (via Arctos) is working phenomenally well for us. We've caught up with most of the changes in family names in our collection, though that will probably be an on-going activity. I haven't had to add a taxon or use the hierarchical editor since we moved to the WoRMS (via Arctos) source.

WoRMS is increasing the number of terrestrial species (via MolluscaBase) all the time. But I am having to add some species to WoRMS (via Arctos) that aren't yet in WoRMS, so there's no AphiaID. If WoRMS adds these taxa, with an AphiaID, will we automatically add them to that taxonomic table with their new AphiaID? And, if so, will there be two classifications for those taxa? I would expect that this will happen sometime in the coming year, so would like to understand what will happen.

dustymc commented 5 years ago

I can't guess at the ID for potentially-used records - choosing one is completely arbitrary from my viewpoint, the one you want may not even be in worms, etc. - that's just going to make a mess somewhere.

I only create aphiaid when I'm creating names, which can't possibly have been used at that point. I can't break anything, and maybe I get lucky and do what you want sometimes!

Here are names with worms classifications and no aphiaid. I can try to somehow magic them in if you have clever ideas, or you can look them up and insert them.

create table temp_worms_noid as select distinct scientific_name from taxon_name, taxon_term where taxon_name.taxon_name_id=taxon_term.taxon_name_id and source='WoRMS (via Arctos)' and taxon_name.taxon_name_id not in (select taxon_name_id from taxon_term where source='WoRMS (via Arctos)' and term_type='aphiaid') order by scientific_name;

temp_worms_noid.csv.zip

sharpphyl commented 5 years ago

I think it would be sufficient to check this list every 6-12 months against WoRMS and then add the new AphiaIDs so they refresh automatically. I'll check the list and test it against this process later this week and see if anything new has been added.
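
A periodic check like this could be scripted. The sketch below is one possible approach, not something Arctos does: for each name on the no-aphiaid list it builds a lookup URL for the WoRMS `AphiaIDByName` webservice (from the list at http://www.marinespecies.org/rest/) and collects the names that come back with an ID. The function names, the `marine_only=false` choice, and the injected `fetch` callable are assumptions for illustration. The output is only a list of candidates; someone still has to confirm each ID before adding it.

```python
from typing import Callable, Dict, Iterable, Optional
from urllib.parse import quote

# Base URL taken from the list of WoRMS webservices mentioned in this issue.
WORMS_REST = "http://www.marinespecies.org/rest"


def aphia_id_url(scientific_name: str) -> str:
    """Build the AphiaIDByName lookup URL for one scientific name."""
    # Assumption: marine_only=false so terrestrial snails are not filtered out.
    return f"{WORMS_REST}/AphiaIDByName/{quote(scientific_name)}?marine_only=false"


def find_candidates(
    names: Iterable[str],
    fetch: Callable[[str], Optional[int]],
) -> Dict[str, int]:
    """Return {name: AphiaID} for names where the lookup gave a single ID.

    `fetch` is a hypothetical callable that takes a URL and returns the
    AphiaID as an int, or None when the service has no match.
    Assumption: a missing or non-positive value means "no single match".
    """
    found: Dict[str, int] = {}
    for name in names:
        aphia_id = fetch(aphia_id_url(name))
        if aphia_id is not None and aphia_id > 0:
            found[name] = aphia_id
    return found
```

In practice `fetch` would wrap an HTTP GET and parse the response body; passing it in keeps the lookup logic testable without touching the network.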

sharpphyl commented 5 years ago

I'm reopening this because I took a look at the csv above and I'm confused. I know that most of these taxa without a WoRMS (via Arctos) aphiaID are ones that I added. Sometime last year, you sent me a list of missing taxa (mostly land snails) and I added them if there was a legitimate source authority somewhere. As WoRMS adds more terrestrial snails, we may have aphiaIDs.

But in some cases, WoRMS does have an aphiaID, so I don't understand why the aphiaID didn't come with the WoRMS (via Arctos) classification. For example, Xesta citrina in WoRMS has an aphiaID of 1259946.

Screen Shot 2019-03-17 at 9 31 27 AM

But the WoRMS (via Arctos) record doesn't list the aphiaID.

Screen Shot 2019-03-17 at 9 31 53 AM

I found the same thing with Zachrysia auricoma havannensis and Zachrysia. Same with Zachrysia auricoma, but the problem appeared to be that there were two classifications, so I deleted one and added the aphiaID.

Request: Is there any way to get a list of all taxa in WoRMS (via Arctos) with two classifications?

Will the aphiaID get picked up in future updates? I'd rather wait and see if some of these issues resolve themselves with repeated updates. It took over an hour to look at just 11 of the >1700 on the list, so checking every one individually isn't the best use of time.

Request: Is there a way to know which taxa in WoRMS (via Arctos) are ones that I added? In that case, we can eliminate them from this list of no aphiaID taxa.

dustymc commented 5 years ago

https://arctos.database.museum/name/Xesta%20citrina has WoRMS data and relationships. I pull them from AphiaID. My guess is that someone has removed the ID from this record. I suppose there's some chance my scripts are eating it and I'll check them for that, but I don't think that's it.

Will the aphiaID get picked up in future updates?

Not for existing names - that comes with the (strong) possibility of mucking up your specimens.

Request: Is there any way to get a list of all taxa in WoRMS (via Arctos) with two classifications?

create table temp_wrms_tid as select taxon_name_id, classification_id from taxon_term where source='WoRMS (via Arctos)' group by taxon_name_id,classification_id;

create table temp_wrms_mid as select taxon_name_id from temp_wrms_tid group by taxon_name_id having count(*) > 1;

alter table temp_wrms_mid add sciname varchar2(255);

update temp_wrms_mid set sciname=(select scientific_name from taxon_name where taxon_name.taxon_name_id=temp_wrms_mid.taxon_name_id);

alter table temp_wrms_mid add aidlist varchar2(255);

declare 
  a varchar2(255);
  sp varchar2(255);
begin
  for r in (select * from temp_wrms_mid) loop
    a:=NULL;
    sp:='';
    for c in (select term from taxon_term where taxon_name_id=r.taxon_name_id and source='WoRMS (via Arctos)' and term_type='aphiaid') loop
        a:=a||sp||c.term;
        sp:=';';
    end loop;
    update temp_wrms_mid set aidlist=a where taxon_name_id=r.taxon_name_id;
  end loop;
end;
/

temp_wrms_mid(1).csv.zip
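
For anyone who wants to try the same logic outside Oracle, here is a rough equivalent as a single query run through Python's built-in sqlite3 module. It is only a sketch: the table and column names are the ones used in the statements above, but the function name is made up and the toy schema in any test data is an assumption, not the real Arctos schema. It returns each name that has more than one classification in the source, plus a semicolon-separated list of any aphiaid terms (NULL/None when there are none).

```python
import sqlite3

SOURCE = "WoRMS (via Arctos)"


def names_with_multiple_classifications(conn: sqlite3.Connection):
    """List names having more than one classification_id in SOURCE.

    Each row is (scientific_name, n_classifications, aidlist), where aidlist
    joins the name's aphiaid terms with ';' (order not guaranteed) and is
    None when the name has no aphiaid term at all.
    """
    sql = """
        select tn.scientific_name,
               count(distinct tt.classification_id) as n_classifications,
               -- group_concat skips NULLs, so only aphiaid terms are joined
               group_concat(
                   case when tt.term_type = 'aphiaid' then tt.term end, ';'
               ) as aidlist
        from taxon_term tt
        join taxon_name tn on tn.taxon_name_id = tt.taxon_name_id
        where tt.source = ?
        group by tn.taxon_name_id, tn.scientific_name
        having count(distinct tt.classification_id) > 1
        order by tn.scientific_name
    """
    return conn.execute(sql, (SOURCE,)).fetchall()
```

Counting distinct classification_id values per name replaces the two temp tables, and the aggregate replaces the PL/SQL loop that builds aidlist.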

How did that happen?? Should we look for a way to prevent that, not that I'm very confident there is one? We invested a huge amount of work into WoRMS in order to avoid ambiguous identifications, and now we're creating the data that can't possibly do anything except lead to ambiguous identification.

Request: Is there a way to know which taxa in WoRMS (via Arctos) are ones that I added? In that case, we can eliminate them from this list of no aphiaID taxa.

create table temp_worms_noid_p as select * from temp_worms_noid where scientific_name in (select scientific_name from taxon_name where CREATED_BY_AGENT_ID=21263650);

temp_worms_noid_p.csv.zip

sharpphyl commented 5 years ago

temp_wrms_mid(1).csv.zip

Thanks for the list of WoRMS (via Arctos) with two classifications. Over 1100 on this list and not sure how they all occurred. The first one I corrected (http://arctos.database.museum/name/Elimia%20floridensis#WoRMSviaArctos) is in WoRMS with an aphiaID but we somehow didn't get it and had two old ITIS entries. Could it be because it's had a very recent update on WoRMS (2019-03-18), although it looks like it was in WoRMS (or just MolluscaBase) since March 2018? If that's the problem, can your program find and add the aphiaID in the next update?

Will try to work through them all over the coming weeks. Won't solve all problems, but should reduce the number of issues described above.

temp_worms_noid_p.csv.zip

Per report, I've added 177 taxa to WoRMS (via Arctos) because our collection needed them and WoRMS doesn't (yet) have them. I tried to add a significant Source Authority to them all so there is, at least, a trail for others to follow as to why they were added. Example: http://arctos.database.museum/name/Abbottella%20adolfi#WoRMSviaArctos

dustymc commented 5 years ago

can your program find and add the aphiaID in the next update?

I find them easy enough, but I can't DO anything with them.

Say you're using Sometaxon for a clam. There's no aphiaid, so it's just "local." The name magically appears in WoRMS (as a whale). I grab the stuff from WoRMS, update Arctos, and now you're claiming to have cataloged a whale. It looks just like the data for which you chose an aphiaid. Or I make a new parallel classification using the new ID (which happens a LOT) and now we're not sure if you've cataloged a clam or a whale, and from the "happens a lot" thing pretty soon most of your collection is ambiguous.

There's some chance I can find something useful and safe to do with new data that shows up in WoRMS, but I don't know what that might be at this time. I think this really only works when a Curator (or their representative, of course) purposefully chooses a single aphiaID to link the name to identifications/specimens.
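
The rule described here can be written down as a small decision function. This is an illustration of the reasoning only, not Arctos code: `LocalName`, `refresh_action`, and the action labels are hypothetical names. The point it shows is that WoRMS data is applied automatically only when a curator has already chosen an aphiaid or when the name was just created and cannot have been used yet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocalName:
    scientific_name: str
    aphiaid: Optional[str]   # curator-chosen AphiaID, if any
    newly_created: bool      # True only while the name cannot have been used


def refresh_action(name: LocalName, worms_aphiaid: Optional[str]) -> str:
    """Decide what is safe to do with a name when WoRMS data is available.

    `worms_aphiaid` is the ID WoRMS currently reports for the same string,
    or None when WoRMS does not have the name.
    """
    if name.aphiaid:
        # A curator picked this ID on purpose, so refreshing from it is safe.
        return "refresh"
    if worms_aphiaid is None:
        # Nothing in WoRMS: the classification stays purely local.
        return "leave-local"
    if name.newly_created:
        # Nothing can be identified with the name yet, so nothing can break.
        return "assign-and-refresh"
    # Existing local name (the clam) and a same-string WoRMS record that
    # might be the whale: do not guess, leave the choice to a curator.
    return "flag-for-curator"
```

With this rule, the Sometaxon example (existing local name, no aphiaid, WoRMS suddenly has a match) lands in the last branch, so nothing changes until someone chooses an ID.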

http://arctos.database.museum/name/Abbottella%20adolfi#WoRMSviaArctos

That would be more useful (and a bunch more work....) if those references were linked to the taxon as publications.

sharpphyl commented 5 years ago

So we're able to add WoRMS taxa that are NEW to WoRMS (via Arctos) but we can't overwrite existing taxa that were never linked to an aphiaID. I'll try to get these 1100+ fixed and then we can see what problems remain.

You're right. I've never linked a taxon to a publication. Do you have a gold standard taxonomic record to refer me to?

dustymc commented 5 years ago

So we're able to add WoRMS taxa that are NEW to WoRMS (via Arctos)

No, we're able to add WoRMS taxa that are new - things that can't possibly be used because I just created them.

gold standard taxonomic record

We should! And https://arctosdb.org/learn/gold-standard-records/ isn't happy for me @mkoo

http://arctos.database.museum/name/Oedipina%20kasios has a publication.