Closed dustymc closed 4 years ago
Let's talk about how GloBI does taxonomy. See Enhydra lutris, which links to all of the various taxonomic sources. This is done through a resolver. Zenodo
Could we free ourselves from managing taxonomy in Arctos by using a tool like this?
Alternate approach which might be mostly functionally identical but require less development, processors, and sorta everything else:
collection.preferred_taxonomy_source's datatype is currently FKEY-->classification_source, which is interpreted as "use classification data from SOURCE, else fail with no cached classification data."
Converting to ordered array (supported by PG) would be interpreted as "use SourceA if exists, else use SourceB if exists, else use SourceC if exists, else fail with no cached classification data."
So for example a collection could...
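A rough sketch of the "ordered array" interpretation described above, in Python rather than anything Arctos actually runs. The `CLASSIFICATIONS` dictionary, the `resolve` function, and the classification terms are hypothetical stand-ins for the cached classification data, not the real schema.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for cached classification data:
# source name -> {taxon name -> classification terms}. Illustrative only.
CLASSIFICATIONS: Dict[str, Dict[str, List[str]]] = {
    "WoRMS (via Arctos)": {"Enhydra lutris": ["Animalia", "Chordata", "Mammalia"]},
    "Arctos": {"Achatinella bryonii": ["Animalia", "Mollusca", "Gastropoda"]},
}


def resolve(name: str, preferred_sources: List[str]) -> Optional[Tuple[str, List[str]]]:
    """Return (source, classification) from the first preferred source that has the name."""
    for source in preferred_sources:
        classification = CLASSIFICATIONS.get(source, {}).get(name)
        if classification is not None:
            return source, classification
    # End of the list: fail with no cached classification data.
    return None


# Old behavior: a single source, so a name missing from it simply fails.
print(resolve("Achatinella bryonii", ["WoRMS (via Arctos)"]))
# New behavior: fall through to the next source in the ordered list.
print(resolve("Achatinella bryonii", ["WoRMS (via Arctos)", "Arctos"]))
```

The single-source case is just a one-element list, so nothing would change for a collection that keeps one preferred source.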
Un-wishlisting this; this approach should be comparatively trivial to implement and would have significant impacts.
DMNS:Inv could just use (and perhaps help improve) the Arctos classification for things not in WoRMS.
Animal-centric paleo collections could fall back to Arctos Plants for plant material, which would stop the continual reintroduction of plants to the "Arctos" classification. Problems caused by homonyms in the same classification - and there are many thousands of them - are what caused us to split classifications in the first place.
Suggest prioritization; the single classification per collection is actively introducing potentially-problematic data.
I support this, with high priority.
DMNS:Inv could just use (and perhaps help improve) the Arctos classification for things not in WoRMS.
Totally agree this would be better than mucking up WoRMS (via Arctos) with names they don't have.
This is mostly functional and should be out tonight or possibly tomorrow. It will need documentation. I can demonstrate whatever you'd like to see in test, but https://github.com/ArctosDB/internal/issues/65 makes it difficult to see for yourself. There are two changes:
Manage collection looks like this:
which is interpreted as "if all taxa used in an identification have at least one Arctos classification then use that; if not, check Arctos Plants; if not there, check WoRMS; if not there, then we're at the end of the list, so do nothing."
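The rule quoted above works per identification, not per name: a source only "wins" if it covers every taxon used in the identification. A minimal Python sketch of that reading follows; `source_for_identification`, `names_by_source`, and the placeholder taxon names are assumptions made for illustration, not Arctos code or data.

```python
from typing import Dict, List, Optional, Set


def source_for_identification(
    taxa: List[str],
    preferred_sources: List[str],
    names_by_source: Dict[str, Set[str]],
) -> Optional[str]:
    """Return the first preferred source with a classification for ALL taxa."""
    if not taxa:
        return None  # nothing to classify
    for source in preferred_sources:
        available = names_by_source.get(source, set())
        if all(taxon in available for taxon in taxa):
            return source
    return None  # end of the list, so do nothing


# Placeholder names; real identifications would use real taxon names.
names_by_source = {
    "Arctos": {"Taxon a"},
    "Arctos Plants": {"Taxon a", "Taxon b"},
    "WoRMS (via Arctos)": {"Taxon c"},
}
order = ["Arctos", "Arctos Plants", "WoRMS (via Arctos)"]

print(source_for_identification(["Taxon a"], order, names_by_source))
print(source_for_identification(["Taxon a", "Taxon b"], order, names_by_source))
print(source_for_identification(["Taxon a", "Taxon c"], order, names_by_source))
```

In this sketch a two-taxon identification skips "Arctos" because only one of its taxa is there, and an identification whose taxa are split across sources gets nothing.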
I hope this will lead to more, smaller, and cleaner classifications. CollectionA has a shrew taxonomist, so they start a "Soricidae according to us" classification and do cool things with a manageable number of taxa; CollectionB has a bat taxonomist, so they do the same; all mammal-having collections then use
If CollectionC doesn't like what CollectionB has done with some bats, they can just create a "Phyllostomidae" classification and use...
This is a different viewpoint than originally laid out, but I believe it leads to about the same place - collections can "prefer" bits and pieces of multiple classifications, managers can deal with the 50 rabbits they really care about without being force-fed a million insects which are in the same classification for some reason, and then collections can use those well-curated rabbit data without also needing to somehow munge aardvarks in with it.
That also means the rabbit-manager cannot possibly "oops" those million insects, which are in a different compartment, so this could open up the possibility of a hierarchical (or otherwise simplified) editor which writes directly to the core tables.
This makes documenting sources - https://github.com/ArctosDB/arctos/issues/3019 - even more important.
Yay everybody?
If I understand correctly, for DMNS:Inv, we would first choose Source WoRMS (via Arctos) then Source Arctos.
Next, I would copy any classifications that I've created without an aphiaid in WoRMS (via Arctos) to Arctos and delete them from WoRMS (via Arctos). I would still be the person listed as "managed by." There would be no classifications in WoRMS (via Arctos) without an aphiaid.
If WoRMS adds a new name - and it has happened to about 500 names since we started using WoRMS (via Arctos) - my identification would automatically switch to WoRMS (via Arctos) and show the new classification. Once a year or so, you could probably give me a list of names that list me as the "managed by" that now have a WoRMS aphiaid, so I could remove my name.
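The yearly list described above is essentially a set comparison. A hedged Python sketch, assuming two hypothetical lookups (`managed_by`: name to manager, `worms_aphia_ids`: name to aphiaid); the names and IDs are placeholders, not real WoRMS data.

```python
from typing import Dict, List


def names_now_in_worms(
    managed_by: Dict[str, str],
    worms_aphia_ids: Dict[str, int],
    manager: str,
) -> List[str]:
    """Names that list `manager` as "managed by" and now also have a WoRMS aphiaid."""
    return sorted(
        name
        for name, who in managed_by.items()
        if who == manager and name in worms_aphia_ids
    )


# Placeholder data: hypothetical names, managers, and aphiaids.
managed_by = {
    "Genus alpha": "curator_a",
    "Genus beta": "curator_a",
    "Genus gamma": "curator_b",
}
worms_aphia_ids = {"Genus beta": 1, "Genus gamma": 2}

print(names_now_in_worms(managed_by, worms_aphia_ids, "curator_a"))
```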
Sounds perfectly awesome and I'm on board. Will need to do a lot of documentation updating, probably at the same time as all the changes we're consolidating per your request #2695.
Yes, YAY!
Yes, essentially.
Falling back to "Arctos" isn't necessary - you can do that, or create something new, or whatever, but not being limited to one classification source is the big picture.
I'm advocating getting rid of "managed by" as a term altogether now that there's less reason to have giant all-encompassing classifications but whatever, it's not hurting anything, if it makes you happy then rock on!
I might eventually get around to advocating for the WoRMS classification to be purely service-managed, but we can talk about that when/if we get there.
identification would automatically switch
Yup.
Help. I tried to change our source selection by making WoRMS (via Arctos) 1 and adding Arctos as 2.
When I save it, it reverses the order
Neato, thanks!
I applied duct tape, should be doing what you want but I'll think about that form some more.
Thanks. I'll test out a few records and see if anything else needs taping.
I might eventually get around to advocating for the WoRMS classification to be purely service-managed, but we can talk about that when/if we get there.
Once this is working - I vote we do as Dusty suggests
As a test, this morning I took Achatinella bryonii, which isn't in WoRMS, so there has never been an aphiaid for it. I copied the entire classification that I had created in WoRMS (via Arctos) into Arctos and deleted the WoRMS (via Arctos) classification. It appears that the catalog record is able to find the correct classification, but the taxonomy page doesn't yet show that the Source for DMNS:Inv for this particular name has changed to Arctos. Should that happen, or will it take a while for it to change?
I'll update that. It's just a view of collection settings, nothing's broken....
@Nicole-Ridgwell-NMMNHS with this in place - I think we should set up a separate taxonomy source for geology stuff - I'll propose in a new issue once we have our data ready.
FWIW I sort of expect any diverse+active paleo collection is going to end up with about 20 taxonomy sources, assuming this is FINALLY the thing that gets people managing taxonomy in Arctos.....
I don't see any problems with geology collections or mineral taxonomy or etc., but I suspect we're missing some tools - would be good to get that fleshed out ASAP, and of course real data always forges better tools.
I don't see any problems with geology collections or mineral taxonomy or etc., but I suspect we're missing some tools - would be good to get that fleshed out ASAP, and of course real data always forges better tools.
We have a working set of data and a plan that we are putting before a few geologists before we put it up for more community discussion. Should be a new issue soon...
Yay! I am excited about this. This will be great for our minerals and I'm looking forward to eventually building up a phylocode classification!
building up a phylocode classification!
If that means what I think it does, it's going to make us think about tools. A few examples of all the complexity that might be needed by any record would give me something to think about, should you happen to have some data hanging around....
@dustymc this is what we tentatively have for minerals, rocks and chemical elements. Have a blast. Geology Taxonomy.zip ...
Excellent, thanks!
Here is a download of data for Ornithischia, excluding genus/species, from the Paleobiology Database; it is a mix of ranked and unranked terms: PBDB Ornithischia.zip (https://github.com/ArctosDB/arctos/files/5097457/PBDB.Ornithischia.zip). I think having something like the hierarchical taxonomy editor that would work for unranked terms would be essential.
Well, this is well timed! I'll let you all mention as needed in today's discussion.
That seems to be purely hierarchical - there's a term with zero or one parents (and some metadata, sometimes). Those data could be managed in some hierarchical tool, and as long as we don't have a need to flatten them then writing back to Arctos should be fairly straightforward. I think it could even take the shape of a built-in editor, as long as there's some mechanism to prevent adding inconsistent data (eg by disallowing access to the single-record editor - https://github.com/ArctosDB/arctos/issues/1698).
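To make "a term with zero or one parents (and some metadata, sometimes)" concrete, here is a small Python sketch of a parent-pointer hierarchy that tolerates unranked terms. The `Hierarchy` class is hypothetical, and the terms and ranks are illustrative rather than taken from the attached PBDB file.

```python
from typing import Dict, List, Optional


class Hierarchy:
    """Each term has zero or one parent; rank is optional metadata."""

    def __init__(self) -> None:
        self.parent: Dict[str, Optional[str]] = {}
        self.rank: Dict[str, Optional[str]] = {}

    def add(self, term: str, parent: Optional[str] = None, rank: Optional[str] = None) -> None:
        # Refusing duplicates and unknown parents keeps the data consistent:
        # a term can never get a second parent, and cycles cannot form.
        if term in self.parent:
            raise ValueError(f"{term!r} already exists")
        if parent is not None and parent not in self.parent:
            raise ValueError(f"unknown parent {parent!r}")
        self.parent[term] = parent
        self.rank[term] = rank

    def lineage(self, term: str) -> List[str]:
        """Walk parent pointers up to the root and return the path root-first."""
        path: List[str] = []
        current: Optional[str] = term
        while current is not None:
            path.append(current)
            current = self.parent[current]
        return list(reversed(path))


tree = Hierarchy()
tree.add("Ornithischia", rank="order")        # ranked term (illustrative rank)
tree.add("Genasauria", parent="Ornithischia")  # unranked term
tree.add("Thyreophora", parent="Genasauria")   # unranked term
print(tree.lineage("Thyreophora"))
```

Nothing here needs flattening into fixed rank columns, which is the part that would make writing back harder.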
OK, I have found a flaw in the system (maybe).
Check out https://arctos.database.museum/name/Aphlebia
The insect usage of the name has been declared a synonym, so it is not "valid", but the plant usage is valid. I have cloned in both classifications from GBIF (insect to the Arctos source and plant to Arctos Plants) and created the synonym relationship. Here's the rub. ALMNH:ES uses Arctos as the preferred source, with WoRMS (via Arctos) and Arctos Plants in succession. This means that they are going to wind up with the Arctos classification (insect) even if they really mean the plant, and in this crazy scenario they could potentially have both in their collection. Also, the plant version is not a synonym of Phyllodromica, but it is going to look that way now.
Sigh.
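The Aphlebia case is a homonym in a lower-priority source being hidden by the same name in a higher-priority one. A Python sketch of how such cases could be listed for a collection's preference order; `shadowed_names` and the `names_by_source` sets are hypothetical, with only the one name filled in.

```python
from typing import Dict, List, Set, Tuple


def shadowed_names(
    preferred_sources: List[str],
    names_by_source: Dict[str, Set[str]],
) -> Dict[str, Tuple[str, List[str]]]:
    """Map each name found in 2+ preferred sources to (winning source, shadowed sources)."""
    seen: Dict[str, List[str]] = {}
    for source in preferred_sources:  # iterate in priority order
        for name in names_by_source.get(source, set()):
            seen.setdefault(name, []).append(source)
    return {
        name: (sources[0], sources[1:])
        for name, sources in seen.items()
        if len(sources) > 1
    }


# Illustrative contents: insect usage in "Arctos", plant usage in "Arctos Plants".
names_by_source = {
    "Arctos": {"Aphlebia"},
    "WoRMS (via Arctos)": set(),
    "Arctos Plants": {"Aphlebia"},
}
order = ["Arctos", "WoRMS (via Arctos)", "Arctos Plants"]

print(shadowed_names(order, names_by_source))
```

A report like this would not fix the homonym, but it would at least show a collection which names can never reach the source further down their list.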
You found a flaw in taxonomy, not Arctos....
potentially have both
There's not much of a taxonomy solution for that. Split the collection, use taxon concepts to clarify, ....
plant version is not a synonym with Phyllodromica, but it is going to look that way now.
Relationships help search. If you want to do more, then we need relationships between classifications (which means we need a completely different approach to how we treat classification data, which is hard to imagine happening without dedicated funding).
I figured we could create an ALMNH:ES source for stupid one-offs like this but only if their collection includes only insects OR plants...
There is a part of me that wants to say - taxonomists can't get their act together and I shouldn't have to fix that....
Managing all animals in the "Arctos" classification is often problematic (Ex: http://arctos.database.museum/name/Cepolidae), and a bunch of plants-and-stuff that will surely find a way to clash sooner or later have been reintroduced by paleo collections.
Managing classifications in much smaller chunks avoids taxonomy-at-scale weirdness, but
1) Most collections have cataloged a few outliers and need taxonomy for them.
2) I think everyone wants to pull expertise, which involves sharing a Source with the experts, which involves huge cumbersome groups of classifications.
https://github.com/ArctosDB/arctos/issues/1852 would fix this: it doesn't matter which source a classification is in if you can select it individually, but I don't think we're realistically going to compile the data or use a taxon concepts system.
From https://github.com/ArctosDB/arctos/issues/1852#issuecomment-484545346
Dynamic sources would address the idea that the scale at which taxonomy is best managed and the scale at which taxonomy is used are not necessarily the same.
Simplest case, a teaching collection might pull relevant animals from "Arctos" and relevant plants from "Arctos Plants."
DMNS:Inv could
Someone or some coalition could manage any group (species+subspecies, family, phylum, 'stuff we need that isn't in some other source' eg land snails, etc.) in the system of their choosing (including Arctos, the Arctos Hierarchical Editor, a desktop app, a remote system like WoRMS, etc.), then anyone else could pull those data or parts of them into their "preferred" classification.
There are no real barriers to this; it will fit in the current structure, we just need some (complicated and expensive, probably) SQL-or-something to build and maintain the merged classifications.
Nobody would be forced into this; the capability would not necessarily change anything for any existing collection, it would just add the possibility of combining existing data.
There are potentially consistency problems - maybe the Murids source classification will include superfamily and the Cricetids source classification will not, resulting in inconsistent rodents - but I suspect that would still be more overall consistent than the current data (in which individual names are often outliers).
It is worth comparing the scale of taxonomy in Arctos with the scale of taxonomy used by collections here; dynamic classifications could result in much smaller datasets, which might support more discovery methods.
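As a sketch of the merged ("dynamic") classification idea and the consistency problem mentioned above: the Python below merges parts from several sources in priority order, then reports ranks that some merged classifications have and others lack. The function names and the data layout (taxon to rank-to-term mapping) are assumptions, not the Arctos structure or the "SQL-or-something" that would really be needed.

```python
from typing import Dict, List, Tuple

Classification = Dict[str, str]  # rank -> term


def merge_sources(
    parts: List[Tuple[str, Dict[str, Classification]]],
) -> Dict[str, Dict[str, object]]:
    """Merge (source, {taxon: classification}) parts; earlier parts take priority."""
    merged: Dict[str, Dict[str, object]] = {}
    for source, classifications in parts:
        for taxon, ranks in classifications.items():
            merged.setdefault(taxon, {"source": source, "ranks": dict(ranks)})
    return merged


def missing_ranks(merged: Dict[str, Dict[str, object]]) -> Dict[str, List[str]]:
    """For each rank used by some but not all taxa, list the taxa lacking it."""
    all_ranks = set()
    for entry in merged.values():
        all_ranks.update(entry["ranks"])
    report: Dict[str, List[str]] = {}
    for rank in all_ranks:
        lacking = sorted(t for t, e in merged.items() if rank not in e["ranks"])
        if lacking:
            report[rank] = lacking
    return report


# Hypothetical source names; one includes superfamily and the other does not.
murids = {"Mus musculus": {"superfamily": "Muroidea", "family": "Muridae", "genus": "Mus"}}
cricetids = {"Cricetus cricetus": {"family": "Cricetidae", "genus": "Cricetus"}}

merged = merge_sources([("Murids", murids), ("Cricetids", cricetids)])
print(missing_ranks(merged))
```

Here the report flags that the Cricetid entry has no superfamily, which is the "inconsistent rodents" situation described above.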