ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

taxonomy relationships #735

Closed GoogleCodeExporter closed 5 years ago

GoogleCodeExporter commented 9 years ago
Searching "everything taxonomy" does not perform very well, in part because we 
do not have reciprocal taxonomy relationships and so must perform an additional 
expensive join to fully consider relationships.

To implement reciprocity, 
http://arctos.database.museum/info/ctDocumentation.cfm?table=CTTAXON_RELATION 
needs to have a "reciprocal_relationship NOT NULL" column (or some functional 
equivalent) added and populated.

One potential (probably minor?) complication may be in ICBN vs. ICZN usage - 
Wikipedia says "synonym" (botany) and "junior synonym" (zoology) are the same 
thing, for example.

Example:

http://arctos.database.museum/name/Nemata-->synonym of-->http://arctos.database.museum/name/Nematoda
http://arctos.database.museum/name/Nematoda--->(no relationships)

would (automatically, from the revised code table) update to

http://arctos.database.museum/name/Nemata-->synonym of-->http://arctos.database.museum/name/Nematoda
http://arctos.database.museum/name/Nematoda--->{whatever the reciprocal of "synonym of" is}--->http://arctos.database.museum/name/Nemata

This would also solve 
https://groups.google.com/d/msg/arctos-ac/DfZe3kxADlY/RN7H3q3JSzEJ (formatting 
taxonomy relationships).
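
The automatic reciprocal update in the example above could be sketched roughly as follows. This is a minimal Python illustration, not Arctos code: the `RECIPROCAL` mapping stands in for the proposed reciprocal_relationship column, the function names are invented for the sketch, and "has synonym" is only a placeholder, since the actual reciprocal of "synonym of" is still undecided.

```python
# Hypothetical stand-in for CTTAXON_RELATION with a reciprocal_relationship column.
# "has synonym" is a placeholder term; the real reciprocal is undecided.
RECIPROCAL = {
    "synonym of": "has synonym",
    "has synonym": "synonym of",
}


def add_reciprocals(relations):
    """Return the relations plus any missing reciprocal rows.

    Each relation is a (name, relationship, related_name) tuple.
    A KeyError for an unmapped relationship mirrors the NOT NULL requirement.
    """
    existing = set(relations)
    result = list(relations)
    for name, relationship, related in relations:
        mirrored = (related, RECIPROCAL[relationship], name)
        if mirrored not in existing:
            existing.add(mirrored)
            result.append(mirrored)
    return result


def relationships_for(name, relations):
    """Look up a name's relationships by matching the subject column only."""
    return [(rel, other) for subject, rel, other in relations if subject == name]


rows = add_reciprocals([("Nemata", "synonym of", "Nematoda")])
print(relationships_for("Nematoda", rows))
```

With reciprocal rows stored, a lookup only has to match one column, which is the point of avoiding the additional join.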

Ref: https://code.google.com/p/arctos/issues/detail?id=734

Original issue reported on code.google.com by dust...@gmail.com on 13 Jul 2015 at 8:54

dustymc commented 8 years ago

Possible improvement: Move ALL relationships to classifications, something like we get from GlobalNames. Example:

http://arctos.database.museum/name/Arhopalus%20cervinus#ITIS

taxon name: Arhopalus cervinus
species (root term in hierarchical terms, rank doesn't seem important): Arhopalus foveicollis
interpretation: "ITIS says Arhopalus foveicollis is favored over Arhopalus cervinus" (or something like that...)

This is "correct" from a data standpoint; our current data (http://arctos.database.museum/name/Echidna%20russellii) ~assert "Echidna (all uses) is a bad spelling of Bitis (vipers)," which isn't correct; Echidna remains a "good" name for eels and a "bad" synonym for some other stuff (pointy mammals, moths).

We would lose the precision available under http://arctos.database.museum/info/ctDocumentation.cfm?table=CTTAXON_TERM, BUT we know many of those data are garbage anyway (see email "backwards synonyms" @DerekSikes 1 Apr 2016), and many of them intentionally avoid precision (eg., DLM uses "synonym of" to mean "sameish thing" with no ICxN intentions); I see little evidence that we're capable of usefully maintaining those data, and see no way of determining what's trustworthy.

It is currently very easy to delete classifications; it should probably be more difficult (or require confirmation, or something) to delete a "synonym bearing" classification.

It's not exactly clear how we'll avoid synonym-bearing classifications in things like updating FLAT; all code dealing with "the collection's classification" would need to be reviewed.

All "any taxa" queries would need to be rewritten, but performance should improve (we'd need to tune only one thing, albeit one very large thing).

Question (possibly for GlobalNames): Under eg http://arctos.database.museum/name/Echidna%20russellii#CatalogueofLife (and many other examples) the query was for a "species" (binomial) and various sources return a monomial (genus). What exactly is the assertion?

sharpphyl commented 5 years ago

Just curious if we've considered assigning a number to each use of a taxon name the same way we do to a locality. Then could specific numbers be in each classification (and search, etc.)? Would that keep them straight and link the correct ones? Numbers would be unique. Names aren't, and adding the author doesn't seem to be a huge improvement overall.

> Question (possibly for GlobalNames): Under eg http://arctos.database.museum/name/Echidna%20russellii#CatalogueofLife (and many other examples) the query was for a "species" (binomial) and various sources return a monomial (genus). What exactly is the assertion?

I find this happens frequently. If the species isn't found in these sources, they are returned only to the genus level. Also, if the species is invalid, WoRMS returns the valid species. Not sure if this will happen in WoRMS (via Arctos) or not.

dustymc commented 5 years ago

> assigning a number

We do - names have taxon_name_id and classifications have classification_id.

> same way we do to a locality

...and just like localities, the ID isn't stable - they get replaced rather than updated when it's convenient, etc. Localities have 'locality_name' which IS stable - easy enough to add that to something like classifications, but (like locality_name) that would affect how the data may be managed.

campmlc commented 5 years ago

Taxon IDs sounds like a promising approach for dealing with the issue of linking a name to an authority, date, and classification.

dustymc commented 5 years ago

taxon_name_id uniquely identifies NAMES. Names also uniquely identify names - we have a unique index.

Classification_id uniquely identifies classifications. We replace those every time we clone-edit-delete instead of editing in place, or use the classification bulkloader.
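
To make the stability problem concrete, here is a small Python sketch of the clone-edit-delete pattern. It is an illustration under assumptions, not the Arctos implementation: the `stable_name` field is a hypothetical analogue of locality_name, and the counter stands in for a database sequence.

```python
import itertools

_next_id = itertools.count(1)  # stand-in for a database sequence


def new_classification(stable_name, terms):
    """Create a classification row with a freshly assigned classification_id."""
    return {
        "classification_id": next(_next_id),
        "stable_name": stable_name,  # hypothetical column, analogous to locality_name
        "terms": dict(terms),
    }


def clone_edit_delete(old, **changes):
    """Replace a classification instead of editing it in place.

    The old row is discarded, so the replacement gets a new classification_id;
    only the stable name carries over.
    """
    return new_classification(old["stable_name"], {**old["terms"], **changes})


original = new_classification("example-classification", {"family": "Viperidae"})
replacement = clone_edit_delete(original, genus="Bitis")

# The surrogate ID changes across the replacement; the stable name does not.
print(original["classification_id"], replacement["classification_id"])
print(original["stable_name"] == replacement["stable_name"])
```

Anything that stored the old classification_id would now point at nothing, which is why an ID assigned this way cannot serve as a durable reference.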

campmlc commented 5 years ago

Is it possible to have a stable classification (e.g. classification+taxon name) "name" or ID?

dustymc commented 5 years ago

Sure - we just don't allow them to change. "Don't allow certain data to change" seems like a critical component of managing taxon concepts anyway. I don't think that's any sort of deal-breaker, but it's absolutely a big change in how we view and manage classification data.

We currently treat taxon names as "data" - eg, you can't change them once they're used. Classifications are treated like "metadata" - you can delete them or replace them (to make family consistent, or because it's easier than editing, or because someone left some junk behind, or whatever). Moving to taxon concepts - even if the "concept" is just name+name-author+year - would elevate classifications to "data" - they'd become things you pick (presumably for reasons) rather than things you inherit (eg, from collection preferences). Allowing you to pick specific "concepts" and allowing those concepts to arbitrarily change would be pointless, so we'd have to lock some things down. Keeping an identifier stable in that context should not be a problem.
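
As a rough illustration of "lock some things down", a concept could be modeled as a value that cannot be edited after creation. This Python sketch assumes the minimal name+name-author+year concept mentioned above; the class, field names, and values are placeholders, not an Arctos structure.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class TaxonConcept:
    """Hypothetical minimal concept: name + name-author + year, fixed once created."""
    concept_id: int
    name: str
    author: str
    year: int


# Placeholder values only; not real taxonomic data.
concept = TaxonConcept(concept_id=1, name="Example name", author="Example Author", year=1900)

try:
    concept.year = 1901  # attempts to change a locked-down concept are rejected
    changed = True
except FrozenInstanceError:
    changed = False

print(changed)
```

Because the fields cannot change, the identifier keeps meaning the same thing for as long as anything refers to it; a correction would be a new concept rather than an edit.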

campmlc commented 5 years ago

That sounds like a very promising approach to solving some of our issues with choosing particular name+classification combos for a given collection or specimen, and dealing with homonyms?

Jegelewicz commented 5 years ago

Agree. I have been wondering how the current model of "name as data and classification as metadata" came about. It seems like we are creating a lot of our own problems with the two layers of identification. What would we need to do to transition to such a model? And what am I missing about the current model that makes it more useful/appropriate?

dustymc commented 5 years ago

> what am I missing

normalization

> What would we need to do to transition to such a model?

In that model (as I see it), normalization is even more critical. The only significant structural change would be identification_taxonomy.taxon_name_id becoming identification_taxonomy.classification_id. (That sort of modularity is another benefit of normalization.)

That should just leave the usability issues to deal with.
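
A sketch of that structural change, using Python's sqlite3 module purely for illustration. Only identification_taxonomy, taxon_name_id, and classification_id come from the discussion; the other columns, the `source` values, and the IDs are assumptions made for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE taxon_name (
        taxon_name_id INTEGER PRIMARY KEY,
        scientific_name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE classification (
        classification_id INTEGER PRIMARY KEY,
        taxon_name_id INTEGER NOT NULL REFERENCES taxon_name,
        source TEXT NOT NULL
    );
    -- Proposed change: point at a classification instead of a bare name.
    CREATE TABLE identification_taxonomy (
        identification_id INTEGER NOT NULL,
        classification_id INTEGER NOT NULL REFERENCES classification
    );
""")
con.execute("INSERT INTO taxon_name VALUES (1, 'Echidna')")
# Two classifications of the same name, from hypothetical sources.
con.executemany(
    "INSERT INTO classification VALUES (?, ?, ?)",
    [(10, 1, "hypothetical source A"), (11, 1, "hypothetical source B")],
)
con.execute("INSERT INTO identification_taxonomy VALUES (100, 11)")

# The name is still reachable through the classification, so nothing is lost.
row = con.execute("""
    SELECT n.scientific_name, c.source
    FROM identification_taxonomy it
    JOIN classification c ON c.classification_id = it.classification_id
    JOIN taxon_name n ON n.taxon_name_id = c.taxon_name_id
    WHERE it.identification_id = 100
""").fetchone()
print(row)
```

Because names stay in their own table, the identification can pick one specific use of a homonym while the name itself is still stored once.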

Jegelewicz commented 5 years ago

Closing to consolidate issues; see #1136.