ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

New Taxon Classification Source in Arctos #3103

Closed campmlc closed 3 years ago

campmlc commented 4 years ago

No idea if this is the right place for this . . .

Goal: Create a new taxon classification source in Arctos for classifications provided by The Parasite Tracker TCN.

Context: Ectoparasite taxonomy in Arctos is ad hoc at best.

Value: The Parasite Tracker (TPT via Arctos)? Other ideas?

Priority: High - grant funded

Jegelewicz commented 4 years ago

Will we be able to draw directly from TPT as we do with WoRMS?

Either way, I support this and I like the source name TPT (via Arctos) to match up with the WoRMS (via Arctos) source.

campmlc commented 4 years ago

No, unfortunately, at least not yet. They have also requested we give them our "host" taxonomy - and I am wondering how we would download Arctos vertebrate taxonomy and classification as a flat file . . .

dustymc commented 4 years ago

> as a flat file

If the data could be represented with a flat file, we'd have saved everyone a whole bunch of work and just used a flat file as a data model.
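A minimal sketch of why one flat row per name loses information, assuming a simplified model in which each name carries several classifications, one per source, each an ordered list of (rank, term) pairs. This is not the Arctos schema; the structure and the "Other source" label are hypothetical and only illustrate the point.

```python
from collections import defaultdict

# Hypothetical, simplified model: (name, source) -> ordered (rank, term) pairs.
# This is NOT the Arctos schema, only an illustration.
classifications = {
    ("Odobenus rosmarus", "Arctos"): [
        ("kingdom", "Animalia"), ("class", "Mammalia"),
        ("family", "Odobenidae"), ("genus", "Odobenus"),
    ],
    ("Odobenus rosmarus", "Other source"): [
        ("kingdom", "Animalia"), ("class", "Mammalia"),
        ("order", "Carnivora"), ("family", "Odobenidae"),
        ("genus", "Odobenus"),
    ],
}


def to_long_rows(data):
    """One row per (name, source, position, rank, term); nothing is lost."""
    rows = []
    for (name, source), lineage in data.items():
        for position, (rank, term) in enumerate(lineage, start=1):
            rows.append((name, source, position, rank, term))
    return rows


def to_flat_rows(data):
    """One row per name; sources are merged and can no longer be told apart."""
    flat = defaultdict(dict)
    for (name, _source), lineage in data.items():
        flat[name].update(dict(lineage))
    return dict(flat)


long_rows = to_long_rows(classifications)
flat_rows = to_flat_rows(classifications)
```

The long form keeps nine rows that still say which source asserted which term and in what order. The flat form collapses to a single row per name, with no record of where "order" came from or that the two sources disagreed about including it.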

http://globalnames.org/ remains my preferred method of sharing taxonomy. They can get Arctos data there, and if they'd upload their data then you could just create the names and let Arctos take care of the rest. NSF funded something wonderful and powerful and functional, we should use it if we can.

campmlc commented 4 years ago

So to clarify, if I screen the names first in the validator, and then load the names to Arctos, these names will also be picked up at Global Names if they aren't there already? Same for our classifications? But if they load the names and classifications to Global Names first, then we can get them?

dustymc commented 4 years ago

Nothing at all happens until there's a name in Arctos. (GlobalNames takes whatever anyone feeds them, which makes them very useful but also unable to act as an authority.)

You can decide if the validator is useful, or useful enough to deal with, to you or not.

Arctos constantly pulls from globalnames; anything there should find its way to us.

Using eg "The Mammal Species of The World" https://arctos.database.museum/name/Odobenus%20rosmarus#TheMammalSpeciesofTheWorld (arbitrary thing from GN) as a preferred source is about 90% social problem - there's no insurmountable technical barrier to using non-local classification data in your identifications.

Arctos periodically shares data with GN, so yes things go that way too.

campmlc commented 4 years ago

To further clarify, if there is a name and classification in GN, and I add the name to Arctos, will Arctos pull the Global Names classification? I thought classifications had to be added manually by Arctos users, by cloning a classification into an existing name one at a time? That is not a realistic approach for loading several thousand names and classifications. I am obviously still struggling to understand this process. I am working on loading names to Arctos. But after that, since the classification bulkloader is not working, I don't know what to do next.

dustymc commented 4 years ago

If there's a name in Arctos, then whatever's in GN will also end up in Arctos. (The system, not the source - naming things is hard!) Other classifications in Arctos are not involved in that in any way. Find any name, scroll down, you'll probably find some stuff from GN. That cannot currently be used for IDs, but that could be made my problem, not yours.

If you do need to manage these data locally, then rebuilding the loader will have to be discussed and prioritized.

Jegelewicz commented 4 years ago

My understanding.

The global names classifications appear in Arctos, but they do not DO anything in Arctos catalog records. So, although the classification is in Global Names, it will not appear on your catalog records unless you clone it into an Arctos source.

Because GN is NOT an authority, there is a ton of garbage in there and there will be multiple classifications for any given name.

Personally, watching this discussion, I still think the very best option would be for the TCN to approach WoRMS and request edit access for these species. If the data were managed in WoRMS, it would be available to everyone and be way more of an authority than Global Names or even Arctos. Then Mariel could select WoRMS (via Arctos) as her primary taxonomy source for the parasite collection and she wouldn't have to import any classifications. All the other collections could make use of WoRMS in the same way - although they might need to develop some new functionality.

The second-best option is to create a new source in Arctos that should only have the TCN taxonomy in it. This could then become the source for the whole TCN via the Arctos API, no? The thing about this is that Mariel will need to import the classifications and then she (or someone at Arctos with manage taxonomy access) will have to maintain them. Not a deal breaker, but seems crappy to make us the free management system.

dustymc commented 4 years ago

> unless you clone it into an Arctos source

Or we allow remote sources to be in collections' preferred list, which is relatively trivial (albeit likely to introduce some social problems).
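A rough sketch of what a preferred-source list could look like, assuming each collection keeps an ordered list of source names and the first source that has a classification for the name wins. The function, the data layout, and the "GN: hypothetical remote source" label are illustrative assumptions, not how Arctos actually resolves sources.

```python
def pick_classification(available, preferred_sources):
    """Return (source, lineage) for the first preferred source with data, else None."""
    for source in preferred_sources:
        if source in available:
            return source, available[source]
    return None


# Hypothetical classifications on file for one name, keyed by source.
available = {
    "Arctos": ["Animalia", "Arthropoda", "Insecta", "Siphonaptera"],
    "GN: hypothetical remote source": [
        "Animalia", "Arthropoda", "Insecta", "Siphonaptera", "Pulicidae",
    ],
}

# Local-only preferences: the new source has no data yet, so "Arctos" is used.
local_choice = pick_classification(available, ["TPT (via Arctos)", "Arctos"])

# Same collection, but with a remote source allowed in the list.
remote_choice = pick_classification(
    available, ["GN: hypothetical remote source", "Arctos"]
)
```

Under this model, letting a remote source into the list is one more entry in an ordered list, which is why the technical side is small; deciding who vouches for that remote data is the social part.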

> WoRMS

Taxonomy wasn't created at that scope, and it almost never painlessly scales to that scope. I'd expect a bajillion homonyms and such, all the problems that make managing things like the "Arctos" source way more work than it should be.

Throwing up a clone of WoRMS API would be cool though....

Jegelewicz commented 4 years ago

> Taxonomy wasn't created at that scope, and it almost never painlessly scales to that scope. I'd expect a bajillion homonyms and such, all the problems that make managing things like the "Arctos" source way more work than it should be.

? I don't know what you mean. WoRMS classifications are managed by "experts" who have access to do so or whose data is ingested by WoRMS. http://www.marinespecies.org/about.php

campmlc commented 4 years ago

I think for the short term we need to do something that works in Arctos, and save the GN and WoRMS ideas for later. From what I can tell we are way ahead of anything else going on with the TPT or with TaxonWorks. They just have their CSV files. I'm working on loading the names, and I would be willing to work on classifications except I have not done that before and the bulkloader isn't working? Could definitely use some help with creating any kind of a separate source. We could also just load all these to Arctos, except that since some already exist with mostly bad classifications, I assume that would be more difficult.

Jegelewicz commented 4 years ago

Well then, let's get on with the discussion at #3110 and get the tool working. I think you should create a new source and put the TCN classifications in there. That's easy enough. All we need to do is decide on the name of the source.

campmlc commented 4 years ago

I'm OK with TPT (via Arctos) or Parasite Tracker TPT (via Arctos).

Jegelewicz commented 4 years ago

Created new source - TPT (via Arctos) - The Parasite Tracker is a Thematic Collections Network (TCN) for parasites. Classifications in this source are vetted by the TCN members and should not be modified except when updates are made to the TPT taxonomy.

Any changes needed for the description?

campmlc commented 4 years ago

Sounds good! Thanks!

dustymc commented 4 years ago

> what you mean

Arctos (the source, not system) scale "lists" always run into things like homonyms, which are generally discovered when a bunch of spiders appear in a clam collection or similar. Smaller lists - mammals, fish, maybe parasites - don't encounter those problems very often, and usually can come up with a workable solution when they do.
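A small sketch of how that kind of collision could be spotted before merging lists, assuming each list is just (name, kingdom) pairs. The function and data layout are hypothetical; Ficus (figs and a sea snail genus) and Morus (mulberries and gannets) are real cross-kingdom homonyms used only as examples.

```python
from collections import defaultdict


def find_homonyms(*lists):
    """Return names that appear under more than one kingdom across the lists."""
    kingdoms_by_name = defaultdict(set)
    for entries in lists:
        for name, kingdom in entries:
            kingdoms_by_name[name].add(kingdom)
    return {
        name: sorted(kingdoms)
        for name, kingdoms in kingdoms_by_name.items()
        if len(kingdoms) > 1
    }


# Two small lists that are each fine on their own.
plants = [("Ficus", "Plantae"), ("Morus", "Plantae"), ("Quercus", "Plantae")]
animals = [("Ficus", "Animalia"), ("Morus", "Animalia"), ("Odobenus", "Animalia")]

homonyms = find_homonyms(plants, animals)
```

Neither list has a problem in isolation; the conflict only exists once both are keyed by the same name string in one source.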

> managed by "experts"

Doesn't matter, so are a bunch of the other sources that have these problems when merged.

> save the GN ... for later

To be clear, that's working now and has since GN became public. If you get your CSV to GN today, you could start seeing stuff in Arctos today. (And GN can and does deal with CSV.)

> load all these to Arctos

You can, but just about 100% chance it would cause problems. If you must (or want to) manage locally, strong "yes" vote on the new Source from me.

> mostly bad classifications

And some of them are probably used for fish or something.

campmlc commented 4 years ago

Is there any documentation on how to submit names and classifications to GN?

dustymc commented 4 years ago

> documentation on how to submit names and classifications to GN?

@dimus can probably get you pointed in the right direction.

dimus commented 4 years ago

@campmlc the best way for the moment is to open an issue at https://github.com/GlobalNamesArchitecture/dwca_hunter/issues. It also helps if you mention @dimus, then the issue will propagate up on my list.

campmlc commented 4 years ago

Issue submitted to GN; plan is to load TPT taxonomy through them in November. Currently will continue to work on local Arctos source.

dustymc commented 4 years ago

Here's a template you can munge any classification data you want to load into - let me know if something doesn't fit.

BulkloadClassificationTemplate(1).csv.zip

Jegelewicz commented 4 years ago

Here it is in Excel: BulkloadClassificationTemplate.xlsx
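For anyone munging a wide CSV (one column per rank) into a long, one-term-per-row layout, here is a rough sketch. Every column name below is a hypothetical placeholder, for both the input file and the output rows; the real headers should be taken from the template attached above, which may be shaped differently.

```python
import csv
import io

SOURCE = "TPT (via Arctos)"
RANKS = ["family", "genus"]  # hypothetical rank columns in the input CSV


def wide_to_long(csv_text, source=SOURCE, ranks=RANKS):
    """Turn one-row-per-name CSV text into one-row-per-term dictionaries."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for record in reader:
        position = 0
        for rank in ranks:
            term = (record.get(rank) or "").strip()
            if not term:
                continue  # skip ranks the input leaves blank
            position += 1
            rows.append({
                "scientific_name": record["scientific_name"].strip(),
                "source": source,
                "term_type": rank,
                "term": term,
                "position": position,  # order of the term within the lineage
            })
    return rows


sample = (
    "scientific_name,family,genus\n"
    "Pulex irritans,Pulicidae,Pulex\n"
    "Pulicidae,Pulicidae,\n"
)
rows = wide_to_long(sample)
```

Blank ranks are skipped rather than written as empty terms, so a family-level name ends up with a shorter lineage than a species-level one.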

Jegelewicz commented 3 years ago

I think we are done here - closing