ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

Organism ID #1966

Closed Jegelewicz closed 2 years ago

Jegelewicz commented 5 years ago

Issue Documentation is http://handbook.arctosdb.org/how_to/How-to-Use-Issues-in-Arctos.html

Is your feature request related to a problem? Please describe.

We have been working with organisms for which we have multiple occurrences, specifically Mexican Wolves in the Mexican Wolf recovery program. Throughout their lives, samples of blood are taken from these animals and deposited in the genomic resources collection at MSB. Traditionally, each set of samples (all from the same day) has been given a single catalog number. This results in multiple cataloged items for a single organism, which we can link to each other using the “same individual as” relationship.

image

These relationships are nice, but they don't allow us to see ALL events for an individual in one place, and they require the addition of a new relationship for ALL related cataloged items every time a new collection of blood is made. Each cataloged item includes the other ID “Mexican Wolf Studbook Number” and we have modified the Other ID URL so that clicking this other ID allows us to find all of the samples from any given animal.

image

This method works, but there is one issue we need to address.

When our data leaves Arctos and is ingested by aggregators such as GBIF and iDigBio, there is no easy way for anyone using the data there to make the connection that the various cataloged items are all from the same animal. Although the Mexican Wolf Studbook numbers are included in the list of related IDs, the connection just isn’t as tight as we would like it to be.

image image

Describe the solution you'd like

Our proposed solution is to make use of the Darwin Core field “Organism ID”. We envision this as a separate and distinct other ID – one which provides a link to all related specimens (the results of that link would look just like the search result you see when you search one of the Mexican Wolf Studbook numbers):

image

This identifier would be passed to aggregators in the “Organism ID” field – allowing those using the data there to make the appropriate connection between the related cataloged items. Currently it appears that we are just passing the catalog item to that field

image

which is what led to the solution we have been attempting to make work in https://github.com/ArctosDB/arctos/issues/1545. This has created problems with data entry and maintenance on our end. This new solution will allow us to keep events matched with parts and parts matched with accessions. It will simplify data entry and end the need for the links between events and parts.

We envision a new code table: CTCOLL_ORGANISM_ID set up very much like CTCOLL_OTHER_ID_TYPE where:

IDType = text “Mexican Wolf Studbook Number”

Description = definition of the IDType Studbook number assigned by the Mexican Wolf Recovery Program

BaseURI = http://arctos.database.museum/SpecimenResults.cfm?oidtype=Mexican%20wolf%20studbook%20number&oidnum=
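As a rough illustration of how one row of such a code table could be used, here is a minimal Python sketch. The `OrganismIdType` class and the `organism_link` and `organism_label` functions are hypothetical names made up for this example, not Arctos code; the only values taken from the proposal are the IDType, Description, and BaseURI above.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class OrganismIdType:
    """Hypothetical stand-in for one row of the proposed code table."""
    id_type: str
    description: str
    base_uri: str


MEXICAN_WOLF = OrganismIdType(
    id_type="Mexican Wolf Studbook Number",
    description="Studbook number assigned by the Mexican Wolf Recovery Program",
    base_uri=(
        "http://arctos.database.museum/SpecimenResults.cfm"
        "?oidtype=Mexican%20wolf%20studbook%20number&oidnum="
    ),
)


def organism_link(row: OrganismIdType, id_number: str) -> str:
    """Append the URL-encoded ID number to the row's base URI."""
    return row.base_uri + quote(id_number, safe="")


def organism_label(row: OrganismIdType, id_number: str) -> str:
    """Build the display text for the link from the controlled type and the number."""
    return f"{row.id_type}: {id_number}"


print(organism_label(MEXICAN_WOLF, "1216"))
print(organism_link(MEXICAN_WOLF, "1216"))
```

The point of the sketch is that the curator only ever types the number; the type text and the URL prefix both come from the code table row.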

When the Organism ID is used, there would be no need for all of the “same organism as” relationships, but they could be used if a collection so desired. Every cataloged item that included an Organism ID would instead appear like this:

image

With the text “Mexican Wolf Studbook Number: 1216” being a link taking you to the search results:

image

We would hope that this link could also be what appears at the aggregators in their “Organism ID” field:

image

Describe alternatives you've considered

The major challenge we see with this method is how to assign unique Organism IDs for things where there isn’t an obvious one. The Mexican Wolves (and eventually the Red Wolves that are expected to come in from Arkansas) and NEON recaptures are examples of when we would be using this method. These all have obvious unique identifiers (studbook numbers and NEON sample ID numbers). However, when the skin and skeleton of an animal are at DMNS and the tissues for that same animal are at MSB, there is no obvious organism ID type and we would need to come up with one. We are open to suggestions for how best to accomplish this.

What have we missed?

Additional context

See above

Priority

I would like to have this resolved by date: soonish

Jegelewicz commented 5 years ago

I have passed this by John Wieczorek and here is our discussion:

The proposal to use dwc:organismID in Darwin Core resource is right on target. That is exactly what the field is meant for. You are right that Arctos is passing the id for the cataloged item in that field right now. The reasoning was based on the majority of cases, where the cataloged item corresponds to an Organism. Rigorously speaking, I think this is a mistake, because a cataloged item does not always correspond to an Organism, and in Arctos, we don't have a fail-proof method of knowing when it does, and when it doesn't. Given that, I think we should unmap organismID from the cataloged item in all Arctos resources.

I have looked at the proposal for the new code table (CTCOLL_ORGANISM_ID). I think this is unnecessary and unsustainable. I think a sufficient solution, which is also the most scalable, is to add a new type in CTCOLL_OTHER_ID_TYPE, called "organism identifier" or similar. Curators would have the freedom to create a (single) organism identifier, and that should be a persistent resolvable GUID. It could refer to any organism within Arctos, or outside it. Note that in the case of the Mexican Wolf Studbook Number, there would be two entries in the COLL_OTHER_ID table for each cataloged item - one with type "Mexican Wolf Studbook Number", which holds the number, and one with type "organism identifier" with the resolvable GUID to the organism.
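A minimal sketch of what that mapping could look like at export time, assuming the new type is literally named "organism identifier" and that a cataloged item's other IDs are available as (type, value) pairs. The function name `dwc_organism_id` and the example GUID are made up for illustration; this is not the actual Arctos export code.

```python
from typing import Optional

ORGANISM_ID_TYPE = "organism identifier"  # assumed name of the new other ID type


def dwc_organism_id(other_ids: list[tuple[str, str]]) -> Optional[str]:
    """Return the value to publish as dwc:organismID, or None if there is none.

    other_ids holds (other_id_type, value) pairs for one cataloged item.
    The cataloged item's own GUID is deliberately never used as a fallback.
    """
    values = [value for id_type, value in other_ids if id_type == ORGANISM_ID_TYPE]
    if len(values) > 1:
        raise ValueError("expected at most one organism identifier per cataloged item")
    return values[0] if values else None


# Two entries for the same cataloged item, as described above.
# The GUID below is a placeholder, not a real identifier.
wolf_other_ids = [
    ("Mexican Wolf Studbook Number", "1216"),
    ("organism identifier", "https://example.org/organism/placeholder-guid"),
]

print(dwc_organism_id(wolf_other_ids))
print(dwc_organism_id([("Mexican Wolf Studbook Number", "1216")]))
```

A record without an "organism identifier" entry publishes no organismID at all, which is the "unmap organismID from the cataloged item" behaviour suggested in the previous paragraph.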

There will be issues of "persistence" and of primacy (if two data publishers have distinct organismIDs, which should be used?), but those will exist outside of the scope of the immediate problem anyway. It's something that could conceivably be solved at a level above the publication of primary occurrence data.

Following what I am proposing above, there would be no need to communicate anything to GBIF, iDigBio, or GGBN. We would be following the intended use of dwc:organismID. The misunderstandings from iDigBio and GGBN are around the conflation of Occurrences by Arctos, not about the concept of Organism. The proposed solutions do not save us with respect to GGBN either. With them the issue is that they want records of tissue samples, while everyone else in the world expects Occurrences, and these are not always the same thing, especially in Arctos. So, we still have to make distinct resources for GGBN, unfortunately.

My response:

I'm not sure I can wrap my brain around the other ID type solution. I feel like what you describe is what we do with the Mexican Wolves now - how would the GUIDs be created and where would they "live"?

I'm not the most technical person, so without a demo, it's just hard for me to see how two independent Other IDs will resolve to a GUID somewhere...but the idea seems the same as what I proposed just technically more stable? If so, I am on board and I agree that we need to stop sending catalog number as Organism ID AND that MSB needs to stop trying to catalog all collections for a single wolf in a single catalog number - which is why I proposed the solution I did - it is just too messy and information is lost in the process.

This is coming to the forefront for other reasons: https://github.com/tdwg/dwc-qa/issues/131

I'd like to create a simple solution to the organism issue - it really shouldn't be that difficult within Arctos. The problem of everyone agreeing on an ID when you consider stuff outside of Arctos is something we need to tackle as a larger community and is related to unique identifiers in general. Let me know how I can help push a solution forward and I'll do everything I can!

John responds:

I'm not sure I can wrap my brain around the other ID type solution. I feel like what you describe is what we do with the Mexican Wolves now - how would the GUIDs be created and where would they "live"?

In Arctos the GUIDs would live in the Coll_Obj_Other_ID_Num table with an OTHER_ID_TYPE of "organism identifier". Curators would be responsible for entering these (read "danger").

I'm not the most technical person, so without a demo, it's just hard for me to see how two independent Other IDs will resolve to a GUID somewhere...but the idea seems the same as what I proposed just technically more stable? If so, I am on board and I agree that we need to stop sending catalog number as Organism ID AND that MSB needs to stop trying to catalog all collections for a single wolf in a single catalog number - which is why I proposed the solution I did - it is just too messy and information is lost in the process.

Two independent Other IDs do not resolve to a GUID somewhere. One of the IDs says "I am this Mexican Wolf Studbook Number", the other says, "my dwc:organismID is this". Hey, maybe that's what to put in the CTOTHER_ID_TYPE table - "dwc:organismID" - it would be quite explicit.

This is coming to the forefront for other reasons: https://github.com/tdwg/dwc-qa/issues/131

I'd like to create a simple solution to the organism issue - it really shouldn't be that difficult within Arctos. The problem of everyone agreeing on an ID when you consider stuff outside of Arctos is something we need to tackle as a larger community and is related to unique identifiers in general. Let me know how I can help push a solution forward and I'll do everything I can!

True. It is a community issue. Arctos is a great resource for pushing the limits of what we are able to do. For many outside it is way too far ahead, despite the fact that for some inside it doesn't do all we might want.

From me:

In Arctos the GUIDs would live in the Coll_Obj_Other_ID_Num table with an OTHER_ID_TYPE of "organism identifier". Curators would be responsible for entering these (read "danger").

The "danger" is what I was hoping to avoid with the separate table for organism ID - using "Mexican Wolf Studbook Number" as the base of the ID means we don't get "Mexican wolf studbook number 1216", "Mex Wolf Studbook No. 1216", etc.

Jegelewicz commented 5 years ago

Two independent Other IDs do not resolve to a GUID somewhere. One of the IDs says "I am this Mexican Wolf Studbook Number", the other says, "my dwc:organismID is this". Hey, maybe that's what to put in the CTOTHER_ID_TYPE table - "dwc:organismID" - it would be quite explicit.

To be clear - I don't propose there be two IDs, but to MOVE those other IDs that are truly Organism IDs to the new table.

dustymc commented 5 years ago

In general, I think having some sort of "individual ID" would be very useful. It's not at all clear to me why it would be in a separate table; that invites more denormalization (doing the same thing multiple ways), inevitably leading to even bigger messes.

If the scope of this is Arctos, we could exploit relationships to assemble "individuals" and/or individualID without adding any overhead - there's much more discussion on that in https://github.com/ArctosDB/arctos/issues/1545 - and see below.

I believe that this is implicitly a proposal to recatalog http://arctos.database.museum/guid/MSB:Mamm:292063 as 5 specimens. At least for some use cases that goes against the "catalog the item of scientific interest" mantra; eventually two of the samples from the same wolf will be compared in a publication. I'm not sure that's more evil than the current situation, where 5 samples collected at different times under different conditions are likely seen as equivalent to 5 tubes from the same liver of another specimen, but it should be acknowledged. I think any consistent documented approach is an improvement.

"Occurrences" are occasionally recorded in different collections, both in and out of Arctos, so cataloging Occurrences rather than individuals would make Arctos data more comparable with the rest of the world. I'm not sure how much weight that should carry, but again it is a consideration that should be addressed.

All of that said, I don't think Arctos can or should dictate how material is cataloged. I think the most we can do is to provide documentation/guidance.

This should extend beyond Arctos. A sample of http://arctos.database.museum/guid/MSB:Mamm:292063 stored in another system and shared with GBIF would ideally bear the same "individual ID" as the record(s) in Arctos. If it did, it would be trivial to assemble the individual in GBIF or similar systems.

The "danger" is in assigning the identifiers, and I don't believe there is any technical solution to that - it's a social problem that needs a social solution. It took seconds to find https://arctos.database.museum/guid/MSB:Mamm:317312 and https://arctos.database.museum/guid/MSB:Mamm:324187 which share a NEON ID and probably are not the same organism. I have never encountered a "number series" that didn't have similar issues, and if that exists the NEON ID cannot do what you want. I think this would be best implemented as GUIDs, and for social reasons those should probably not be minted by Arctos. Drawing those from an independent source would let Curators determine what is or is not an Individual on a case-by-case basis independent of any problems with identifiers assigned by other organizations, and at least maintains some possibility that other collections holding material from the same individuals would buy in and assign those IDs to their specimens. Two candidates are UUIDs, which would not be resolvable or actionable, or ARKs which could be resolvable and could point to some shared view (eg, GBIF, which in turn could point to the various bits and pieces of the individual in various systems/collections).

I think that also could be implemented only as guidance; I don't think Arctos can or should prevent someone from using "1" as an IndividualID, but we can help them understand the implications of doing so.

Jegelewicz commented 5 years ago

How would this not be denormalization?

organismID = Mexican Wolf Studbook Number 1216

organismID = Mex Wolf Studbook No 1216

organismID = Mexican wolf studbook number 1216

These are all the same organism, but now we have three IDs for it. If we have:

ORGANISM_ID where:

IDType = text “Mexican Wolf Studbook Number”

Description = definition of the IDType Studbook number assigned by the Mexican Wolf Recovery Program

BaseURI = http://arctos.database.museum/SpecimenResults.cfm?oidtype=Mexican%20wolf%20studbook%20number&oidnum=

At least we eliminate the problem of the many ways "Mexican Wolf Studbook Number" might be spelled.
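To make the concern concrete, here is a small Python sketch (illustrative only, with made-up variable names) comparing free-text organism IDs against IDs whose type comes from a controlled list, so that only the number is typed by hand.

```python
# Free text: the whole identifier is typed by hand for each record.
free_text_ids = [
    "Mexican Wolf Studbook Number 1216",
    "Mex Wolf Studbook No 1216",
    "Mexican wolf studbook number 1216",
]

# Exact string comparison treats these as three different organisms.
distinct_free_text = set(free_text_ids)

# Controlled type: the type is picked from a code table, only the number is entered.
ID_TYPE = "Mexican Wolf Studbook Number"
controlled_ids = [(ID_TYPE, "1216"), (ID_TYPE, "1216"), (ID_TYPE, "1216")]
distinct_controlled = set(controlled_ids)

print(len(distinct_free_text))   # 3
print(len(distinct_controlled))  # 1
```

This only removes variation in the type text; a mistyped number would still produce a different ID in either scheme.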

I think this would be best implemented as GUIDs, and for social reasons those should probably not be minted by Arctos.

I agree with this statement - but no one is stepping up to the plate for biological specimens (at least no one I am aware of). While the solution above does not fix the problems of the world, it would be a start for Arctos collections and maybe we could use that to press the issue with the community.

I looked up ARKs and I'm not clear on how that works - if it is a solution, then let's explore, but I need an example because it seems very fuzzy to me and doesn't solve the social problem as far as I can tell.

Jegelewicz commented 5 years ago

I believe that this is implicitly a proposal to recatalog http://arctos.database.museum/guid/MSB:Mamm:292063 as 5 specimens. At least for some use cases that goes against the "catalog the item of scientific interest" mantra; eventually two of the samples from the same wolf will be compared in a publication.

Yep - and the cataloging of separate events with one catalog number results in events and parts that are not properly associated with their accessions, their collectors and preparators, nor their attributes. (The event links are OK, but easily broken or incorrectly made).

Jegelewicz commented 5 years ago

Should OrganismIDs be a DOI?

dustymc commented 5 years ago

I'm still not following. You want another table that's the same structure and does the same thing as OtherIDs??

And yes those data are denormalized - that's a lot easier to deal with than denormalized structure, and one of many reasons a GUID of some sort would be a useful value.

There is no technical solution to social problems. We can make it enticing to assign unifying IDs, but that's about it.

ARKs are functionally much like DOIs, but they're free (and don't come with the buy-in, which I suspect means they also don't come with the persistence).

https://n2t.net/ark:/87299/x6d50k1v
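For anyone wanting to see the two candidate identifier styles side by side, here is a minimal Python sketch. It assumes only that an ARK is made actionable by prefixing it with the N2T resolver address, as in the link above; the function names are made up for this example, and actually minting and registering an ARK requires a service that is not shown here.

```python
import uuid

N2T_RESOLVER = "https://n2t.net/"


def new_uuid_organism_id() -> str:
    """Create an opaque, globally unique identifier that does not resolve anywhere."""
    return f"urn:uuid:{uuid.uuid4()}"


def ark_to_url(ark: str) -> str:
    """Turn an ARK such as 'ark:/87299/x6d50k1v' into a URL at the N2T resolver."""
    if not ark.startswith("ark:"):
        raise ValueError(f"not an ARK: {ark!r}")
    return N2T_RESOLVER + ark


print(new_uuid_organism_id())             # different on every run
print(ark_to_url("ark:/87299/x6d50k1v"))  # the example link above
```

The UUID costs nothing to create but tells a reader nothing and leads nowhere; the ARK needs someone to register and maintain its target, which is where the persistence question comes in.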

If I had a couple million dollars and nothing better to do, everything in Arctos would have a DOI. DOIs would be great "individualIDs" but I don't think I can supply them. And that would lead back into the whole "controlled by Arctos" thing, which I don't think has any chance of being adopted by anyone outside of Arctos. I can provide tools, but the folks who own these specimens should also own the unifying identifiers.

Jegelewicz commented 5 years ago

I'm still not following. You want another table that's the same structure and does the same thing as OtherIDs??

EXCEPT - those IDs would be passed to GBIF and other aggregators as "Organism_ID".

I have also considered just using a check box in the Other_ID table "this is an organism ID"....

dustymc commented 5 years ago

Thanks - I might actually get it now!

It's Arctos-centric and not very pretty, but at least it's not denormalization: http://arctos.database.museum/SpecimenResults.cfm?oidtype=Mexican%20wolf%20studbook%20number&oidnum=none is a perfectly valid value for other_id_type=OrgID (whatever we call it).

That could be generated by a "this is an orgid" button. I could even abstract it to a saved search or ARK, but that gets us back to the "Arctos-centric" thing.

And again, if the scope of this is just "works for Arctos" then I think we'd be better off doing something with relationships. (@tucotuco pointed out that an ID works from a spreadsheet where a relationship may not, so "something" might be generating a URL that finds ID=value as above - IDK, that's details, I'm totally open to ideas).

campmlc commented 5 years ago

"That could be generated by a "this is an orgid" button" - you mean in the code table, correct? Also, we would not want to see the "messy" http://arctos.database.museum/SpecimenResults.cfm?oidtype=Mexican%20wolf%20studbook%20number&oidnum=none in the display. We'd want to see "Organism ID: Mexican Wolf Studbook Number: 1216". Is that possible?

dustymc commented 5 years ago

No, in the interface.

http://arctos.database.museum/SpecimenResults.cfm?oidtype=Mexican%20wolf%20studbook%20number&oidnum=none is a GUID - and an actionable one at that. There's only one of them on the planet and it's easy to tell what it does. (It's not very pretty and may or may not be very persistent, but that's details.)

Mexican Wolf Studbook Number: 1216 is a string. Anyone can use it for any purpose anywhere; it doesn't natively do anything, and trying to do anything with it comes with a big pile of indefensible assumptions.

Edit for completeness: https://n2t.net/ark:/87299/x68g8hqw currently does the same thing as http://arctos.database.museum/SpecimenResults.cfm?oidtype=Mexican%20wolf%20studbook%20number&oidnum=none. It's prettier and likely more persistent. If I find another Occurrence of "none" I could re-point the ARK to somewhere mutually agreeable (eg, GBIF) in order to build a more complete picture of the Organism. It's a MUCH better solution than the URL, but also likely to take more investment than clicking a button.

2nd edit: I'm throwing ARKs around only because they're not-Arctos and super easy to create. They're not the only possible GUID, just a convenient and functional example.

tucotuco commented 5 years ago

...not to mention that the indefensible assumptions would be distinct for every different id type, ergo not scalable.

Jegelewicz commented 5 years ago

Mexican Wolf Studbook Number: 1216 is a string. Anyone can use it for any purpose anywhere; it doesn't natively do anything, and trying to do anything with it comes with a big pile of indefensible assumptions.

I don't get how what you propose is different from:

IDType = text “Mexican Wolf Studbook Number”

Description = definition of the IDType Studbook number assigned by the Mexican Wolf Recovery Program

base URL = http://arctos.database.museum/SpecimenResults.cfm?oidtype=Mexican%20wolf%20studbook%20number&oidnum=

tucotuco commented 5 years ago

I had been thinking there would be only one allowed organismID. Maybe that is silly. Maybe it is fine to have as many as you like. That way you could include your own AND those of other collections (in or out of Arctos). That way you could also potentially go directly to GBIF to get the set of Occurrences for all matching organismIDs.
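One way to picture how multiple organismIDs could still be pulled together is transitive grouping: if any single record carries two IDs, every record carrying either of them ends up in the same group. The Python sketch below is an assumption about how a consumer of the data might do this, not a description of anything GBIF or Arctos does; the record keys and ID values are invented.

```python
from collections import defaultdict


def group_by_shared_organism_id(records: dict[str, set[str]]) -> list[set[str]]:
    """Group record keys that are connected through any shared organismID."""
    parent = {key: key for key in records}

    def find(key: str) -> str:
        # Follow parent links to the group's representative (with path halving).
        while parent[key] != key:
            parent[key] = parent[parent[key]]
            key = parent[key]
        return key

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    first_seen: dict[str, str] = {}
    for key, organism_ids in records.items():
        for organism_id in organism_ids:
            if organism_id in first_seen:
                union(key, first_seen[organism_id])
            else:
                first_seen[organism_id] = key

    groups: dict[str, set[str]] = defaultdict(set)
    for key in records:
        groups[find(key)].add(key)
    return list(groups.values())


# Invented example: record B carries both collections' IDs, which links A and C.
example = {
    "A": {"id-1"},
    "B": {"id-1", "id-2"},
    "C": {"id-2"},
    "D": {"id-9"},
}
print(sorted(sorted(group) for group in group_by_shared_organism_id(example)))
```

Note the limitation: if no record carries both IDs, nothing connects the two groups, so someone still has to know they are the same organism and record that somewhere.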

Jegelewicz commented 5 years ago

HMMMM..I hadn't considered that.

Maybe it is fine to have as many as you like. That way you could include your own AND those of other collections (in or out of Arctos). That way you could also potentially go directly to GBIF to get the set of Occurrences for all matching organismIDs.

BUT when searching AT GBIF, how would they be related - so that some person who was unaware the two organism IDs were the same organism could make the connection?

campmlc commented 5 years ago

We were discussing earlier how we could link specimens at MSB and AMNH and Colección Boliviana de Fauna that are all part of the same animal. All share the same field number, they are all the same organism, but how would we relate them in GBIF if AMNH assigns one and MSB assigns a different one? Ideally, we'd use the shared field number as the core ID, or we'd pay for a DOI.

tucotuco commented 5 years ago

I don't get how what you propose is different from:

IDType = text “Mexican Wolf Studbook Number”

Description = definition of the IDType Studbook number assigned by the Mexican Wolf Recovery Program

base URL = http://arctos.database.museum/SpecimenResults.cfm?oidtype=Mexican%20wolf%20studbook%20number&oidnum=

It is very different outside the world of Arctos. The organismID would have to be constructed from this, and what would you do to create the organismIDs of the ten collections that have parts of the same plant? Create ten new ID types and base URLs (just to cover that one organism - multiply by all the collections that share any parts of any Organisms in Arctos)?

dustymc commented 5 years ago

different

It eliminates data stored in arbitrary places.

only one

Yea, I suspect reality will find a way to stomp all over that, but it would be nice....

link specimens

Arctos can link to anything with a URL, and provides a mechanism for incoming links.

shared field number

Everybody starts at "1." If you want links, you need actionable GUIDs. If you want discoverable, you need shared actionable GUIDs. You might get at "shared" by tracking down the other 40 samples in GBIF and adding their IDs to Arctos, although "here's a nice neutral persistent actionable identifier, would you mind using it so we can talk to each other?" would greatly simplify things.

tucotuco commented 5 years ago

All share the same field number, they are all the same organism, but how would we relate them in GBIF if AMNH assigns one and MSB assigns a different one?

I think that is what I am getting at in https://github.com/tdwg/dwc-qa/issues/131#issuecomment-472642620

tucotuco commented 5 years ago

Something akin to IGSNs, but for Organisms instead of for samples.

Jegelewicz commented 5 years ago

The organismID would have to be constructed from this, and what would you do to create the organismIDs of the ten collections that have parts of the same plant? Create ten new ID types and base URLs (just to cover that one organism - multiply by all the collections that share any parts of any Organisms in Arctos)?

I don't understand - you would only need one ID type. From any record in Arctos, I can click the link from the Mexican Wolf Studbook Number (no matter what number it is) and I'll get the specimen results page that shows all of the wolves that share the same number.

If UTEP or UMNH or any other Arctos collection had a wolf specimen and put the studbook number in the "Mexican Wolf Studbook Number" other ID, then it would show up in the search too, because the link is an actionable guid like Dusty described.

It would be a social issue to decide upon an "ID Type" for the situation that you describe, but we should only need one. The challenge - as I pointed out in the very beginning - is assigning the individual organism ID numbers, so that all collections with parts of the same plant would use "Individual Plant ID" = 1, etc.

I guess I am missing something (which doesn't surprise me...) The wolves are easy because they are all here and they have a (somewhat) logical identifier. Everything else will be messy until we have a unique BOI (Biological Organism Identifier).

campmlc commented 5 years ago

In all of these situations, there is a shared organism number already that links specimens. Examples currently in use within Arctos and between Arctos and outside collections (AMNH, USNM) are Mexican Wolf Studbook Number, NK number, AF number, Robert L. Rausch collector number, NEON individual ID. These are used to find and create relationships. The problem with relationships is that relationships are pairwise - we need a way to reciprocally link a network, and organism ID would allow us to do that - like the url link to the above IDs allows us to do that now within Arctos.

Can we mint DOIs or IGSNs?

dustymc commented 5 years ago

organisms, mint compliant ID

Don't half-bake this! - I want those for events, localities, agents, .... too.

Seriously, Arctos is built to plug in to something like that. If we have a local identifier for something it's only because nobody else would do it for us.

relationships are pairwise

Not really - there's always an implied second THING out there, but we don't have to be able to find it. "{whatever relationship of} ABC:XYZ:1234" is fine even if ABC:XYZ isn't online, "{whatever relationship of} NK 1" is fine even if 40 specimens (that we can find) wear "NK 1", etc.

reciprocally

I don't think a lack of reciprocity will ever be Arctos' fault.

I know many of your examples are not capable of acting as unique identifiers, and I suspect that's true of all of them.

Can we mint DOIs

Yes, in limited quantities - there are "get a DOI" links scattered all over the place.

IGSNs

Beats me - if they have a service and are willing to provide access we should be able to.

We could also mint ARKs in unlimited quantities if there's a reason to do so.

Jegelewicz commented 5 years ago

relationships are pairwise

Not really - there's always an implied second THING out there, but we don't have to be able to find it. "{whatever relationship of} ABC:XYZ:1234" is fine even if ABC:XYZ isn't online, "{whatever relationship of} NK 1" is fine even if 40 specimens (that we can find) wear "NK 1", etc.

But we WANT to find it! 40 fish with "same lot as" require 39 relationships on each of the 40 records, and then I have no easy way to see them all in one place (or I just don't know how to do it). In the same way, 20 events of blood samples from Mexican wolf studbook number 1216 require 19 relationships on each of the 20 records (and a relationship needs to be added to ALL of them every time a new set of samples comes in!). It is a lot of work....
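The arithmetic behind that workload can be spelled out in a short Python sketch. It assumes every record must carry a relationship to every other record in the set, which is how the counts above are described; the function names are made up for this example.

```python
def relationships_needed(n_records: int) -> int:
    """Total relationship entries if every record links to every other record."""
    return n_records * (n_records - 1)


def new_relationships_when_adding_one(n_existing: int) -> int:
    """Entries to add when one more record joins n_existing fully linked records.

    The new record needs n_existing links, and each existing record needs
    one link back to the new record.
    """
    return 2 * n_existing


def shared_id_entries(n_records: int) -> int:
    """With a shared organism ID, each record carries exactly one entry."""
    return n_records


print(relationships_needed(40))               # 40 fish: 1560 entries
print(relationships_needed(20))               # 20 wolf events: 380 entries
print(new_relationships_when_adding_one(20))  # 21st event: 40 new entries
print(shared_id_entries(21))                  # shared ID: 21 entries in total
```

The relationship approach grows with the square of the number of records, while a shared ID grows linearly and adding a record never touches the existing ones.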

campmlc commented 5 years ago

We have litters of pups that are siblings of each other, offspring of two parents, and parents of other litters. Each of these individual organisms in turn may be handled multiple times over their lifetime resulting in multiple catalog numbers of different accessions of parts, potentially at different institutions. We need organism IDs to deal with the latter, and relationships that can deal with the former.

dustymc commented 5 years ago

easy way to see them

That's an interface problem.

a relationship needs to be added to ALL of them every time a new set of samples comes in!

That MAY be an interface problem too - eg, MAYBE I could just magic in reciprocals instead of the email. Not much problem technically, but there are social implications.

40 fish

That does occasionally happen, but more normal is a coyote, a beaver, 3 mice (all because the printer stuck), and all of their parasites (for reasons that don't make much sense to me).

siblings

There's an Issue somewhere about making inferences from relationships - also just a display problem.

organism IDs to deal with the latter, and relationships that can deal with the former.

Yea, there's some overlap that I don't think we can avoid. I think we need both anywhere we can - orgID is useless unless all of the bits are accessible, and relationships can't be used to find all the bits in places like GBIF. I'm not real happy with that, but I think it's reality.

Jegelewicz commented 5 years ago

See https://github.com/tdwg/dwc-qa/issues/131#issuecomment-472855544

dustymc commented 5 years ago

@campmlc is MSB onboard with this in general? I thought https://github.com/ArctosDB/arctos/issues/1545#issuecomment-398469421 was a flat "no" to cataloging Occurrences.

Do you intend to do with the NEON material whatever you do with the wolves?

I just spent a ridiculous amount of time wandering around GBIF looking for a real-world example. https://www.gbif.org/occurrence/1805830446 could be http://arctos.database.museum/guid/MSB:Mamm:306392 or http://arctos.database.museum/guid/MSB:Mamm:306394, but the data are pretty light in all records so who knows. Bits and pieces of http://arctos.database.museum/guid/MSB:Mamm:306393 work too but the sex doesn't, and who knows how reliable that is in either record.

Surely the zoo issues identifiers? There's just nothing in any record that definitively links all of this stuff together, or rejects any linkage. If I were a researcher I suppose I'd find myself writing to the collections and hoping they have some more information that they're willing to dig out. We can't force other collections to play nice, but maybe we can provide a shining example.

The sex for the MSB records suggests there's more information (http://arctos.database.museum/info/ctDocumentation.cfm?table=CTSEX_CDE&field=not%20recorded - "There is data in the form of a label or field notes, and there is no mention of sex.") Is that accurate - eg, are we entering data accurately, or should that be "unknown" or something?

The NK and the "NK" in the event remarks don't line up on http://arctos.database.museum/guid/MSB:Mamm:306392, which makes me think something important is missing.

I guess my one take-away is that any "organism registry" should aggressively collect anything that might be considered an identifier associated with the individual. It's painful to dig that out of GBIF, it's impossible to know what might have been withheld from GBIF, and it's remotely possible that the zoo or whoever owns the studbook would contribute to (and use) a registry.

Mostly unrelated, USNM is using ARK - http://n2t.net/ark:/65665/359989e34-8719-4907-823e-3f55dc8181e6

Jegelewicz commented 5 years ago

If I were a researcher I suppose I'd find myself writing to the collections and hoping they have some more information that they're willing to dig out. We can't force other collections to play nice, but maybe we can provide a shining example.

Assuming researchers even notice any connection. YES to the shining example!!!!

Even if MSB continues to choose to catalog organisms instead of occurrences, we need this solution for other collections and cross collection occurrences, although I am pretty sure the difficulty of cataloging multiple events per organism in a single record has everyone convinced it isn't sustainable over the long term....

The solution will not prevent anyone from cataloging organisms over occurrences if they so desire - but I would never recommend it.

campmlc commented 5 years ago

I am mostly on board after a long grieving period for our specimen event model because of the now obvious difficulties in implementing that model correctly. NEON seems to be the best of all possible worlds in implementing the specimen event model, and it seemed to work OK because everything was cataloged from scratch all according to a single consistent model and workflow. The NEON ID is equivalent to an organism ID that links all samples from all occurrences. See https://arctos.database.museum/guid/MSB:Mamm:299204. And this was a static collection on our end - we will not be receiving any more samples, but samples are still being collected from the same organisms and being deposited at NEON's ASU repository. Again, the need for a cross-institutional, cross-platform organism ID exists regardless of whether or not we personally catalog organisms or occurrences.

However, Mexican wolves are a very different story, because they were all originally cataloged as occurrences. We have spent several years and an entire Master's thesis project on trying to consolidate these occurrences into single records of a single organism with multiple events. This process is far from complete. And now we are finding so many errors in the data entry and conversion process that it appears that effort may have largely been wasted. In addition, the specimen event model does not permit the tracking of different accessions, or of maintaining the linkage between a previous, submerged catalog number and the associated parts. This is a mess that is going to have to be straightened out, and it may be easier to back out than go forward.

The zoo specimens do have a global animal number in their database. We will be using that. But I don't know how global it truly is. Their database does not provide an associated url. Here is an example: GAN: 22019550

If USNM is minting Arks, then perhaps we can experiment with linking with them. They have hosts to our parasites and vice versa. We have plenty of examples within Arctos of different collections housing parts of the same organism. And NEON has parts of the same organism external to Arctos.

I agree with Teresa that we should not force anyone to use one model or the other, but we should support both. Regardless, there should be a formal organism ID for those situations that require one.

If much of this can be resolved via the interface, is it possible to have a toggle between an organism display and an occurrence display? If there is a designated organism ID, I'd like to see that above the catalog number, prominently displayed, with a click to Show in Organism View vs Show As Separate Occurrences or something. Then we could catalog occurrences separately, track accessions and separate catalog numbers, but be able to view the record optionally as we do now with the combined specimen events/parts linkages etc.?

dustymc commented 5 years ago

will not prevent anyone from cataloging organisms over occurrences

Correct/necessary.

recommend

Wherever this ends up, it should include documentation (and/or a publication).

I think my biggest reservation with cataloging "Occurrences" is citations. Download a bunch of data from GBIF, get some samples (all wearing different primary IDs), eventually publish garbage science because half of your material was from one poor wolf.

https://github.com/ArctosDB/arctos/issues/1130 / http://handbook.arctosdb.org/how_to/cite-specimens.html is still hanging around mostly unresolved. If we use GUIDs for "individual IDs" we could also put them in the "cite this as" column (and force-feed them to GBIF and etc.), which would cause all of those samples from the poor drained wolf to share a 'primary-ish' ID, which I think most researchers (at least those who cite anything) would notice. I'm still not convinced of anything enough to really advocate one way or the other, but this does seem like it gets at the heart of a major problem.

campmlc commented 5 years ago

This is absolutely a problem: "I think my biggest reservation with cataloging "Occurrences" is citations. Download a bunch of data from GBIF, get some samples (all wearing different primary IDs), eventually publish garbage science because half of your material was from one poor wolf."

but it is a problem with all these multiple event specimens outside of Arctos as well. NEON especially needs to come to grips with this because they are and will be generating so much of these kinds of data. And they have not even begun to consider this yet. We have received loan requests approved by NEON where the researcher did not realize that the 100 samples being requested included multiple samples from the same individuals over time. This would absolutely have affected results had we not pointed that out and insisted on loaning only a single sample per animal. But this depends on our being able to track this ourselves as well as display it to the outside world. And yes, citation is an issue, but it is an issue already. There seems to be no consistent or enforced policy - and while many people are working on this it will take community discussion as well as buy-in and enforcement by journals and data publishers.
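The "single sample per animal" check is easy to express once every record carries an organism ID. A minimal sketch, assuming a loan request is a list of (catalog number, organism ID, ISO collection date) tuples; all of the values and the function name below are invented for illustration.

```python
# Hypothetical loan request; none of these identifiers are real records.
requested = [
    ("EX:Mamm:101", "organism-A", "2015-03-01"),
    ("EX:Mamm:102", "organism-A", "2016-04-12"),
    ("EX:Mamm:103", "organism-B", "2016-04-12"),
    ("EX:Mamm:104", "organism-A", "2014-11-30"),
]


def one_sample_per_organism(samples):
    """Keep only the earliest-collected sample for each organism ID."""
    chosen = {}
    for catalog_number, organism_id, collected in samples:
        current = chosen.get(organism_id)
        # ISO 8601 dates compare correctly as plain strings.
        if current is None or collected < current[2]:
            chosen[organism_id] = (catalog_number, organism_id, collected)
    return sorted(chosen.values())


for row in one_sample_per_organism(requested):
    print(row)
```

Picking the earliest sample is an arbitrary choice here; the point is only that the grouping key has to exist in the data before anyone, inside or outside Arctos, can apply a rule like this.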

dustymc commented 5 years ago

specimen event model does not permit the tracking of different accessions,

It's not something that could happen next week, but "The Tuco Model" (everything's an Event) could handle that. The specimen-event<-->part link could be expanded as well - it's the same-ish idea as The Tuco Model, but more duct-tapey.

how global it truly is..GAN: 22019550

That's easy: it isn't.

If USNM is minting Arks, then perhaps we can experiment with linking with them.

Just to be clear, their ARKs are NOT "individual IDs" - they're just proxies to their data. Another ARK could be used to point to some merge-view of their+our data.

should support both

I don't think we have any possibility of avoiding that, at least not one that doesn't break cultural collections and such. And if we ever get a year's worth of GPS tag data or such maybe we'll want to use one catalog number for it, just to avoid minting a million new "records."

interface

Interface is "easy" if we get the data right. I can't quite envision how that might work at the moment, but I think the answer is ultimately "yes."

enforcement by journals and data publishers

I think we can play a part in that as well - eg, did this person asking to borrow specimens do what we ask them to do in the past? There's an "agent rank" option to share those sorts of data internally across Arctos collections.

campmlc commented 5 years ago

Can you explain the tuco model?

Yes, all of the current IDs we use to create relationships and link specimens are not global. Mexican wolf studbook numbers, NK numbers, AF numbers, collector numbers, NEON numbers, ear tag numbers etc etc. Yet that is what we have. Short of barcoding every organism and all its parts at the moment of collection (which would be nice), I'm not sure I see a way around that in the current universe. The trick is keeping those identifiers, acknowledging there will be mistakes caused by duplicates or mistranscription, but making the data discoverable so that mistakes can be identified and resolved. So we can use what Teresa originally proposed as an organism ID, with all its faults, realizing that that is what human beings will use and recognize as an ID, and then come up with some other truly unique ID that can somehow be assigned to these specimens that are linked by their relationships across occurrences and collections.

dustymc commented 5 years ago

Is this far enough along to throw it at some data and see what sticks?

If so, what should we call it? I still dislike dwc:organismID - I think we should avoid encouraging this for wolfpacks and such. (We probably can't prevent that.) https://terms.tdwg.org/wiki/dwc:individualID seems closer, but I think maybe it's been deprecated?? https://github.com/tdwg/dwc 404s everything now so ?? @tucotuco

explain the tuco model?

JRW can better, but my understanding is that basically everything's an event. Catch a specimen? Event. Identify something? Event. Assign an ID to something? Event. That would be hugely powerful, but maybe not so friendly to write code to. (Or maybe it is, who knows, as far as I know nobody's ever tried anything even vaguely similar.) The idea has been bouncing around since the first Arctos-in-ABQ Meeting in 20-something.

barcoding every organism and all its parts at the moment of collection (which would be nice)

There's another good use for ARKs - they'd ensure barcodes are globally-unique and allow them to lead to specimens, making it basically impossible to cite the wrong thing.

some other truly unique id that can somehow be assigned to these specimens that are linked by their relationships across occurrences and collections

That's where ARKs (or something like them) come in. Give your two specimens (which share an NK or something, which might also be applied to 40 parasites and a duck for some reason) a GUID and point it to something that contains both specimens and which points to the bits-n-pieces of the composite "individual" - http://arctos.database.museum/SpecimenResults.cfm?oidtype=Mexican%20wolf%20studbook%20number&oidnum=none works, as could GBIF. If "individualbank" comes to be you'd just redirect your ARK to it. If you find another sample in GBIF you could redirect the ARK to that. Along with the usual GUIDdy stuff, ARKs (and DOIs and etc.) provide a stable identifier, not a stable resolution - you can change the action without changing the value.
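A toy sketch of "stable identifier, not stable resolution", under stated assumptions: the `Resolver` class, the identifier string, and the example.org targets are all hypothetical, and this is not how EZID or n2t.net is actually implemented.

```python
class Resolver:
    """Toy resolver: an identifier never changes once minted, but its target can."""

    def __init__(self):
        self._targets = {}

    def mint(self, identifier, target):
        """Register a new identifier; refuse to reuse an existing one."""
        if identifier in self._targets:
            raise ValueError(f"{identifier} already exists")
        self._targets[identifier] = target

    def redirect(self, identifier, new_target):
        """Point an existing identifier somewhere else."""
        if identifier not in self._targets:
            raise KeyError(identifier)
        self._targets[identifier] = new_target

    def resolve(self, identifier):
        """Return the current target for an identifier."""
        return self._targets[identifier]


resolver = Resolver()
individual_id = "ark:/99999/fk4example"  # placeholder, not a real ARK

# Start by pointing at a local search that lists all samples from the animal.
resolver.mint(individual_id, "https://example.org/local-search?individual=1")
print(resolver.resolve(individual_id))

# Later, a shared registry appears; the published identifier stays the same.
resolver.redirect(individual_id, "https://example.org/individualbank/1")
print(resolver.resolve(individual_id))
```

Anything that cited the identifier before the redirect still resolves afterwards, which is the property being argued for.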

campmlc commented 5 years ago

How about the MSB NEON records? They are as consistent and complete as we can make them given the quality of data we received, which wasn't great. But it will be a good test case. We converted everything from individual occurrences to single organisms with multiple events. We had to ask NEON to create the NEON ID for an organism ID, because they had only sample IDs. If we can mint official dwc organism ID for these records, and add ARK Ids etc, we should be able to eventually integrate with the ASU database, which is under development. We would be ahead of that game.

Jegelewicz commented 5 years ago

I think my biggest reservation with cataloging "Occurrences" is citations. Download a bunch of data from GBIF, get some samples (all wearing different primary IDs), eventually publish garbage science because half of your material was from one poor wolf.

Thus the need for organism ID

more duct-tapey.

No, please no - the duct tape is killing me with the wolves.

There's an "agent rank" option to share those sorts of data internally across Arctos collections.

Wait, what? Needs Documentation (and I gotta go find out what my rank is... :-)

Is this far enough along to throw it at some data and see what sticks?

If so, what should we call it? I still dislike dwc:organismID - I think we should avoid encouraging this for wolfpacks and such. (We probably can't prevent that.) https://terms.tdwg.org/wiki/dwc:individualID seems closer, but I think maybe it's been deprecated?? https://github.com/tdwg/dwc 404s everything now so ??

I think we can test the waters - we aren't going to break anything that isn't already broken as far as I'm concerned. Go here for working definition: https://dwc.tdwg.org/terms/#organismID (I think) although the example is not exactly what I'd hope for (It's an Arctos cataloged item url)

As for the event model: with the wolves, that is what I am arguing we do...essentially. Every time a particular wolf is encountered, we record an event, which in the case of Arctos means we create a cataloged item. What we are missing from John's event model is the way to connect the events that are all related to the same organism. The relationship "same individual as" helps, but doesn't provide a way to connect more than two events (at least visually or for anyone outside of Arctos). I would argue that we are doing John's model, just not all the way.

And on a final note, I think I finally processed what John was talking about my solution being Arctos-centric. I'm open to the ARK idea, but I need to go read about it in more detail.

Jegelewicz commented 5 years ago

Is this where Arks come from? https://arkids.net/items

I'm not sure that I feel comfortable with the persistence of this.

What about these guys? https://identifiers.org/

I found it via this: https://www.ebi.ac.uk/miriam/main/datatypes/MIR:00000558

The Research Resource Identification Initiative provides RRIDs to 4 main classes of resources: Antibodies, Cell Lines, Model Organisms, and Databases / Software tools. The initiative works with participating journals to intercept manuscripts in the publication process that use these resources, and allows publication authors to incorporate RRIDs within the methods sections. It also provides resolver services that access curated data from 10 data sources: the antibody registry (a curated catalog of antibodies), the SciCrunch registry (a curated catalog of software tools and databases), and model organism nomenclature authority databases (MGI, FlyBase, WormBase, RGD), as well as various stock centers. These RRIDs are aggregated and can be searched through SciCrunch.

dustymc commented 5 years ago

need for organism ID

I don't think I see any silver bullets here. We do, or should, already provide everything needed to do good science, and we still get "3 wolves from Arizona" in publications. This might make it easier to find certain information in certain situations or to understand identifiers, but still relies on accurate entry, conscientious users, good loan instructions, etc.

So the definition of a DWC:organism is what we're trying to avoid, and the example of a DWC:organism is what we're trying to move away from - can we use a different term here pretty please?

missing from John's event model is the way to connect the events that are all related to the same organism

That's just another event.

at least visually

Again, that's just UI. I can eg magic IndividualIDs out of relationships, and there's certainly no quantitative limit on that. If the scope of this is "Arctos" then it's a lot of work to do something that does absolutely nothing new for us. It's not really even useful for other Arctos-like systems - we could easily share relationship-grade data. If symbiota-or-whatever publishes enough information to recognize "duplicates", has a place to store a string, and a Curator is willing to make the world a little more awesome, then this is a useful thing. (Perhaps more useful if some sort of 'individual resolver' was built.) If not, maybe it isn't.
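Deriving individual IDs from existing relationships amounts to finding connected groups in the relationship graph. A minimal union-find sketch, assuming relationships are available as pairs of record identifiers; the function name and the identifiers are invented for illustration and say nothing about how Arctos stores relationships.

```python
def group_individuals(relationships):
    """Group records connected, directly or indirectly, by pairwise relationships."""
    parent = {}

    def find(record):
        parent.setdefault(record, record)
        while parent[record] != record:
            parent[record] = parent[parent[record]]  # path halving
            record = parent[record]
        return record

    for a, b in relationships:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Use the smaller identifier as the root so results are deterministic.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    groups = {}
    for record in list(parent):
        groups.setdefault(find(record), []).append(record)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical "same individual as" pairs; only some pairs are recorded.
pairs = [
    ("EX:Mamm:1", "EX:Mamm:2"),
    ("EX:Mamm:2", "EX:Mamm:3"),
    ("EX:Mamm:7", "EX:Mamm:9"),
]
for number, members in enumerate(group_individuals(pairs), start=1):
    print(f"individual-{number}", members)
```

Note that records 1 and 3 end up in the same group even though no relationship links them directly, so an incomplete set of reciprocal relationships is enough to recover the group.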

Maybe that's just a matter of who owns/issues/manages the IDs. "Relationships lite" should be fairly trivial, and I'll probably do something like this even if it's just to make GBIF et al. slightly less twitchy. That would be no extra work for y'all, but you'd have to deal with the possibility that one of "my" IDs will be used in a publication. I could use ARKs-or-similar, UC probably won't redirect them to a "swipe your card to see this resource!" site, but they'd still be managed by machine logic and owned by something other than the collection. If I were a Curator I think I'd probably want to bring my own IDs and use them as I please, although I'd certainly use relationships to find things that could use an ID.

Haha, no ARK isn't the video game thing, but I noticed Google REALLY likes that... Lots of orgs issue ARKs, I get mine from https://ezid.cdlib.org/ just because I get DOIs from them - they're like the Wal-Mart of identifiers! Like all identifiers, ARKs are as persistent as a curator is willing to make them. ARKs are also as resolvable as anyone is willing to make them. I think they're technically a good approach, but they also don't have anything remotely like the buy-in of DOI.

https://en.wikipedia.org/wiki/Archival_Resource_Key

dustymc commented 5 years ago

@campmlc Do you intend to split the NEON records up or leave them as they are?

If you split, would you use the neon ID as individualID or want to mint something a little more robust? As I mentioned above, the NEON ID has some obvious issues aside from being a random string that anyone could assign to anything for any reason - http://arctos.database.museum/SpecimenResults.cfm?oidtype=NEON%20sample%20ID&oidnum=WOOD.20160928.002A08.V

Either way, if ASU is interested and has the technology to support that interest then it could be a good test of the concept at an above-Arctos scale. I think this is all fairly trivial from Arctos.

campmlc commented 5 years ago

I would rather not split our NEON records, but instead mint the organism ID in such a way that we can eventually link to ASU records, if they can do the same. I don't really want to involve NEON in the discussion at this time, honestly, but would prefer to develop something for us as a model that they can then work with, but which also works for us on a broader scale. If we do it right, our solution should apply to NEON and other scenarios.

The NEON ID is what they created for individual organisms at our request. All their primary data are at the sample level - the NEON sample ID. The NEON sample ID uses site, date, and ear tag numbers. Their NEON "organism" ID is domain (=site) and eartag. Not unique, because eartags can be misread, mistranscribed, and potentially duplicated, and certainly one of those things happened in the example you provide. There are many others. In the case of the example, at least we caught the species discrepancy and there are two separate cataloged organisms. Only one has been given the organism ID, so in this particular case, there is not a problem. I know for a fact there are others that are. But that is what they are using, and that is what we have to go on. We can't fix the errors, but we can make them discoverable.

tucotuco commented 5 years ago

Can you explain the tuco model?

An Event-based model would make Events generic and central to the schema. Events would probably track combinations of action/actor/protocol/place/time/result. Right now I think the closest thing in Arctos to how connected Events would be is Agents - they are everywhere, because they are actors in Events.

We don't have the model modeled, but I imagine that it would consist of the same concepts of focal interest that would translate to tables in the database. Many of those are in Arctos already. Cataloged_Items, Collection_Objects, Agents, Projects, Identifications, Localities, Georeferences... Events would relate pairs of these concepts. Organisms could be one of the concepts. Back in the day, Organisms had a table (Biological_Individual) in Arctos, but that table met its demise at some point.
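
Since the model is not actually modeled, the following is only one possible reading of it: a minimal sketch in which an Event records action/actor/protocol/place/time/result and relates a pair of concepts, one of which may be an Organism. Every class, field, and identifier here is hypothetical and does not reflect the Arctos schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Concept:
    kind: str        # e.g. "Organism", "Cataloged_Item", "Agent" (illustrative values)
    identifier: str


@dataclass(frozen=True)
class Event:
    action: str
    actor: Concept             # the Agent performing the action
    subject: Concept           # first member of the related pair
    obj: Concept               # second member of the related pair
    protocol: Optional[str] = None
    place: Optional[str] = None
    time: Optional[str] = None
    result: Optional[str] = None


def events_for(concept: Concept, events: List[Event]) -> List[Event]:
    """Return every event in which the concept is one member of the related pair."""
    return [e for e in events if concept in (e.subject, e.obj)]


# Hypothetical identifiers, for illustration only.
wolf = Concept("Organism", "organism-1")
vet = Concept("Agent", "agent-1")
events = [
    Event("blood draw", vet, wolf, Concept("Cataloged_Item", "item-1"), time="2018-01-10"),
    Event("blood draw", vet, wolf, Concept("Cataloged_Item", "item-2"), time="2019-02-03"),
    Event("identification", vet, Concept("Cataloged_Item", "item-3"),
          Concept("Identification", "id-1")),
]
wolf_events = events_for(wolf, events)
```

With Organism as one of the concepts, collecting everything known about an individual becomes a single lookup rather than a chain of item-to-item relationships.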

Go here for working definition: https://dwc.tdwg.org/terms/#organismID (I think) although the example is not exactly what I'd hope for (It's an Arctos cataloged item url)

Though you may not want to include all the things that dwc:Organism includes, everything you want to include is a dwc:Organism. The example in Darwin Core can be easily changed without review. If we have a real one that will persist, I can make that change.

What we are missing from John's event model is the way to connect the events that are all related to the same organism.

If Organisms existed, they could be connected to whatever you want. I don't remember all of the connections Biological_Individual had in the early model. An ER diagram of that probably still exists.

tucotuco commented 5 years ago

https://terms.tdwg.org/wiki/dwc:individualID seems closer, but I think maybe it's been deprecated?? https://github.com/tdwg/dwc 404s everything now so ?? @tucotuco

dwc:individualID was deprecated on 2014-10-24, replaced by dwc:organismID (http://rs.tdwg.org/dwc/terms/#dwc:organismID with full canonical definition and history currently at https://github.com/tdwg/dwc/blob/master/vocabulary/term_versions.csv#L44 - this line number could change with subsequent changes to Darwin Core). The pattern http://rs.tdwg.org/dwc/terms/organismID was always a way to resolve to the latest version of a Darwin Core term.

dustymc commented 5 years ago

@campmlc that sounds reasonable to me - if you do split them, all Occurrences will need to carry the "organism ID" anyway. (Although it would then need to resolve to something different.) And the situation with string-IDs is absolutely typical.

Should I grab ARKs pointed to the Arctos GUID for them? I still think it's better if you "own" the IDs, but that's probably not terribly reasonable here.

@tucotuco I think what we're looking for is "biological individual" (whatever we call it) and I'm happy to ignore the 80% or so of life that most of us probably see as "fringe cases" for now. I think I'd be happy enough if "packs" was removed from the example. And maybe clarify colonies - "a cave full of bats" makes me twitchy, "a Portuguese man o' war" doesn't.

Shall we do this with "organism ID"? "DWC:OrganismID"??

Table biological_individual was a child of cataloged_item, so doesn't do what we need here anyway. It, along with "herp" and "mamm" and such, was replaced by Attributes.

campmlc commented 5 years ago

We can go ahead with the ARKs for the NEON IDs as a test. But doesn't GBIF give doi's to occurrences? Maybe we could ask them for a doi for an organism ID? Or do both?

dustymc commented 5 years ago

GBIF give doi's to occurrences

Not that I'm aware of - they give DOIs to datasets.

I'm happy to make use of all the DOIs they might be willing to provide!

campmlc commented 5 years ago

Yes, I guess you are right - the doi is just for our dataset? And they mess it up too. Why is Joe Cook cited? A curator shouldn't be the author.

Cook J (2019). MSB Mammal Collection (Arctos). Version 35.24. Museum of Southwestern Biology. Occurrence dataset https://doi.org/10.15468/oirgxw accessed via GBIF.org on 2019-03-15. https://www.gbif.org/occurrence/1989894063

for

Organism ID http://arctos.database.museum/guid/MSB:Mamm:3061714 occurrences https://www.gbif.org/occurrence/search?dataset_key=b15d4952-7d20-46f1-8a3e-556a512b04c5&organism_id=http%3A~2F~2Farctos.database.museum~2Fguid~2FMSB%3AMamm%3A306171
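
For anyone who wants to run the same lookup programmatically, here is a minimal sketch that builds the equivalent query against the GBIF occurrence search API. It assumes the API accepts `datasetKey` and `organismId` as search parameters, which should be checked against the current GBIF documentation; the helper name is hypothetical, and the code only constructs the URL without sending a request.

```python
from urllib.parse import urlencode

# Assumed endpoint for the public GBIF occurrence search API.
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def organism_search_url(dataset_key: str, organism_id: str) -> str:
    """Build a search URL for all occurrences in a dataset that share one organism ID."""
    query = urlencode({"datasetKey": dataset_key, "organismId": organism_id})
    return f"{GBIF_OCCURRENCE_SEARCH}?{query}"


url = organism_search_url(
    "b15d4952-7d20-46f1-8a3e-556a512b04c5",
    "http://arctos.database.museum/guid/MSB:Mamm:3061714",
)
```

Note that `urlencode` percent-encodes the slashes and colons in the organism ID, so the identifier can itself be a URL.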

Jegelewicz commented 5 years ago

Yes, I guess you are right - the doi is just for our dataset? And they mess it up too. Why is Joe Cook cited? A curator shouldn't be the author.

HAHAHAHAHA! I am the author for UTEP's collections....

Jegelewicz commented 5 years ago

As for using NEON - @campmlc do you have one example of something where part of an organism is at MSB and part at some non-Arctos institution that would play nice to see if we could make it all work at the aggregator level?

Jegelewicz commented 5 years ago

Oh damn, we can't even agree on the definition of "organism"....

I like the term biological individual for Arctos purposes and I suggest:

Identifier http://rs.tdwg.org/dwc/terms/Organism
Definition A particular organism identified as a single taxon~or defined group of organisms considered to be taxonomically homogeneous~.
Comments Instances of the dwc:Organism class are intended to facilitate linking ~one or more dwc:Identification instances to one or more~ dwc:Occurrence instances. Therefore, things that are typically assigned scientific names (such as viruses, hybrids, and lichens) ~and aggregates whose occurrences are typically recorded (such as packs, clones, and colonies)~ are included in the scope of this class.
Examples A specific bird. ~A specific wolf pack.~ A specific instance of a bacterial culture.

Shouldn't a group of individual, multi-cellular organisms be something else? Colony perhaps? I realize that there will be coral scientists calling a colony of individual polyps an organism, but somehow, we absolutely need to distinguish between a pack of wolves; wolves and their offspring; and a lone wolf...