ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum
Code Table Request: Add KNRC to Other Identifiers #5154

Closed Jegelewicz closed 2 years ago

Jegelewicz commented 2 years ago

Reposting here from https://github.com/ArctosDB/internal/issues/205

How to Use This Form This is a template with examples and guidance on how to best communicate with the Arctos Working Group. Please delete this section along with anything in square brackets [ ] below before submitting.

[ Instructions for reference: ] [ Issue Documentation is http://handbook.arctosdb.org/how_to/How-to-Use-Issues-in-Arctos.html ]

[ Code Table Documentation is https://handbook.arctosdb.org/how_to/How-to-Use-Code-Tables.html ]

[Video Tutorial - Submit a Code Table Request]

Goal Add KNRC as an identifier in Arctos

Context MSB, UAM, and HWML have specimens with this identifier. The UAM records are listed as "other identifier" in various random formatting. HWML and MSB Para need to be able to link parasites and hosts with a consistent and standardized identifier. We have located the source of the ID: KNRC: Koyukuk/Nowitna Refuge Complex http://worldcat.org/identities/lccn-no92004321/

Table https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcoll_other_id_type

Proposed Value KNRC: Koyukuk/Nowitna Refuge Complex

Proposed Definition Identifier for the USFWS Koyukuk/Nowitna Refuge Complex, Alaska. See http://worldcat.org/identities/lccn-no92004321/

Collection type [ If the code table includes a "Collection" column. Ex: Mamm, Herp, ES ]

Attribute data type [ "Attributes" may apply to catalog records, parts, localities, and collecting events. You must specify a datatype (free-text, categorical, or number+units) if this request involves attributes. ]

Attribute value [ For categorical attributes, code table controlling value ]

Attribute units [ For number+units attributes, code table controlling units ]

Available for Public View [ Yes | No to indicate whether the proposed value will be available for public view or available only for logged in operators. ]

Other ID BaseURL [ For OtherIDs, provide the following or explain why the unresolvable ID type is necessary:

1. "Base URL" with which to prepend entered values, and
2. A functional example URL, which should consist of the base URL provided in (1) plus a relevant value. ]

ID_References [ If the request involves https://arctos.database.museum/info/ctDocumentation.cfm?table=ctid_references, the changes must be coordinated with the Database Admin team for notifications to function. ]

Priority high. Actively bulkloading data.

Jegelewicz commented 2 years ago

@dustymc posted

The UAM records are listed as "other identifier"

That seems entirely consistent with our many previous documented discussions - https://handbook.arctosdb.org/how_to/How-To-Manage-Code-Table-Requests.html#specific-values-considerations

Would creating a type allow the identifier to DO something; is there some functionality tied to this request? (Without a base_url, that answer is almost always "no.")

Jegelewicz commented 2 years ago

@campmlc posted

Creating a type with standard formatting would allow MSB, HWML, and UAM to link parasite and host records via a standard identifier with associated metadata. In addition, the use of the KNRC acronym explicitly encodes geographic information - e.g. these records are from this refuge system in Alaska. If the goal is to maximize "doing" something such as linking related records to each other and to information content, this is way better than "KNRC12345", "KNRC 12345", "KNRC-12345", etc. And now we have Other ID metadata so the identifier comes with other assertions, e.g. I say on X date that this ID is the same as the host at UAM. But I can't do that, or use for example Entity tools, unless the identifier is standardized.

Once the identifier exists, we can reach out to UAM and request they add it to their records.
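The formatting variants mentioned above ("KNRC12345", "KNRC 12345", "KNRC-12345") could be collapsed to one comparable form before any matching. The following is only a minimal sketch, not Arctos functionality: the function name `normalize_knrc`, the regular expression, and the choice to return a `(number, suffix)` pair are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: the prefix "KNRC", an optional space or hyphen,
# a run of digits, and an optional trailing letter suffix (e.g. " p").
_KNRC_RE = re.compile(r"^\s*KNRC[\s-]*(\d+)(?:\s*([A-Za-z]+))?\s*$", re.IGNORECASE)


def normalize_knrc(raw):
    """Return (number, suffix) for a KNRC-style string, or None if it does not match."""
    match = _KNRC_RE.match(raw)
    if match is None:
        return None
    # Assumption: leading zeros carry no meaning, so the number is kept as an int.
    number = int(match.group(1))
    suffix = (match.group(2) or "").lower()
    return number, suffix


variants = ["KNRC12345", "KNRC 12345", "KNRC-12345"]
# All three spellings reduce to the same normalized value.
normalized = {normalize_knrc(v) for v in variants}
```

Strings that do not fit the assumed pattern return `None` instead of being guessed at, so they can be reviewed by hand.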

Jegelewicz commented 2 years ago

@dustymc posted

various random formatting

Not really. I see 6 UAM missing the space, 2 HWML with a space-p suffix for some reason.

standard formatting

Not sure how this is related or where this standardization might come from?

now we have Other ID metadata

And using it to NOT create more nonfunctional identifiers is one of the uses we discussed.

I can't do that, or use for example Entity tools,

I don't know why you'd think that.

request they add it to their record

They've already got it!

Jegelewicz commented 2 years ago

@campmlc posted

I can't use any linking tools if the same ID is written in different formats, as you just acknowledged as being a problem in at least a third of the records involved. What I am proposing would provide a standardized way of recording the identifier, one that actually conveys what the ID means and where it comes from, rather than a random string. Collections can still use their random string as an other identifier in whatever format is written on the tag. But if they confirm the ID is actually referencing the Koyukuk/Nowitna Refuge Complex, then they can add this identifier as well. We do not currently have Other ID remarks, or perhaps I could add that there, but that will still not solve the problem.

The single biggest problem we face as a community in terms of tracking and linking related records across Arctos collections and across other institutions is the lack of standardized, unique, trackable identifiers. This is not the answer, but it is one step in the right direction towards a solution, and doesn't dig the hole any deeper. Yes, it makes our other ID list longer. But until we implement a means to use the refuge complex as the "author" or "issuer" of the ID, as @Jegelewicz has proposed, I don't see an alternative.

Jegelewicz commented 2 years ago

@dustymc posted

least a third

1500-8!=1/3??

I'm not sure what the rest of that means. I think you are implying that there's some functionality or structure or "difference-ness" connected to ID type that I'm not seeing. If it's there, I need to understand it. If it's not, you need to understand that. Please clarify.

Digging the hole deeper - it's 564 deep at the moment - is precisely what I'm trying to avoid, and a primary use case that was repeatedly discussed in conjunction with ID metadata.

until we implement a means to use the refuge complex as the "author" or "issuer"

I'm just suggesting we do that rather than making a bigger mess.

Jegelewicz commented 2 years ago

@campmlc posted

So are you proposing we move forward with linking identifiers to agents?

"until we implement a means to use the refuge complex as the "author" or "issuer"

I'm just suggesting we do that rather than making a bigger mess."

And yes, it is a third of what I have to upload - most of which have KNRC numbers that are nowhere else in Arctos and thereby unlinkable currently. I want to standardize so that they can be discoverable in the future - so users don't have to guess whether some uninformative random string in one format is the same as some other random string in a different format. That has been the long-term goal of our parasite-host model - being able to find and link formerly unknown relationships based on shared identifiers.

Jegelewicz commented 2 years ago

@dustymc posted

My hesitation on that involved putting a bunch of work into something that'll never get used, which has become common. I think this is the second tangible use case, seems worth reprioritizing to me.

link formerly unknown relationships based on shared identifiers.

There is no amount of metadata that can be added to nonresolvable identifiers that will make that situation much better than it is - it still comes back to hoping Buddy didn't make any mistakes in his notebook and that those perfect data have been faithfully reproduced, all with absolutely no way of checking. I'm relatively sure everyone involved in making and using those numbers has been human, and we make the occasional mistake. I can't figure out what you're trying to do here, but I think perhaps the datatype itself is going to make sure you never quite get there. As always, more information about the overall goals would be exceptionally useful in helping find something that's technically capable of accomplishing them.

Jegelewicz commented 2 years ago

@campmlc posted

Context: I am working with parasites collected by MSB:Mamm from the late 90s through the early 2000s that were sent off in a collection of bulk ectoparasites to various researchers for sorting and identification. These are all part of the multi-year, multi-accession Beringian Coevolution Project. They were sent off before MSB:Para existed, before we had a good pathway for linking parasites to hosts. Over the past 10 years, I've been tracking down these dispersed legacy parasite lots and linking them to hosts to the best of my ability, all with data that may or may not have been transcribed correctly and which consequently may or may not be linked to the correct hosts. We've done fleas and then ticks, now lice, and are working on the mites from this project. I've also been working on the helminths over the same time frame. The only way we have to find the hosts from these parasites is from the shared field identifier: AF, IF, NK, or in some cases, random things like KNRC. Finding the host data, and downloading the shared collecting event ID, is an arduous process requiring that I parse out each identifier, filter all the various formats it could possibly be entered in (NK 12345 as other ID?), and check, to the extent possible, other metadata that may support or reject the potential host association. I am currently loading 1800+ lice records, some of which have KNRC numbers. When I check in Arctos for KNRC numbers, I find some at HWML, and some at UAM - none of which correspond to the number series I am working with. So I can't currently match these to host data. However, I can give a locality and date range based on the existing data in Arctos, and the number series involved. This is assuming that all versions of "KNRC" in Arctos are referring to the same source, the Koyukuk/Nowitna Refuge Complex, which from what @gracz-UNL and I have so far investigated, seems to be the case.
I would like to add this KNRC: Koyukuk/Nowitna Refuge Complex identifier to my records, and encourage @racz and @amgunderson and @DerekSikes and anyone else who has similar records now or acquires them in the future to do the same, so that all the time and effort spent validating and tracking down the provenance of this identifier is not wasted, and so that these records can be unequivocally linked to a common identifier in a standardized format issued by the same agent.

And interestingly, if we have a common identifier in Arctos - I could potentially use the entity tool to make these associations for me, instead of spending literally over a week of complete focus time and risk of major error to do the same for 1800 records. The entity would be "Primary Material Sample" - the host as it was collected from nature, and all the associated catalog items would be material samples derived from that material sample. . . @Jegelewicz
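The manual matching described in this comment (parse each identifier, allow for format variants, then look for records that share it) can be sketched roughly as below. This is an illustration under stated assumptions, not the Arctos entity tool or any Arctos API: `group_by_identifier`, the `(guid, identifier)` pair layout, and the sample GUIDs are all hypothetical.

```python
from collections import defaultdict


def group_by_identifier(records):
    """Map each normalized identifier to the catalog records that carry it."""
    groups = defaultdict(list)
    for guid, identifier in records:
        # Collapse case, spaces, and hyphens so "NK 12345" and "nk-12345" agree.
        key = "".join(ch for ch in identifier.upper() if ch.isalnum())
        groups[key].append(guid)
    return dict(groups)


# Hypothetical (guid, identifier) pairs; none of these are real catalog records.
records = [
    ("HOST:Mamm:1", "KNRC 101"),
    ("PARA:Para:7", "KNRC-101"),
    ("PARA:Para:8", "KNRC102"),
]
groups = group_by_identifier(records)
# Identifiers shared by more than one record are candidate host-parasite links.
linked = {k: v for k, v in groups.items() if len(v) > 1}
# Identifiers seen only once have no counterpart to link to yet.
unmatched = {k: v for k, v in groups.items() if len(v) == 1}
```

A shared string only produces a candidate link. It would still need checking against locality, date, and other metadata, since the identifiers themselves may have been transcribed incorrectly.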

Jegelewicz commented 2 years ago

@dustymc posted

may or may not have been transcribed correctly

Yep, that's how nonresolvable identifiers work.

The problem is as expected, but I don't understand the solution. That is, you can

  1. Do something - anything, doesn't matter what, all possibilities have equivalent functionality - with 'KNRC1' and then start over when some new KNRC1 (which may or may not be what you hope) inevitably pops up, or
  2. Create relationships using resolvable identifiers, and maybe note 'inferred from KNRC1' in some remarks so there's some sort of clue if it turns out to be wildly wrong, and never have need to attempt resolving something that's not built to be resolved again.

I don't think any of that's functionally different for any purpose, including entity tools - resolvable identifiers have properties that just can't exist in nonresolvable identifiers.

The entity would be "Primary Material Sample"

I suppose Entities can be whatever you want them to be, but up to this point every discussion has come back to the idea that Entities are best used for things which do not contain anything that might be labeled 'material.'

Jegelewicz commented 2 years ago

@campmlc posted

The second below is what I am trying to do, and the reason for this request. And yes, I have that note added in to remarks.

  2. Create relationships using resolvable identifiers, and maybe note 'inferred from KNRC1' in some remarks so there's some sort of clue if it turns out to be wildly wrong, and never have need to attempt resolving something that's not built to be resolved again.

So can I go ahead and create? I believe I have exhausted every other possible option besides just letting things continue as they are, which is not acceptable. Happy for anyone else trying to do this to join a relevant discussion. But in the meantime, we already have these kinds of identifiers, I'm not asking for anything new, any changes to the model would globally be applied to this as to all other identifiers, and it would mean I can finish what I started here. Once I complete the bulkload, I'm not coming back to these data.

Jegelewicz commented 2 years ago

@dustymc posted

second below is what I am trying

Then why the need for more mess?! One (or both) of us isn't understanding something.

Once I complete the bulkload

Data should make this more clear, care to share?

Jegelewicz commented 2 years ago

@campmlc posted

The requested ID is fundamentally no different from AF, IF, NK, GAN, Fort Bliss Curatorial Facility, etc., which already exist as IDs. It is inherently more informative than "other identifier". Implementing so I can complete this task.

Jegelewicz commented 2 years ago

I have zero problem adding this identifier to the code table. https://github.com/orgs/ArctosDB/teams/arctos-code-table-administrators please review the initial proposal.