campmlc closed this issue 4 months ago.
This is acceptable as long as
I am of course happy to help clean up any existing problems which would prevent implementation.
See also https://github.com/ArctosDB/arctos/discussions/5310
This is acceptable as long as
- I can control the value (Arctos GUID) and issuer: YES, agree
- I can disallow those things in identifiers of other types: YES, agree
From @mkoo in https://github.com/ArctosDB/arctos/issues/6738:
From the AWG discussion:
A new identifier would be created called Arctos record identifier
which would expressly be the full URL of the Arctos record.
The data entry form needs to make clear that users can enter just the catalog record or DwC triplet, and the domain etc. (https://arctos.database.museum/guid/) will be appended, although the builder can already do that.
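As a minimal sketch of the "append the domain" behavior, assuming a record identifier is a triplet of three colon-separated parts (as in APSU:Fish:1079) and that the base URL is exactly https://arctos.database.museum/guid/, a data entry helper might look like this. The function name `build_arctos_guid` and the triplet pattern are hypothetical, not Arctos code.

```python
import re

# Base URL from the discussion. The triplet pattern is an assumption:
# three non-empty, colon-separated parts with no whitespace or "/".
ARCTOS_GUID_BASE = "https://arctos.database.museum/guid/"
TRIPLET_RE = re.compile(r"[^:\s/]+:[^:\s/]+:[^:\s/]+")


def build_arctos_guid(entry: str) -> str:
    """Hypothetical data-entry helper: turn a bare triplet into a full GUID URL."""
    triplet = entry.strip()
    # Accept an already-complete URL by reducing it to its triplet first.
    if triplet.startswith(ARCTOS_GUID_BASE):
        triplet = triplet[len(ARCTOS_GUID_BASE):]
    if not TRIPLET_RE.fullmatch(triplet):
        raise ValueError(f"expected a triplet like APSU:Fish:1079, got {entry!r}")
    # Append the triplet to the fixed domain and path.
    return ARCTOS_GUID_BASE + triplet


print(build_arctos_guid(" APSU:Fish:1079 "))
# prints https://arctos.database.museum/guid/APSU:Fish:1079
```

Reducing a complete URL back to its triplet before rebuilding keeps the helper idempotent, so entering either form yields the same stored value.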
Other suggested UI tweaks to the Edit form on the record page:
There is also agreement that we would remove the type "institutional catalog number" and replace it with simply "identifier" plus the appropriate Issued By agent, for consistent and discoverable other IDs.
appended
Yes, I could potentially do "I think you mean..." and manipulate the identifier, BUT there's also just about a 100% chance I'll occasionally mess that up (so perhaps I should throw the 'input' into remarks or something if we get there). I very strongly suggest we NOT do that and instead embrace https://github.com/ArctosDB/arctos/discussions/5310 (which leaves no room for confusion, doesn't require me to guess what a user might have intended, and doesn't become a liability at the borders of Arctos).
Prefix
Not a good discussion until https://github.com/ArctosDB/arctos/issues/6687 is resolved (prefix may not survive).
remove the type= "institutional catalog number"
For the record: I'm very hesitant about adding more types at all, and my anxiety over introducing yet another type is greatly amplified by the lack of movement on the many existing identifier issues (much of https://github.com/ArctosDB/arctos/issues?q=is%3Aissue+is%3Aopen+identifier+prefix+label%3A%22Priority+-+Wildfire+Potential%22, but there are still no issues for a bunch of other nonsensical types, e.g. there are still types for the media/object/device which carries identifiers!!). Clearly much of the confusion leading up to this proposal involves becoming lost in those arbitrary and unnecessary types. Removing what is perhaps the most confusing (and least consistently used) type is a great start, but is there any possible way we can commit to fully normalizing the ecosystem and getting ourselves out of this mudbog as we're adding this?
remove the type= "institutional catalog number"
Can we just stick to this one (very nice) thing and address that elsewhere? I'd hate to see this mired in arguments about other things. Also, I like the idea of type being functional, this could help us as we work through the remaining types.
An addition is the opposite of the simplification this is looking for. I definitely don't want any arguments, but I also think that nearly all of them involve getting lost in the complexity, much of which is brought about by the multitude of unnecessary types. Removing the thing that's clearly confusing users seems in line with the stated goals.
functional
If you mean having rules attached to types and agents, that has always been on offer. (But I think nobody's quite sure what to ask for because of the clutter of so many types, probably complicated by the surprising "what's a GUID?" conversation.) I'd be happy to work up a proposal; if anyone's interested, open an issue.
Just a note: most of the usage (but not all) of institutional catalog number is happening because we lack the clear alternative requested here. Once we have a clear and functional alternative, we can then move towards replacing and fixing the institutional catalog number IDs. I absolutely agree with @Jegelewicz that we should not conflate these two issues.
most of the usage (but not all) of institutional catalog number is happening because we lack the clear alternative requested here
See https://github.com/ArctosDB/arctos/issues/7808#issuecomment-2127875164, this cannot exist as long as those things exist, I can't create this except while also moving them.
I will not support adding more muck in which to get lost. This can and should be a simple matter of sorting identifiers in two ways (here for the resolvable, not-here for the rest). There should be no ambiguity in the data, I don't think I need anything but an OK. (But if this again starts looking realistic I can provide data here for review.)
This affects active data entry protocols across multiple collections in my institution. The only way to accomplish this in a short amount of time is to add the new identifier first so that the correct identifiers can be added and shown to be functional, and then communicate the need to change workflows. This can happen quickly if we do it right now - we have a couple of weeks before the summer cataloging push starts up. Collections need to know that existing data will not be lost from older records. This is the "social" part here - which must be included for this to work. We don't want a repeat of last year. As soon as the new "Arctos record ID" format is up and running, @dusty can convert all existing Arctos guid "identifiers" without problem. The remaining "institutional catalog numbers" can then be prioritized for conversion once we are certain that all existing Arctos relationships have been appropriately captured and converted.
So if I understand @dustymc correctly, we can proceed right away with the resolvable identifiers in Arctos - I agree completely.
https://github.com/ArctosDB/arctos/issues/7808#issuecomment-2128094156 is technically incompatible with what was discussed. The concerns that a new dedicated type might somehow cause data loss are, well, I guess I don't have a word, but it's whatever you'd use to describe something that just can't happen. The training and adaptation should be straightforward: use the thing that doesn't produce an error (which will hopefully be self-explanatory once the thing that's obviously causing arbitrary data is gone).
Now https://github.com/ArctosDB/arctos/issues/7808#issuecomment-2128099917 is making me think I've misunderstood something again.
I need the OK to
We are in agreement on all above, except the last step, which requires a temporal delay of a week or two as collections need to be notified to change workflows, otherwise we have a lot of extremely upset people trying to do things that suddenly cease to work with no notification. This includes dealing with records currently in the bulkloader and in bulkload prep.
Regarding what to call this - see #5310
I support calling the Arctos GUID the full URL. This is also how we are defining the GUID in the Arctos paper, per the AWG discussion of 5-24-2024: the URL constructed from the Arctos "record identifier". @ccicero
Revised wording: "Each cataloged record has an Arctos Globally Unique Identifier (GUID) that is constructed from the record identifier (e.g., https://arctos.database.museum/guid/APSU:Fish:1079)."
last step ... suddenly cease to work with no notification.
That is precisely my point, but the implementation will not/can not work as I believe you're expecting it to.
Implementing this in the only way it can be done will be a change in workflow, whether we drag some ancillary bits out or not. That is what was agreed to in the meeting and in https://github.com/ArctosDB/arctos/issues/7808#issuecomment-2127896938. Surely the folks entering data aren't THAT difficult to talk to, and we do have a communications team who I'm sure would be willing to help.
Can I request a csv of the existing data in Arctos that use "institutional catalog number"? I don't want to hold this up, but I don't want to be responsible for data loss, and I don't want to presume the rest of the community agrees to conversion of existing data and new workflows without notice.
See https://github.com/ArctosDB/arctos/discussions/5310#discussioncomment-9540549 re Arctos GUID vs record identifier.
The special type would facilitate the correctness of internal links by
All possible?
See first of https://github.com/ArctosDB/arctos/issues/7808#issuecomment-2127950010 re: (3); I'm hesitantly willing to try, but I do suck at reading minds through malformed identifiers and will occasionally (at best!) mangle that. Defensible procedures would involve not making me guess, even if that is implemented. Everything else: Yup, no problem, that's what I said in https://github.com/ArctosDB/arctos/issues/7808#issuecomment-2127875164.
Missing is (5), which is critical to this: disallowing values that approximate https://arctos.database.museum/guid/ in other types.
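Condition (5) could be sketched as a check run whenever an identifier of any other type is saved. This assumes, hypothetically, that "approximates" means "mentions the Arctos host followed by a /guid path, in any letter case, with or without a scheme or www."; the real rule could be stricter or looser, this sketch ignores any exceptions, and `check_other_type` is an illustrative name only.

```python
import re

# Assumption: "approximates an Arctos GUID" means the text mentions the Arctos
# host followed by a /guid path, in any case, with or without scheme or "www.".
GUID_LIKE_RE = re.compile(r"arctos\.database\.museum/+guid\b", re.IGNORECASE)
RECORD_ID_TYPE = "Arctos record identifier"


def approximates_arctos_guid(value: str) -> bool:
    """Return True if the value looks like (part of) an Arctos GUID URL."""
    return GUID_LIKE_RE.search(value) is not None


def check_other_type(id_type: str, value: str) -> None:
    """Hypothetical condition (5): GUID-like values only in the dedicated type."""
    if id_type != RECORD_ID_TYPE and approximates_arctos_guid(value):
        raise ValueError(f"Arctos GUID-like value not allowed in {id_type!r}: {value!r}")
```

Using a search rather than an exact prefix match is what lets the check catch near-misses (http instead of https, stray capitals, a leading "www.") that would otherwise slip into other types.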
Yes, I agree with 5 as well
Those 5 conditions are essential!
If this is to proceed, the first decision will be what we do with the ~15K current identifiers that look like, but are not, valid Arctos GUIDs.
Excluding 'self' relationships from this would exclude most of these, but that seems like a potential trap of some sort.
There might be reasons to allow non-current GUIDs, but then I would lose any ability to exclude random things that people type, and that seems critical to this (especially having now seen the data!).
Much of this is ALMNH changing GUID Prefix (ACK!!), perhaps those could be stripped to triplets without any real loss of persistence.
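If those values were stripped to triplets, the extraction might look like the sketch below. It assumes legacy values follow the pattern `<scheme>://arctos.database.museum/guid/<triplet>`, optionally with "www.", a trailing slash or a query string; `strip_to_triplet` is a hypothetical name. It only recovers the triplet as written and does not map an old GUID prefix to a new one, which would remain a separate decision.

```python
import re
from typing import Optional

# Assumption: legacy values look like <scheme>://arctos.database.museum/guid/<triplet>,
# possibly with "www.", a trailing slash, or a query string; only the triplet is kept.
LEGACY_GUID_RE = re.compile(
    r"^\s*(?:https?://)?(?:www\.)?arctos\.database\.museum/+guid/+"
    r"([^:\s/?#]+:[^:\s/?#]+:[^:\s/?#]+)/?(?:[?#].*)?\s*$",
    re.IGNORECASE,
)


def strip_to_triplet(value: str) -> Optional[str]:
    """Hypothetical helper: return the triplet from a GUID-like URL, else None."""
    match = LEGACY_GUID_RE.match(value)
    return match.group(1) if match else None
```

Returning `None` instead of guessing leaves anything that does not fit the pattern for manual review.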
I'm not sure what to do from here, but I am sure that this type cannot be just another trashcan.
This feels like it's probably going to need some sort of ad-hoc committee, @campmlc perhaps you'd organize something?
Looking over the file, about 10K are ALMNH, another 4K+ are CHAS, and the remaining 1K are from miscellaneous collections. I would like to request that we create the new ID type with all the needed constraints so that we can use it for incoming accessions that are already arriving for the summer, and then work on these oddities. Non-ALMNH, non-CHAS:
| GUID_PREFIX | Count |
| -- | -- |
| BYU:Herp | 1 |
| DGR:Mamm | 1 |
| DMNS:Inv | 3 |
| KNWR:Env | 4 |
| MMNH:Edu | 2 |
| MSB:Bird | 10 |
| MSB:Fish | 70 |
| MSB:Herp | 2 |
| MSB:Host | 26 |
| MSB:Mamm | 185 |
| MSB:Para | 217 |
| MVZ:Bird | 3 |
| MVZ:Egg | 11 |
| MVZ:Herp | 4 |
| MVZ:Mamm | 83 |
| MVZObs:Herp | 1 |
| NHSM:Arc | 2 |
| NMMNH:Paleo | 2 |
| NMU:Mamm | 14 |
| OWU:Fish | 4 |
| OWU:Inv | 1 |
| UAM:Art | 4 |
| UAM:Bird | 38 |
| UAM:EH | 15 |
| UAM:Ento | 141 |
| UAM:Herb | 2 |
| UAM:Inv | 2 |
| UAM:Mamm | 133 |
| UCM:Bird | 2 |
| UCM:Herp | 1 |
| UCM:Mamm | 20 |
| UMZM:Bird | 2 |
| UTEP:ES | 1 |
| UTEP:Herb | 1 |
| UTEP:Herp | 2 |
| UWBM:Herp | 2 |
| UWYMV:Egg | 4 |
| UWYMV:Mamm | 2 |
| Grand Total | 1018 |
Current Status
The core of this is running in test, feedback is welcome.
Definition
Arctos record identifiers, or GUIDs, when used as identifiers, primarily for the purpose of forming relationships. Only Arctos record identifiers may be used here; Arctos record identifiers may not be used in other identifier types, except Arctos:Entity when used as Organism ID. Automation will correct the Issued By agent and, if a "Triplet" is provided, will attempt to guess the full identifier (and leave remarks). Value should be added to prefix when available.
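The rules in this definition can be sketched as a single validation pass. Everything here is an assumption for illustration: the type-name strings, the `Identifier` record, the triplet pattern and, in particular, the issuer value, which is a placeholder rather than the real Arctos agent. "Except Arctos:Entity when used as Organism ID" is read here as: a GUID whose triplet starts with `Arctos:Entity:` may appear in an identifier of type Organism ID.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Assumed constants. Type names follow the discussion; the issuer is a
# hypothetical placeholder, not the actual Arctos agent.
RECORD_ID_TYPE = "Arctos record identifier"
ORGANISM_ID_TYPE = "Organism ID"
GUID_BASE = "https://arctos.database.museum/guid/"
ISSUER_PLACEHOLDER = "Arctos (hypothetical agent name)"

PART = r"[^:\s/?#]+"  # one triplet component (assumed character set)
FULL_GUID_RE = re.compile(re.escape(GUID_BASE) + f"({PART}:{PART}:{PART})")
TRIPLET_RE = re.compile(f"{PART}:{PART}:{PART}")
GUID_LIKE_RE = re.compile(r"arctos\.database\.museum/+guid", re.IGNORECASE)


@dataclass
class Identifier:
    id_type: str
    value: str
    issued_by: str = ""
    remarks: List[str] = field(default_factory=list)


def validate_identifier(ident: Identifier) -> Identifier:
    """Sketch of the definition's rules; raises ValueError on rejection."""
    value = ident.value.strip()
    if ident.id_type == RECORD_ID_TYPE:
        if TRIPLET_RE.fullmatch(value):
            # Guess the full GUID from a bare triplet and record that in remarks.
            ident.remarks.append(f"GUID built from triplet input: {value}")
            value = GUID_BASE + value
        if not FULL_GUID_RE.fullmatch(value):
            raise ValueError(f"not an Arctos record identifier: {value!r}")
        ident.value = value
        ident.issued_by = ISSUER_PLACEHOLDER  # automation sets the issuer
        return ident
    # Any other type: GUID-like values are rejected unless the Entity exception applies.
    if GUID_LIKE_RE.search(value):
        match = FULL_GUID_RE.fullmatch(value)
        entity_ok = (
            ident.id_type == ORGANISM_ID_TYPE
            and match is not None
            and match.group(1).startswith("Arctos:Entity:")
        )
        if not entity_ok:
            raise ValueError(f"Arctos GUIDs are not allowed in type {ident.id_type!r}")
    return ident
```

Guessing from a triplet is confined to the dedicated type and always leaves a remark, so a wrong guess stays visible instead of silently rewriting the value.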
In Limbo
Can we eliminate a huge trap (https://github.com/ArctosDB/arctos/issues/7808#issuecomment-2135899234, https://github.com/ArctosDB/arctos/issues/7836)?
Original Issue
Problem: Need to distinguish and standardize Arctos GUIDs/Urls as distinct "identifier" type
Describe what you're trying to accomplish
Make it easier to identify and link to Arctos URLs in a standardized and internally controlled way.
Describe the solution you'd like
New ID type: "Arctos record identifier"
Arctos record GUID - The full URL of the related Arctos catalog record. Must begin with https://arctos.database.museum/guid/ followed by an Arctos record identifier (the triplet).
The special type would facilitate the correctness of internal links by
Describe alternatives you've considered
Increasing chaos.
Priority Wildfire