ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

Request - Allow collector & preparator numbers to be issued by non-person agents #8129

Open adhornsby opened 1 week ago

adhornsby commented 1 week ago

Help us understand your request (check below):

Describe what you're trying to do

I'm getting the following status error in browse & edit:

identifier_1: collector number and preparator number may only be issued by person agents

I know I'm getting this error because I'm trying to enter a non-person agent (United States Geological Survey) as the issuer of a collector number. I also think I understand why this change was made -- because historically collector & preparator catalogs have been for individuals and not groups/institutions/agencies, and the code table definitions reflect that. What I don't understand is why this needs to be a system-wide restriction instead of a best practice that collections can choose to adopt.

E.g., Bell Museum has a decades-long practice of shared prep catalogs, with what are called preparator numbers, which are most accurately in our eyes issued by the collection -- not by the preparator, not by unknown or null, etc. This restriction will force us to enter data inaccurately/incompletely, and will make specimens less discoverable when we want to search for prep numbers issued under the shared catalogs. I'm happy to give other examples where this will cause frustration, if needed.
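The rule behind this error can be sketched as follows. This is a minimal illustration, not Arctos code: the `Agent` class, the `check_identifier` function, the `enforce` flag, and the treatment of a missing issuer are all assumptions made for the example. Only the error text and the two identifier types come from the message above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical names throughout; this is not the Arctos implementation.
PERSON_ONLY_TYPES = {"collector number", "preparator number"}
ERROR_TEXT = (
    "collector number and preparator number "
    "may only be issued by person agents"
)


@dataclass
class Agent:
    name: str
    agent_type: str  # assumed values: "person", "organization", ...


def check_identifier(
    id_type: str, issued_by: Optional[Agent], enforce: bool = True
) -> Optional[str]:
    """Return an error message, or None if the identifier is accepted.

    Assumption: a missing issuer (None) is accepted; only a non-person
    issuer on a person-only type is rejected.
    """
    if not enforce:
        return None
    if id_type in PERSON_ONLY_TYPES and issued_by is not None:
        if issued_by.agent_type != "person":
            return ERROR_TEXT
    return None


usgs = Agent("United States Geological Survey", "organization")
print(check_identifier("collector number", usgs))
print(check_identifier("collector number", usgs, enforce=False))
```

With `enforce=False` the check behaves like a best practice that a collection could opt into rather than a system-wide restriction.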

mkoo commented 6 days ago

We too have shared prep catalogs and find them much faster now (example: https://arctos.database.museum/guid/MVZ:Bird:183123). This also lets us track more than one IssuedBy agent: https://arctos.database.museum/guid/MVZ:Mamm:245512

For us, MVZ Prep Lab Catalog is pretty explicitly preparator numbers so it feels more accurate and easy to find. We have an aka "PLC" so that's usually what I use to look up a number. Since it's shared we often include the preparator as one of the agents.

dustymc commented 6 days ago

Yes, an example would be useful.

The restriction comes from the definitions, eg

A value that refers to a person's field catalog.

If that's not accurate it could be updated (as long as we can do so in a way that doesn't mangle any existing data).

If collector number and identifier ("proper for a wide range of identifiers...") can't be defined without overlap - if the choice of which to use is arbitrary - then they can only serve to hide data and should be merged/removed.

United States Geological Survey

Unless these are something different than eg the identifiers issued by USGS in eg https://github.com/ArctosDB/arctos/issues/8095 (in which case they should probably be issued by a different agent), then what I think you are trying to do would make these data very difficult to find.

Jegelewicz commented 4 days ago

When was this restriction applied? Was the decision made by the community and communicated to everyone before its application? If it was, the communication is not reaching people, as @jebrad has the same issue with some collector numbers.

Could we get data on how many collector and preparator numbers currently in Arctos are issued by an agent of type organization?
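A count like this could be sketched as a grouped query. The table and column names (`agent`, `identifier`, `id_type`, `issued_by_agent_id`, `agent_type`) are hypothetical stand-ins, not the Arctos schema, and the rows are toy data included only so the example runs.

```python
import sqlite3

# Hypothetical schema; NOT the real Arctos tables.
SCHEMA = """
CREATE TABLE agent (agent_id INTEGER PRIMARY KEY, agent_type TEXT);
CREATE TABLE identifier (
    id_type TEXT,
    id_value TEXT,
    issued_by_agent_id INTEGER REFERENCES agent(agent_id)
);
"""

# Count collector/preparator numbers whose issuer is an organization.
COUNT_SQL = """
SELECT i.id_type, COUNT(*)
FROM identifier AS i
JOIN agent AS a ON a.agent_id = i.issued_by_agent_id
WHERE i.id_type IN ('collector number', 'preparator number')
  AND a.agent_type = 'organization'
GROUP BY i.id_type
ORDER BY i.id_type
"""


def make_toy_db() -> sqlite3.Connection:
    """Build an in-memory database with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO agent VALUES (?, ?)",
        [(1, "person"), (2, "organization")],
    )
    conn.executemany(
        "INSERT INTO identifier VALUES (?, ?, ?)",
        [
            ("collector number", "A1", 1),
            ("collector number", "A2", 2),
            ("preparator number", "P1", 2),
            ("preparator number", "P2", 2),
            ("identifier", "X1", 2),
        ],
    )
    return conn


def count_org_issued(conn: sqlite3.Connection) -> dict:
    """Return {identifier type: count} for organization-issued numbers."""
    return dict(conn.execute(COUNT_SQL).fetchall())


print(count_org_issued(make_toy_db()))
```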

adhornsby commented 3 days ago

I see how having MVZ Prep Lab Catalog as an agent makes it easier to search that catalog. But those prep numbers are now less functional than if they were entered as identifier type = preparator number, since identifier type = identifier isn’t available CLEANLY in flat/cache (right?). It would take a lot more work to have to dig them out of the “other identifiers” strings when we want to use them -- which we do for ~every download and report that we do, across MMNH and UMZM. These are the numbers we rely on for tracking things before they’re cataloged (and in some cases even after they’re cataloged), so it’s hard to overstate how much frustration will be caused if we have to bury them in “other identifiers” strings.

Since collector and preparator number series are sometimes shared (I think everyone agrees on that point?), we should just change the definitions to match reality. They can even reflect the historical definitions of those terms, e.g.:

This seems like it would alleviate all issues with allowing non-person agents to assign these types of numbers when needed. (I think we can clean up the definition of identifier type = identifier, too, I just don’t want to distract from the issue I raised first.)

If you need some other examples: We often receive accessions from people/groups that don’t use the same language as we do, and I have to decide how to enter the identifiers they supply. E.g. Montana Natural Heritage Program has given UMZM a lot of specimens with what they call voucher numbers, which I interpret as collector number since it was the number assigned at collection and came from their shared field catalog, and I want to use it exactly as I would any other collector number. Another e.g., I’m working on an accession at MMNH from a lab group where they assigned “sample IDs,” which again were the numbers assigned at field collection, based on the shared lab catalog, and which I would want to use as any other collector number.

dustymc commented 3 days ago

identifier isn’t available CLEANLY in flat/cache (right?).

Nobody's asked for it to be! (And I don't think that would be useful, but an agent-based column might be, or we might do something dynamic, or ????????? - I need to know what you want to DO before I can write code that DOES it!)

bury

Cleanup should be everybody's priority. I'm sorta-sure very few understand what we're trying to do with the agent-centric model. Anything that's buried is only buried because something is being mis-used (in part because I think everyone gets lost in the sea of nonsense types that we can't get rid of) or is stuck behind some weird objection or etc.

clean up the definition of identifier type = identifier, too

I can't see how your proposed definition of collector number doesn't fully and arbitrarily overlap, so that'd have to be simultaneous if they were both to survive such a change.

examples

Those are all easy: use type identifier and an appropriate agent, and let me know exactly what you'd like to do from there.

adhornsby commented 3 days ago

For the moment all I’m asking is that we please re-allow non-person agents to issue collector and preparator numbers. I don’t see what harm was happening to cause this change, other than conflict with the definitions (which I argue don’t reflect reality and should be updated).

RE: identifier type = identifier, I can only speak to my own experience which is to use that as a catch-all for “identifiers that don’t classify as another existing type” – so, that is the definition that would separate it from other types. It’s unsatisfying but I think necessary, since we’ll always need a generic type for unknown stuff (could be “identifier” or “other identifier” or “unclassified identifier” or whatever). Since I don’t know what those numbers are and don’t rely on them for anything, it’s never been a problem for me to have them buried in the “other identifiers” string.

adhornsby commented 3 days ago

Those are all easy: use type identifier and an appropriate agent, and let me know exactly what you'd like to do from there.

Maybe easy for searches, but a big headache when trying to make generalizable reports. E.g. I want to print preparatornumber on every tag, and it makes no functional difference at all whether it's a prep number issued by an individual or group. If those are all stored in preparatornumber, then I can search & report for any sort of mixed batches/accessions that I need to label. If I have to call them up in the report code as generic identifiers with particular issuing agents, then I’d have to run a separate search & report for each batch of things that had different issuing agents, right? It feels like a lot of unnecessary complication when we can just update the definitions of collector & preparator numbers to reflect reality.
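The two storage options can be compared with a small sketch. Everything here is hypothetical: the `Identifier` record, both lookup functions, the agent names other than MVZ Prep Lab Catalog, and the toy values are illustrations, not Arctos report code.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class Identifier:
    id_type: str    # e.g. "preparator number" or "identifier"
    issued_by: str  # issuing agent name
    value: str


def prep_number_by_type(identifiers: Iterable[Identifier]) -> Optional[str]:
    """Type-based lookup: the issuer does not matter."""
    for ident in identifiers:
        if ident.id_type == "preparator number":
            return ident.value
    return None


def prep_number_by_agent(
    identifiers: Iterable[Identifier], prep_catalog_agents: Set[str]
) -> Optional[str]:
    """Agent-based lookup: generic identifiers from a known set of issuers."""
    for ident in identifiers:
        if ident.id_type == "identifier" and ident.issued_by in prep_catalog_agents:
            return ident.value
    return None


# Toy records, made up for illustration only.
typed = [Identifier("preparator number", "Example Shared Prep Catalog", "123")]
generic = [Identifier("identifier", "MVZ Prep Lab Catalog", "456")]
agents = {"MVZ Prep Lab Catalog", "Example Shared Prep Catalog"}

print(prep_number_by_type(typed))
print(prep_number_by_agent(generic, agents))
```

Under these assumptions, the agent-based lookup can still cover a mixed batch in one pass, but only if the report maintains the set of issuing agents that count as prep catalogs. The type-based lookup needs no such list.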

dustymc commented 2 days ago

Adding this to AWG Agenda (we should have discussed it today!).

no functional difference

There are huge functional implications to this. Eg MSB seems to want to get USGS data out regularly, they can't do that if they're under a bunch of types and formats, and I can't control types and formats if we're doing arbitrary things with the data. (And this feels very arbitrary, but maybe I'm just not getting something? Why would you call something a preparatornumber instead of something else? Maybe that's still not the right question, but I think it's where this starts seeming very strange to me.) Any sort of denormalization - allowing one type of thing to be stored in multiple ways/places/forms - tends to have far-reaching impacts influencing what the data can do.

So: Can this be shaped as a functional requirements discussion? Possibly I've just got strange ideas about what we're trying to DO with these data.