ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

Remove UUID identifiers once they are successfully loaded #6251

Open campmlc opened 1 year ago

campmlc commented 1 year ago

As we are using more and more of the extras tools in data entry, we are going to be adding more and more UUID links to records. These values are internal keys, and provide no information to the public user. Is there a way to optionally mark these to delete on load, or to automatically encumber them from public view?

Alternatives: All users and public viewers are forced to see the following in catalog records and to deal with it in identifier downloads:

Screenshot 2023-05-05 10 18 55

Screenshot 2023-05-05 10 10 59

Screenshot 2023-05-05 10 11 18

dustymc commented 1 year ago

more and more of the extras

https://github.com/ArctosDB/arctos/issues/6171 ?????????

Anyway, this is still not technically viable: there's no possible way Arctos can know when no references to an identifier exist (e.g., on a thumb drive in some drawer).

These values are internal keys

That's one usage.

provide no information to the public user.

There is no such limitation.

Alternatives

https://arctos.database.museum/info/ctDocumentation.cfm?table=ctflag, Identifier: Bulk Unload, and a good procedure would work.

I'm not sure how this could have anything to do with UI?

campmlc commented 1 year ago

It affects UI because all of the screenshots I posted above with these confusing and uninformative UUIDs are taken from the UI. If we can manually bulk unload them, then I would like to request a tool to periodically go in and do this in bulk through the tools menu or through some component loader shortcut. That way I can remove them accession by accession as I confirm that all related content has loaded. What about a notification that provides a premade csv for all uuids that have successfully loaded, so that can be run through the other ID component bulk unloader? That way, I also have a csv download as reference. The other option is to leave them there and encumber them from public view, in the same way we encumber part barcodes and object paths. I would prefer both of these options.

campmlc commented 1 year ago

As far as I understand, not a single person has UUIDs stashed somewhere on some thumb drive. These are generated by Arctos. They are kept internally by Arctos. They mean nothing to anyone and are only intelligible in the context of Arctos. We can't go back from a UUID to something in the component loader, we can't click on them to find out what data they encoded, or who entered them, or when. They are only useful in the context of a csv download from the component loader system that links them to a real Other ID such as an NK. So if, in order to delete them, we have to download the csv from the component loader, or even better, get such a zip file in a notification, then we have a record that might be useful. Otherwise, they clutter up the user interface with visually confusing, distracting, and meaningless information that has no value for a user or collection manager.

campmlc commented 1 year ago

Some possibilities:

1) Can we have a notification that provides a csv of the UUID info as soon as it is entered into the component loader system? We need this because these are usually entered by students using the data entry form. They are not created via the bulkloader. An operator must review, and this ideally should be done on an accession by accession or student by student basis.

2) Once the UUIDs are loaded via the autoload process or through the component loader, can we get a notification with the relevant csv upload file? That way, we have backup to retrace anything that might have gone wrong.

3) Can this notification also include an optional csv download in the format to unbulkload the UUIDs?

campmlc commented 1 year ago

And if all of the above are not possible, then we need to encumber them from the public interface.

genevieve-anderegg commented 1 year ago

Almost all of our data entry is through single record entry, and UUIDs are created for our records when we have additional identifications (legacy IDs etc.). Once a month I go through, download a CSV, and then bulk unload them from records (after the legacy IDs have been uploaded to the record).

What about a notification that provides a premade csv for all uuids that have successfully loaded, so that can be run through the other ID component bulk unloader? That way, I also have a csv download as reference.

This would be great (or similar system)
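
A minimal sketch of that monthly cleanup step, assuming the identifiers have been exported to a CSV. The column names (`guid`, `identifier_type`, `identifier_value`) and the sample rows are hypothetical, not the actual layout expected by the Arctos identifier unloader. The sketch only shows how UUID-shaped values could be picked out of a download and written to a separate file that doubles as a reference copy.

```python
import csv
import io
import re

# Canonical 8-4-4-4-12 hexadecimal UUID text form.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)


def uuid_rows(export_csv_text, value_column="identifier_value"):
    """Return the rows whose identifier value looks like a canonical UUID."""
    reader = csv.DictReader(io.StringIO(export_csv_text))
    return [
        row for row in reader
        if UUID_RE.match((row.get(value_column) or "").strip())
    ]


def to_csv(rows, fieldnames):
    """Serialize the selected rows back to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Made-up sample export; the column names are assumptions.
sample = """guid,identifier_type,identifier_value
MSB:Mamm:1,NK,12345
MSB:Mamm:1,UUID,3f2b8c1e-9d4a-4e6b-8a1f-0c2d3e4f5a6b
MSB:Mamm:2,collector number,ABC 17
"""

selected = uuid_rows(sample)
unload_csv = to_csv(selected, ["guid", "identifier_type", "identifier_value"])
print(unload_csv)
```

Matching on the shape of the value alone would also catch UUIDs that someone added for another purpose, so the resulting file would still need review before anything is unloaded.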

mkoo commented 11 months ago

I see the frustration here of having to see an internal key and manually manage them as well (delete them once done).

Another possibility: treat them as internal keys, so that if they are seen it's at the bottom of the page with other autolinks and Arctos reports, maybe in an Arctos keys table. @dustymc, are UUIDs useful beyond the initial minting? If not, maybe they have a specific lifespan and are then deleted?

marking as Needs more discussion

Jegelewicz commented 11 months ago

specific lifespan

The issue there is when someone has something in a component loader that is relying on one of these to match it up with a catalog record and nobody ever returns to load it. Maybe less of an issue since the bulkloader rebuild? See also https://github.com/ArctosDB/arctos/issues/6398

mkoo commented 11 months ago

specific lifespan

The issue there is when someone has something in a component loader that is relying on one of these to match it up with a catalog record and nobody ever returns to load it. Maybe less of an issue since the bulkloader rebuild? See also #6398

OK, then don't remove them! Maybe address this by having an option to remove them when bulkloading (e.g., a check box for 'remove UUID once completed' sort of thing)? I think the main thing for UX would be to have them not appear in identifiers, which are generally curatorially controlled

dustymc commented 11 months ago

something in a component loader

No, the issue is when someone has something on a thumb drive, or has done one or more of the other near-infinite number of things that might make ANY identifier absolutely critical for some unforeseen purpose. Only the person who's minted/used the ID can know what it might be used for, and even then only in very limited conditions. (And I think maybe this involves some invalid assumption that UUIDs in Arctos are minted by or for Arctos??)

curatorially controlled

EXACTLY! That's UUIDs, they (like all other identifiers) can be nothing else, there are all sorts of tools (not magic) for controlling them. Arctos can provide tools, reminders, lots of things, but it can't (yet?!) scan the universe.

Jegelewicz commented 11 months ago

Only the person who's minted/used the ID can know what it might be used for

but Arctos does mint these to connect "basic" records in the bulkloader to stuff that ends up in the component loaders? The person loading data (or reviewing it) doesn't even know about them until the records go into the bulkloader (and it is always possible that people have removed them there because they didn't know what they were?)

dustymc commented 11 months ago

Arctos does mint these

NO! Arctos is a tool, it responds to a request. A person makes the request (and then does what's expected or not, I really don't have any knowledge or control of that....)

mkoo commented 4 months ago

I am going to jump in and make a suggestion after talking to Teresa and Dusty: have IssuedBy = arctosbulkloader (or something similar) added automatically; then it will be a little easier to unbulkload via the current tool:

Easy CSV for Identifiers here: https://handbook.arctosdb.org/how_to/How_To_Bulkload_Identifiers_Relationships.html

De-identifier tool here: https://arctos.database.museum/loaders/bulkUnLoadIdentifier.cfm
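
If identifiers carried an issuer as suggested, selecting the ones to unload could be a filter on that value rather than on the shape of the identifier. A rough sketch, assuming a CSV export with a hypothetical `issued_by` column and the agent name `arctosbulkloader` proposed above (neither is confirmed as the real column or agent name):

```python
import csv
import io


def rows_issued_by(export_csv_text, issuer, issuer_column="issued_by"):
    """Return the rows whose issuer matches, ignoring case and padding."""
    wanted = issuer.strip().lower()
    reader = csv.DictReader(io.StringIO(export_csv_text))
    return [
        row for row in reader
        if (row.get(issuer_column) or "").strip().lower() == wanted
    ]


# Made-up sample export; column names and values are assumptions.
sample = """guid,identifier_value,issued_by
MSB:Mamm:1,3f2b8c1e-9d4a-4e6b-8a1f-0c2d3e4f5a6b,arctosbulkloader
MSB:Mamm:1,7a1d2c3b-4e5f-4a6b-9c8d-0e1f2a3b4c5d,Some Researcher
MSB:Mamm:2,NK 12345,
"""

to_unload = rows_issued_by(sample, "arctosbulkloader")
for row in to_unload:
    print(row["guid"], row["identifier_value"])
```

Filtering on the issuer leaves alone any UUID that a person added for their own purposes, which a pattern match on the value could not distinguish.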

Can this be closed now?

campmlc commented 4 weeks ago

So what was actually completed here? Anything that will resolve the original problem without adding extra work and confusion for operators?

campmlc commented 4 weeks ago

If we see a UUID as an identifier, how will we know that the corresponding data from the component loader has been entered?

campmlc commented 2 weeks ago

Revisiting this issue. I have been working on cleaning up the attribute bulkloader for examined/detected attributes for MSB collections. Files for various users had been sitting there since 2020, and the records would not load for several reasons (including students and staff who were no longer operators at MSB):

1) The controlled vocabulary on all of these has changed, and all the entries had to be updated from "endoparasite examination" etc. to "examined for" or "not examined for" and a separate attribute for "detected" or "not detected". This was accomplished with some effort.

2) Some attributes were linked to Guid prefix = MSB:Mamm, Other ID = NK, instead of to a GUID, as they were entered at the same time as the related NKs, so no catalog number yet existed, or they were entered prior to the bulkloader upgrade and the Arctos record GUID ID type did not exist, etc. Because NK numbers are assigned to related items, such as parent/offspring, these could not be loaded as is, as Arctos cannot differentiate between multiple records associated by any relationship with the same NK; they gave an error that "cataloged items not found". For any records that had any relationship other than self to an NK, the GUIDs had to be tracked down, evaluated, and used to load the attributes. This was also accomplished with some additional effort.

3) Finally, there are attributes that were loaded prior to the data entry upgrade that link to a UUID which does not resolve. See the attached file. For all these, the error is "cataloged item not found". I have no way of knowing what records these link to. I had previously requested that there be some key entered by the user, such as the collector number or NK, that could be added to these pending records, but that never happened. These UUIDs were generated by Arctos for multiple different agents between 2020 and 2022 using the old single record data entry form.

To my knowledge MSB has not been systematically removing UUIDs, despite the above request to do so or at least hide them from public and curatorial view/management. So, what do we do with these? Are these data irretrievable? @AdrienneRaniszewski @jldunnum

examined detected legacy attributes download - failed UUIDs.csv
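
One way to check whether any of the failed rows can still be matched is to compare their UUIDs against an export of identifiers that remain on catalog records. This is only a sketch: the column names (`uuid`, `attribute`, `identifier_value`, `guid`) and the sample values are hypothetical, and it can only recover links for UUIDs that were never removed from a record.

```python
import csv
import io


def split_by_resolution(pending_text, identifiers_text):
    """Split pending rows into those whose UUID is still on a record and the rest."""
    # Map each identifier value still present on a record to that record's GUID.
    known = {
        (row.get("identifier_value") or "").strip().lower(): row["guid"]
        for row in csv.DictReader(io.StringIO(identifiers_text))
    }
    resolved, unresolved = [], []
    for row in csv.DictReader(io.StringIO(pending_text)):
        key = (row.get("uuid") or "").strip().lower()
        if key and key in known:
            # Attach the GUID so the row could be reloaded against the record.
            resolved.append({**row, "guid": known[key]})
        else:
            unresolved.append(row)
    return resolved, unresolved


# Made-up pending attribute rows keyed on UUID (assumed layout).
pending = """uuid,attribute
3f2b8c1e-9d4a-4e6b-8a1f-0c2d3e4f5a6b,examined for
7a1d2c3b-4e5f-4a6b-9c8d-0e1f2a3b4c5d,not examined for
"""

# Made-up export of identifiers still attached to records (assumed layout).
identifiers = """guid,identifier_value
MSB:Mamm:1,3F2B8C1E-9D4A-4E6B-8A1F-0C2D3E4F5A6B
MSB:Mamm:2,NK 12345
"""

resolved, unresolved = split_by_resolution(pending, identifiers)
print(len(resolved), "resolved;", len(unresolved), "unresolved")
```

Rows that land in `unresolved` would still need some other key, such as a collector number or NK, to be tied back to a record.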