Closed dustymc closed 1 month ago
grab at least created_by_agent_id and created_date
we should do this for everything, so yes
And there will never be a better time if anyone has other requests, so - other requests?
Free text citations lead to this. I was thinking that perhaps adding something to allow for checks might help, but it's just more work that nobody wants to do or still won't do correctly.
Can we add flags to pubs that appear to be duplicates? Not that anyone would go looking for them, but at least when someone finds one, they might be inclined to clean it up? Although, this can be difficult when cited things aren't in a collection to which you have edit access. Possible to mark pubs for merge?
UGH
Possible to mark pubs for merge?
From a curatorial point of view that sounds doable. No idea on the Arctos side, but it would be pretty easy to see a publication while working with them and say "same as that one, merge please."
Yeah, sorry, the stuff in the last couple days is my student's mess. I'll talk with them again about how to check for duplicate pubs first. Good thing is we haven't added the citations yet, so it will be easier to clean up.
created_by_agent_id and created_date
This would greatly aid with checking the work of folks you are trying to train.
Some things that jumped out at me, and that I noticed being created today
the stuff in the last couple days is my student's mess
I take that back, only a few of those are ours.
Free text citations lead to this.
At some level, but I think that's very far from the whole picture. A bunch of these have DOIs, users just aren't searching, or don't know how to search, or are not good at searching, or ??. The tools to avoid this exist, they're just not being used, or not being used in a way that leads somewhere productive, or ????.
Does everyone understand the importance of DOIs (identifiers!) and how to use them to create publications? Logs suggest maybe not; we need - documentation? Training? How-tos? A post-training/pre-access test? Something.....
Some of it's https://handbook.arctosdb.org/documentation/encoding.html, and it may be hard to avoid when pulling from a DOI. You search "races of 𝐵𝑢𝑡𝘩", "races of <i>Buth" (and an occasional "ü") exists, you make a duplicate. MAYBE there's something technical in there (https://github.com/ArctosDB/arctos/issues/4783 might help), but I think we mostly need to get people to recognize and avoid that sort of thing.
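To make the encoding point concrete, here is a minimal Python sketch of the kind of normalization that would make those variants compare equal. It is an illustration only, not how Arctos searches: `title_key` is a hypothetical helper, and the tag-stripping regex is deliberately crude (it would also eat a literal "<" followed later by ">" in a title).

```python
import re
import unicodedata

# Crude tag stripper (assumption: titles only carry simple markup like <i>).
TAG_RE = re.compile(r"<[^>]*>")


def title_key(title: str) -> str:
    """Return a comparison key for a title (hypothetical helper, not an Arctos API)."""
    # Drop markup such as <i> ... </i>.
    text = TAG_RE.sub("", title)
    # NFKD folds "styled" letters (mathematical italics etc.) to plain letters
    # and splits accented letters into base letter + combining mark.
    text = unicodedata.normalize("NFKD", text)
    # Remove the combining marks, so an accented u compares equal to a plain u.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Case-fold and collapse runs of whitespace.
    return " ".join(text.casefold().split())


# "races of" followed by mathematical-italic B, u, t and a sans-serif italic h.
styled = "races of \U0001D435\U0001D462\U0001D461\U0001D629"
marked_up = "races of <i>Buth"
print(title_key(styled) == title_key(marked_up))
```

A key like this is only good for spotting candidates; the stored title should stay as entered.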
I don't know how to find duplicates, other than above which is really primitive, misses lots, and finds thousands of potentials (volumes and prolific authors and the occasional homonym) which are fine. Always up for ideas, but I'm not sure machines are going to have much luck with this (except when there's a DOI, those things are made for machines and none of this happens when they exist).
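For anyone who wants to experiment, one primitive approach can be sketched in Python with the standard library's `difflib`. This is not the query used above; `candidate_duplicates`, the 0.9 threshold, and the sample titles are all made up for illustration. It has the same weaknesses already described: it will flag volumes and homonyms, it can only suggest candidates for a human to check, and comparing every pair is far too slow to run naively over all publications.

```python
from difflib import SequenceMatcher
from itertools import combinations


def candidate_duplicates(pubs, threshold=0.9):
    """Return (id_a, id_b, score) for title pairs that look alike.

    pubs is a list of (publication_id, title) tuples. The threshold is an
    arbitrary assumption; the results are candidates for review, not merges.
    """
    pairs = []
    for (id_a, title_a), (id_b, title_b) in combinations(pubs, 2):
        # Light normalization: case-fold and collapse whitespace.
        a = " ".join(title_a.casefold().split())
        b = " ".join(title_b.casefold().split())
        score = SequenceMatcher(None, a, b).ratio()
        if score >= threshold:
            pairs.append((id_a, id_b, round(score, 2)))
    return pairs


# Made-up titles, for illustration only.
sample = [
    (1, "Notes on the mammals of Example County"),
    (2, "Notes on the  Mammals of Example County."),
    (3, "A checklist of imaginary beetles"),
]
for id_a, id_b, score in candidate_duplicates(sample):
    print(id_a, id_b, score)
```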
difficult when cited things aren't in a collection to which you have edit access.
Even with access, there's always the chance the suggestion is wrong so I'm very reluctant to just merge (and potentially make bigger messes).
Possible to mark pubs for merge?
At some level museums exist to link material to literature; this is arguably our most sacred task. What I've seen in the logs suggests this is (mostly) easily preventable, and we should accept that this is (still mostly) a preventable problem and address it with training. I'd rather The Community build resources which help avoid this, instead of preemptively accepting that we are untrainable and using those resources to perpetually pick up after ourselves. If for some reason we can't do that, I don't think there are technical problems with a merger system, but it would be significant development.
For scale, there are 10K pubs, my shady little query found 1200 potential dups, and about a dozen of those looked like actual duplicates in the 10 minutes I spent scanning. That would have been a different picture if I wasn't watching the logs or if there wasn't a responsive CM available, but I was and there was, and I don't think this is a wide-scale problem. Adding the metadata would help when something takes longer to catch - not a solution to anything, but useful and not much work to add.
I think a simple first step would be to add to the beginning of this: https://handbook.arctosdb.org/how_to/How-to-Create-a-Publication.html with 1) an instruction to search for a pub before creating a new entry and 2) suggestions on how to search (for example, don't search for the whole title, just a few words from the title so that you catch any variations).
A bunch of these have DOIs, users just aren't searching, or don't know how to search, or are not good at searching, or ??
Perhaps updating the project and publications search page would help with searching for publications overall and increasing the chance of finding a pub before mistakenly entering it and creating a duplicate -> ArctosDB/arctos#3736 ArctosDB/arctos#3468
I'm going to make this my next task. I know how to do the core of this; if anyone has other publication metadata uses please leave them here.
Is your feature request related to a problem? Please describe.
We don't record who/when created publications. We should.
Describe what you're trying to accomplish
Immediately, find messes.
Ideally I'd like to prevent messes, but I don't know how to do that. (Be much, much more careful about who gets manage_publication access??)
Describe the solution you'd like
Lacking real solutions, grab at least created_by_agent_id and created_date - if there's any pattern, it might involve user-at-time.
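As a rough illustration of what "user-at-time" could look like once those two fields exist, here is a small Python sketch that counts publications per creator per day. Only `created_by_agent_id` and `created_date` come from this request; the function name, the record layout, and the sample rows are invented for the example and are not real Arctos data.

```python
from collections import Counter
from datetime import date


def creation_hotspots(pubs, min_count=2):
    """Count publications per (created_by_agent_id, created_date).

    pubs is an iterable of dicts carrying the two proposed fields.
    Returns [((agent_id, day), count), ...], busiest first, keeping only
    groups with at least min_count publications.
    """
    counts = Counter(
        (p["created_by_agent_id"], p["created_date"]) for p in pubs
    )
    return [(key, n) for key, n in counts.most_common() if n >= min_count]


# Invented rows, for illustration only.
sample_pubs = [
    {"publication_id": 1, "created_by_agent_id": 101, "created_date": date(2000, 1, 10)},
    {"publication_id": 2, "created_by_agent_id": 101, "created_date": date(2000, 1, 10)},
    {"publication_id": 3, "created_by_agent_id": 101, "created_date": date(2000, 1, 10)},
    {"publication_id": 4, "created_by_agent_id": 202, "created_date": date(2000, 1, 11)},
]
for (agent_id, day), n in creation_hotspots(sample_pubs):
    print(agent_id, day, n)
```

A burst from one agent on one day is not a mess by itself, but it is a cheap place to start looking.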
And there will never be a better time if anyone has other requests, so - other requests?
Describe alternatives you've considered
Nothing very nice
Additional context
Cleanup is needed. This is certainly not all of the problems, and contains lots of things that aren't problems, but this SQL
was used to make https://docs.google.com/spreadsheets/d/1RQCUSIZ4q4-XEElhMe5SRPgLVrC4wqyyew3tDocDJLQ/edit?usp=sharing
Some things that jumped out at me, and that I noticed being created today:
Priority
I thought high but it seems @Nicole-Ridgwell-NMMNHS has dealt with most of what led here so ??