ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum
Apache License 2.0

Agent cleanup/tightening default dates #4926

Closed dustymc closed 5 months ago

dustymc commented 1 year ago

Here is an example of some stuff that could be cleaned up:

Collector = Joseph William Winthrop Spencer

Dates: looks like some stuff was entered with placeholders (screenshot not included).

(Note - the identification dates and specimen event dates are also 1840-01-01!)

Affected records - https://arctos.database.museum/saved/1660169124103

To Do:

  1. Change all collection event begin dates to agent born date
  2. Change all collection event end dates to agent died date
  3. Append to verbatim date - "begin and end dates from collector birth and death dates"
  4. NULL all determination dates
  5. Change specimen event assigned by and date to agent entered by and date
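The five steps above can be sketched as a single function over one record. This is only an illustration: every field name (`began_date`, `verbatim_date`, `entered_by`, and so on) is hypothetical and not the Arctos schema, the sample values are made up, and step 5 assumes that "agent entered by and date" means the record's own entered-by agent and entered date.

```python
from copy import deepcopy

NOTE = "begin and end dates from collector birth and death dates"


def apply_recipe(record, agent):
    """Return an updated copy of one record following the five To Do steps.

    All field names are hypothetical stand-ins, not real Arctos columns.
    """
    out = deepcopy(record)
    out["began_date"] = agent["born_date"]  # step 1
    out["ended_date"] = agent["died_date"]  # step 2
    # step 3: append the note, keeping whatever verbatim date was there
    verbatim = out.get("verbatim_date") or ""
    out["verbatim_date"] = f"{verbatim}; {NOTE}" if verbatim else NOTE
    # step 4: NULL every determination date
    for ident in out.get("identifications", []):
        ident["made_date"] = None
    # step 5 (assumed reading): reuse the record's entered-by agent and date
    out["event_assigned_by"] = record["entered_by"]
    out["event_assigned_date"] = record["entered_date"]
    return out


# Made-up sample data, for illustration only.
sample_record = {
    "began_date": "1840-01-01",
    "ended_date": "1840-01-01",
    "verbatim_date": "1840",
    "identifications": [{"made_date": "1840-01-01"}],
    "entered_by": "example operator",
    "entered_date": "2015-06-01",
}
sample_agent = {"born_date": "1850-01-01", "died_date": "1920-12-31"}
updated = apply_recipe(sample_record, sample_agent)
```

Working on a copy leaves the original record untouched, so the before and after values can be compared before anything is written back.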

@dustymc this seems like a lot - but it would be great if we could "make it so"!

Originally posted by @Jegelewicz in https://github.com/ArctosDB/arctos/issues/4551#issuecomment-1211337339

dustymc commented 1 year ago

Change all collection event begin dates

I think only for specific values - maybe some are correct, others are some default.
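A rough sketch of what "only for specific values" could look like: split events into those whose dates match a suspected default and those that should be left alone. The field names are hypothetical, and the only placeholder listed is the 1840-01-01 value noted above; anything else would have to be confirmed by the collection.

```python
# Suspected default values; 1840-01-01 comes from the note above.
SUSPECTED_PLACEHOLDERS = {"1840-01-01"}


def split_by_placeholder(events, placeholders=SUSPECTED_PLACEHOLDERS):
    """Separate events that look like defaults from events that may be real.

    An event is only marked for update when BOTH its begin and end dates
    are suspected placeholders; everything else is kept as-is.
    """
    to_update, to_keep = [], []
    for event in events:
        if event["began_date"] in placeholders and event["ended_date"] in placeholders:
            to_update.append(event)
        else:
            to_keep.append(event)
    return to_update, to_keep


# Made-up events, for illustration only.
events = [
    {"id": 1, "began_date": "1840-01-01", "ended_date": "1840-01-01"},
    {"id": 2, "began_date": "1895-06-12", "ended_date": "1895-06-12"},
    {"id": 3, "began_date": "1840-01-01", "ended_date": "1841-02-03"},
]
to_update, to_keep = split_by_placeholder(events)
```

Requiring both dates to match is the cautious choice here; an event with only one suspicious date stays in the "keep" pile for a human to look at.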

Would be very useful if I could have access (temporary is fine) to this collection - I'm not comfy doing the first of something this complex without access to the Arctos operator UI.

dustymc commented 1 year ago

https://github.com/ArctosDB/arctos/issues/4924 / https://arctos.database.museum/agent/21270137

Jegelewicz commented 1 year ago

Access granted

dustymc commented 1 year ago

@Nicole-Ridgwell-NMMNHS does @Jegelewicz 's recipe above work for https://github.com/ArctosDB/arctos/issues/4924?

The datatypes are consistent, I think that works from this end, if it works for you.

I'm struggling with whether to try to come up with one reusable solution, or if this will need to be customized for every situation. Input greatly appreciated. (Keeping it simple/consistent will mean I'm a lot less likely to make messes so that's my vote, but I can accommodate WHATEVER.)

The report is https://arctos.database.museum/Reports/cat_record_reports.cfm?report_name=dates:%20collecting%20vs.%20collector, here's who's involved:

@ebraker @Nicole-Ridgwell-NMMNHS @mkoo @AJLinn @campmlc @ccicero @amgunderson @DerekSikes @atrox10 @mvzhuang @cjconroy @jtgiermakowski @wellerjes @gradyjt @jrpletch @AdrienneRaniszewski @aklompma @acdoll @jldunnum @jrdemboski @genevieve-anderegg @msbparasites @mlbowser @jessicatir @ewommack @kderieg322079 @sharpphyl @StefanieBond @catherpes @sjshirar @lin-fred @claypollock @Jegelewicz @droberts49 @zmsch

Jegelewicz commented 1 year ago

Not simple, but some way to review the changes and approve without the need for a bunch of unloading/loading?

Nicole-Ridgwell-NMMNHS commented 1 year ago

@Jegelewicz 's recipe above work for https://github.com/ArctosDB/arctos/issues/4924?

Yes, except I don't think step 4 needs to happen for our data, and I would add an additional step that verbatim date needs to be changed to "no date provided".

dustymc commented 1 year ago

review the changes and approve without the need for a bunch of unloading/loading?

I think that's sorta either-or. I can make relatively straightforward and relatively homogeneous updates - I don't think looking at ID determiner (or whatever) on some and not others is much problem - or we can turn this into some sort of more complicated (but more capable) tool (if such doesn't already exist). The former involves maybe answering a few straightforward questions and saying "GO!"; the latter would be some flavor of more complicated.

"review and approve" can probably happen to some extent in isolation - I can pull out identifications or something - but I don't think that can be very useful, and reviewing in context would involve something like massively more infrastructure.

AJLinn commented 1 year ago

I know UAM:EH has a lot of cleanup to do with our collecting, creation, and use dates. We also have the complication of sometimes having multiple collectors (most of whom won't have birth/death dates in their agent profiles) when items are passed down through the generations (e.g. https://arctos.database.museum/guid/UAM:EH:UA2015-004-0001). If we know the multiple dates of use/collection by the various previous owners, we could create many use and/or collection dates to correspond with the collectors and the places they used them. This would greatly complicate any automatic changes that a process might implement.

I like the idea of "review and approve" to call attention to the problem, like with agent mergers. That works as long as we have time to manage the review process when it drops, as it might take a great deal of time and research to verify or correct.

In either case, I am in favor of something to improve these problematic data.

ebraker commented 1 year ago

Since UCM records all need slightly different treatments, I'm just going to fix these by hand. In general I tried to do this sort of date narrowing when migrating into Arctos; I just wasn't able to catch it all, so I think it is a good project.

There are a couple considerations that might be true for other collections:

Can I also request that for ALL UCM collections, when 'identified by' agent = "unknown" we insert the 'collected by' agent name in catalog records? This will update probably 70% of our records. I have always wanted to correct this issue and this action will avoid a lot of the low quality/verbatim agent merging issues that I'm not all the way on board with...
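A minimal sketch of that request, assuming hypothetical field names (`identified_by`, `collected_by`) and a single collector name per record; records with several collectors would need a rule for which name to use, which is not decided here.

```python
def fill_unknown_identifier(record):
    """Return a copy with 'identified_by' set to the collector when it is "unknown".

    Field names are hypothetical stand-ins, not real Arctos columns.
    """
    out = dict(record)  # shallow copy is enough for flat string fields
    identified_by = (out.get("identified_by") or "").strip().lower()
    if identified_by == "unknown" and out.get("collected_by"):
        out["identified_by"] = out["collected_by"]
    return out


# Made-up records, for illustration only.
fixed = fill_unknown_identifier(
    {"identified_by": "unknown", "collected_by": "Example Collector"}
)
untouched = fill_unknown_identifier(
    {"identified_by": "Another Person", "collected_by": "Example Collector"}
)
```

Records without a collector are left as "unknown" rather than being given an empty name.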

Nicole-Ridgwell-NMMNHS commented 1 year ago

Is this supposed to be some sort of automated, across the board cleanup or will it be a tool we can utilize for individual collections/agents?

Jegelewicz commented 1 year ago

a tool we can utilize for individual collections/agents?

That's what I'm thinking.

dustymc commented 1 year ago

@AJLinn I think that's all expected. The new report format lets me add the flag so we can find things; if some low-hanging fruit can be dealt with then let's do so, and if other things take more time then that's OK too. The flag will hopefully help users understand potential limitations while things are being sorted out.

@ebraker that's mostly expected too. For some situations I may be able to adjust the flag-finding code; for others maybe things are just complicated enough to have to stay flagged.

These aren't rules, they're just indications that maybe something needs more attention.
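As an illustration of a flag that is an indication rather than a rule, here is a sketch of one possible check: does the collecting interval fall outside the collector's lifespan? This is a guess based on the report name, not the actual flag-finding code, and it assumes full ISO dates (YYYY-MM-DD) and hypothetical argument names.

```python
from datetime import date


def outside_lifespan(began, ended, born=None, died=None):
    """True when the collecting interval starts before birth or ends after death.

    Dates are full ISO strings (YYYY-MM-DD). A missing birth or death date
    never raises the flag on its own.
    """
    began_d, ended_d = date.fromisoformat(began), date.fromisoformat(ended)
    if born is not None and began_d < date.fromisoformat(born):
        return True
    if died is not None and ended_d > date.fromisoformat(died):
        return True
    return False


# Made-up dates, for illustration only.
needs_attention = outside_lifespan("1840-01-01", "1840-01-01", "1850-01-01", "1920-12-31")
looks_fine = outside_lifespan("1900-05-01", "1900-05-02", "1850-01-01", "1920-12-31")
```

A `True` result only says the record deserves a look; it does not say which of the dates (event or agent) is wrong.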

@Nicole-Ridgwell-NMMNHS this is by request, and I don't think that'll change. MAYBE it'll turn into some UI tool or something; for now I'm just trying to find some way to write mostly-reusable code that I can point wherever you tell me to.

ebraker commented 1 year ago

Thanks @dustymc. Should I put in a separate issue for UCM for this?

Can I also request that for ALL UCM collections, when 'identified by' agent = "unknown" we insert the 'collected by' agent name in catalog records? This will update probably 70% of our records.

kderieg322079 commented 1 year ago

I've just got one, so I'll fix it manually. Most likely just the wrong agent was selected because it is a vague/common name: https://arctos.database.museum/guid/UMNH:Bird:8472

dustymc commented 5 months ago

I don't think this needs to survive https://github.com/ArctosDB/arctos/issues/6813. Agents can be cleaned up as data objects; records will need the involvement of collections (and many agents were created from the very possibly erroneous data in those records, so proceed with caution!). Tentatively closing.