lin-fred closed this issue 1 year ago
Also, please list any other agent cleaning steps that need to happen for #4554 to be successful.
cleaning steps that need to happen for https://github.com/ArctosDB/arctos/issues/4554 to be successful
You don't need to do anything - say GO! and I go and we're done....
I think people want to do things - lots of cleanup seems to have happened in the other thread (which is what led to the idea of just getting rid of the clutter that led to those situations). I'm happy to do whatever I can to facilitate that, just let me know what you need.
Here are 34961 agents, created more than a year ago, that are collector (table, not role) or less - that is, nothing attached to them beyond names and, at most, collector rows.
temp_agent_clean_fp(1).csv.zip
There are an additional 6820 low-information agents created within the last year.
FYI there are 95973 total agents at the moment.
Thank you for clarifying that. I thought there had to be something done on our end for you to be able to move forward, but now I understand.
So our big steps here are: we set a deadline, and once it's past the deadline, these agents get moved into verbatim agents and their remarks get moved into a remarks field for the attribute?
It's a long list and there is no way the agents committee can tackle them all, but my hope is that we can help some collections who would really like their agents to be "fixed" before the merge.
Would it be better to focus on the old or new ones? Or maybe it doesn't matter?
But we will also communicate to these collections that just because the agent has been put into verbatim agent, nothing is lost, and there are tools/workflows to help them clean them up and add them as an actual agent.
big steps
Sounds reasonable, or I can break things into chunks, move some out of the way while we're still working on others, WHATEVER.
collections who would...
Let me know if you need a different view of the data.
nothing is lost
Yep, that's the intention.
tools/workflows
Yep. Worst case we do absolutely nothing, which still puts cleanup in the context of more than bare strings and seems like a significant improvement to me. Best case, https://github.com/ArctosDB/arctos/issues/4872 works as hoped, this all becomes a click (which might be automated).
Here are a few for @wellerjes @droberts49
Aileen Alvarez (person: 21302067), Alec Acevedo (person: 21302055), Alvin So (person: 21302070), Anamylee Ruiz (person: 21302050), Andrew Moy (person: 21302058), Brianna Nesbeth (person: 21302068), Daisy Lara (person: 21302063), Daliana Soto (person: 21302057), Davarhe Jones (person: 21302060), David Dietrich (person: 21302047), Devontea Roy (person: 21302053), Dixon O'Banion (person: 21302049), Evelyn Garcia (person: 21302048), Izaiah Redd (person: 21302056), James Majors (person: 21302066), Justin Peterson (person: 21302065), K'Von Jackson (person: 21302059), Liliane Tran (person: 21302069), McClaran Shirley (person: 21302051), Mustiqirr Muhammad (person: 21302061), Natalia Carroll (person: 21302064), Natavia Barr (person: 21302054), Paloma Carroll (person: 21302062), Todd Woods (person: 21302052)
These are all listed as "Student participant in the Chicago Academy of Sciences summer TEENS program." in remarks. If you want to keep them as agents - I suggest creating a project and adding them all to it. I'm happy to do this for you if you want!
Exploring further, there are a whole BUNCH of students in this group that really would make a nice project instead - or a series of related projects, if this is some kind of annual thing.
@dustymc I guess group membership would be something that keeps an agent an agent? I really don't like the groups, but perhaps they do serve some purpose as they apparently have here.
Groups are just awful (no metadata) relationships; prioritizing https://github.com/ArctosDB/arctos/issues/4555 would simplify things.
I can help a ton with that - Groups are fairly easy to convert to projects - the problem then becomes all the activity of the group and how we manage that. So when the group agent is a collector - OOF
Is there a way for me to get a list of all agents that are connected to my collections?
I don't think so, but if you'll elaborate on that I can probably pull them.
Any agents that are associated with NMMNH:Bird, NMMNH:Ento, NMMNH:Herb, NMMNH:Herp, NMMNH:Inv?
From the data above (straightforward) or from anywhere (not straightforward, needs an issue)?
From the data above (straightforward) or from anywhere (not straightforward, needs an issue)?
But the data above doesn't say which collections the agents are attributed to. There are a lot of them that have NMMNH in the remarks that I can work on, but I'm curious whether there are many others that don't. I'll make a new issue.
See my instructions in your other issue - I've been working on UTEP:Herp agents and already found some cool stuff! Check out Ernest A. Liner (who I also added to Bionomia) and Eugene D. Fleharty. It's fun to figure people out! I also spent some time on random Joneses and was able to use remarks to add other data to some of them - still a long way to go in that list though....
I'm all for cleaning up agents, but if 'low quality' agents (only initials plus last name) get moved to verbatim, what happens to the collector/preparator agent? I hope it's not getting changed to 'unknown', as someone with a last name but only first/middle initials is certainly known more than someone with only sets of initials. ???
'Low quality' means "just names and acting only as collector"; the format of the name is not involved.
what happens to the collector/preparator agent?
It will be removed. There is no data loss; the 'verbatim agent' attribute can carry all of the information a names-only Agent can carry.
There will be tools to "upgrade" verbatim if more information becomes available, and the intention of this Issue is to recover anything that was entered incorrectly. For example, https://arctos.database.museum/agent/21345767 (entered today) would have been on the chopping block because it had only remarks, but @Jegelewicz created a relationship from those remarks, so it's no longer "just strings" and therefore will not be involved in any cleanup.
Lots more discussion in https://github.com/ArctosDB/arctos/issues/4554; https://github.com/ArctosDB/newsletter/issues/166#issuecomment-1211368414 will become an article.
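For anyone trying to picture the mechanics, here is a minimal sketch of what "removed, with the information carried by a 'verbatim agent' attribute" could look like. The table and column names (`attribute`, `collector`, `agent_name`, `agent_remarks`, `collection_object_id`) are simplified placeholders inferred from this thread, not the actual Arctos schema, and the real migration script hasn't been written yet.

```sql
-- Sketch only, hypothetical schema: carry a names-only collector's name and
-- remarks onto each record as a 'verbatim agent' attribute, then retire the agent.
insert into attribute (collection_object_id, attribute_type, attribute_value, attribute_remark)
select c.collection_object_id,
       'verbatim agent',
       n.agent_name,
       a.agent_remarks
from collector c
  join agent a on a.agent_id = c.agent_id
  join agent_name n on n.agent_id = a.agent_id
where a.agent_id = 12345;  -- placeholder id of the names-only agent being retired
-- (a real script would pick one preferred name and fold any AKAs into the remark)

-- Then drop the collector rows and the now-unused agent.
delete from collector  where agent_id = 12345;
delete from agent_name where agent_id = 12345;
delete from agent      where agent_id = 12345;
```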
I wasn't able to attend the issues meeting and just read through the notes. Can someone clarify what will happen to agent_remarks for an agent with remarks but no relationships/addresses/transactions? Will remarks get transferred to verbatim agent attribute remarks? (They won't disappear, correct?)
Will remarks get transferred to verbatim agent attribute remarks? (they won't disappear correct?)
Yes - the remarks, and I think also the AKAs, will be placed in the remark for the verbatim agent.
Are these remarks then visible on the catalog record page? Some are loooong and/or pre-date the option to add "curatorial remarks" and probably don't need to be publicly displayed.
some are loooong and/or pre-date the option to add "curatorial remarks"
I would guess that if they are that long we know something about the agent that could fill in a relationship, status or address, which would solidify the agent. @ArctosDB/agents-committee thinks that a review of agents with only name strings and remarks needs to happen before we transition. It would help if individual collections would do this for agents they may be familiar with.
loooong .. remarks
Yep, that's much of the problem - things that could be used to disambiguate agents are in remarks (despite the longstanding documentation), where they're not useful for much of anything. It's hard to imagine remarks could be very verbose and still avoid saying anything useful - maybe sorting by length(remarks) is a decent place to begin cleanup (see the sketch after this comment) - so I added a column to the spreadsheet. The most verbose remark...
Collector of two specimens in L. R. Fletcher collection in DMNS:Inv database. From The Nautilus Vol 79, July 1965. NOTES AND NEWS Ruth E. Coats, 1911-1966 - Conchology suffered a severe loss in the passing of Miss Ruth E. Coats. Ruth was born on March 2, 1911, in Seattle, Washington, and passed away on Oct. 19, 1966, in Carlsbad, California. Surviving are her mother, Mrs. Emma Coats of Carlsbad, and two brothers. Ruth received both a Bachelor of Science and a Master of Science degree from the University of Washington. She had a major in zoology and a minor in geology. She taught geology for a number of years at Palomar College. Ruth Coats was the first elected chairman of the American Malacological Union, Pacific Division. Her first meeting, in 1949, was held in the Long Beach Municapal Auditorium, but Ruth was hospitalized and unable to attend. Later she served as Secretary-Treasurer of the A.M.U.P.D. for several years. In 1954, Miss Coats was President of the Conchological Club of Southern California. She conducted a shell study class at the Burch home. The shell house at Carlsbad, California (some would call it a museum) reflected her originality. It was remarkable for its artistic beauty and contained an excellent library of many rare volumes. Around 1950, she bought the famous Raymond Collection. In 1954 she purchased the superb second Belle Whitmore collection. These were added to her large collection made over the years from personal collecting, purchase, and by exchange - Rose L. Burch.
...has dates, addresses, and relationships. There's plenty of information, but it's not structured so that it can DO STUFF.
See also https://github.com/ArctosDB/arctos/issues/4922 - we seem to have some sort of systemic problem with this: everywhere I look seems to hold unrelated and therefore inaccessible information. How can we address that?
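The sort-by-verbosity idea mentioned above, written out as a rough query. This assumes an `agent` table with an `agent_remarks` column, which matches the terms used in this thread but may not be exactly how the live schema spells them.

```sql
-- Sketch: surface the wordiest remarks first, since those are the agents
-- most likely to have disambiguating details buried in free text.
select agent_id,
       length(agent_remarks) as remlen
from agent
where agent_remarks is not null
order by remlen desc;
```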
I checked the CSV I posted; it contains Unicode characters (some hyphen-like thing in this example) that seem to have been lost somewhere in the conversion to the Google Sheets doc, so that spreadsheet cannot be repatriated. Let me know if you want me to try to make a new one from the CSV, @lin-fred.
@dustymc go ahead and try to make a new one so that everyone can take a look
I added birth and death dates for Ruth - https://github.com/ArctosDB/arctos/issues/4903#issuecomment-1212175184
I can't figure it out. Google - Google!! - insists on treating csv as us-ascii (or something of the sort).
I opened the CSV in Excel and that seems to have worked, but Excel ALWAYS mangles something that I won't notice for maybe years so I'm not too confident. https://docs.google.com/spreadsheets/d/11Hy5kG7kcrSIBB-VLY9jedRp5jDWGCZlJKnY-jKv0X0/edit?usp=sharing, if you're feeling brave....
@dustymc can you link the csv here that has the added remarks column?
I didn't add it to the CSV; I calculated it in the Google Sheet - "remlen" in the link above your comment.
I don't understand what this means: 'low quality' is "just names and acting only as collector" - can you clarify?
Also, where does verbatim agent show on a detail page? If it is mixed with all the other attributes (age, sex, repro, etc.) vs at the top with a role of collector/preparator, that doesn't make sense to me. If an agent has a role (collector/preparator) and has at least a last name with initials, is that considered low quality - and if so, why?
"just names and acting only as collector" - can you clarify
The primary key agent.agent_id occurs only in the tables agent_name and collector.
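Expressed as a query, that definition might look roughly like the sketch below. It assumes agent_id foreign keys on each referencing table, and the table names beyond agent, agent_name, and collector (address, agent_relationship) are placeholders; the real check would have to cover every table that can point at an agent (addresses, relationships, attributes, transactions, media, ...).

```sql
-- Sketch of the 'low quality' selection: agents that have names and
-- collector rows but nothing else attached to them.
select a.agent_id
from agent a
where exists (select 1 from collector c where c.agent_id = a.agent_id)
  and not exists (select 1 from address ad where ad.agent_id = a.agent_id)
  and not exists (select 1 from agent_relationship r
                  where r.agent_id = a.agent_id
                     or r.related_agent_id = a.agent_id)
  -- ...and a NOT EXISTS check for every other table that references agents.
;
```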
where does verbatim agent show
Very short version is that there's no data which can't function identically as attributes, so the move isn't lossy. Ultimately the hope is that requiring more information will make things like dropping the unique index on agent name more approachable. We can have multiple John Does if they aren't ambiguous (because they're accompanied by things like dates, addresses, and relationships). If a new 'John Doe' doesn't have additional information then it can just be pushed to verbatim agent without losing any functionality (so the most significant roadblock to importing data will no longer exist), and if it does have additional information then that serves to disambiguate it from other agents of the same name so there's no problem.
A slightly different angle is that this change removes the necessity to try to determine how many entities two strings represent. You simply cannot tell if this "John Doe" and that "John Doe" have anything to do with each other, yet up to this point that's a big part of creating collections and dealing with agents in general. That process is frustrating and pointless and results in huge messes, so we've found a way to avoid it.
OK, that makes sense. But what happens if you have a last name with just first/middle initials and a role of collector (or any other role)? Name goes to verbatim, and what about the collector name? If it gets changed to 'unknown' that doesn't make sense because it is known.
I'd just remove the collector-attached agent altogether, but that code hasn't been written so anything is possible. I'm definitely not a fan of leaving "unknown" hanging around when not absolutely necessary.
And for clarity, I don't think the format of the name should have any impact on this. There are probably a million "John Something Doe"s running around out there, none of them make it across the bar by virtue of their name alone. On the other hand "Cher" plus "born 1946" is safely above the bar. The point is to have some sort of secondary disambiguating information, something beyond namestrings.
I agree, we should at least only show 'unknown' when it's truly unknown.
Can you give me an example where there is a collector but it's showing as verbatim agent with 'collector' unknown? The example you gave had unknown role.
I guess this brings up another question. Why not just have 'verbatim agent' and show that with their role (which may be unknown)? Why even have separate agents other than verbatim?
Why even have separate agents other than verbatim?
Because we know things about them and we can match them up to other resources that mention them. Your agent page has all kinds of interesting information about you that could help someone decide if the C. Cicero on one of their labels is you.
There's nothing about the string 'C. Cicero' that really tells anyone if it is or is not a representation of you. Until now, a new collection - which is almost certainly looking at isolated agent strings and nothing else - has had to make that determination, and they fail in all kinds of ways. Now they don't have to do that; they can just enter what they've got. The absolute worst case from there involves them eventually figuring it out in the context of the data in Arctos. The goal is to send them some sort of "Hey, C. Cicero sure seems to have a lot of overlap with https://arctos.database.museum/agent/10002371, mash on this button to merge them" notification.
Even the worst-case scenario seems like a huge improvement in a few ways - there's context, it's not blocking import, there's no poor new-to-Arctos person trying to figure out some hugely complicated 'node', etc.
https://arctos.database.museum/guid/MSB:Mamm:221194 has a 'collector' verbatim if that's what you're asking for.
https://arctos.database.museum/guid/MSB:Mamm:221194 has a 'collector' verbatim if that's what you're asking for.
Nice to see my work in action...
Cleaned up about 40 of these today.
@Jegelewicz regarding the TEENS students -- these were previously associated as a group, if I remember correctly. We have a project set up (https://arctos.database.museum/project/10003328) for the TEENS work with collections. It looks like there is an agent "Teenagers Exploring and Explaining Nature and Science" that is a division of the Chicago Academy of Sciences, and many of the students are connected as associates (I don't know if it's all of them yet; they aren't in alpha order, so it takes a bit more time to check). I can add in some dates/date ranges. If you can connect them to the project, that'd be great!
Here's new data with some new columns.
Suggest we just immediately delete the 5516 agents who don't have any information and haven't done anything.
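For context, "don't have any information and haven't done anything" roughly means agents with no usage at all - something like the sketch below (same caveats as the earlier sketch: placeholder table names, and a real query would check every table that can reference an agent).

```sql
-- Sketch: agents that exist only as names, with no collector rows and
-- (after checking every other referencing table) nothing else attached.
select a.agent_id
from agent a
where not exists (select 1 from collector c where c.agent_id = a.agent_id)
  -- ...plus NOT EXISTS checks against every other agent-referencing table.
;
```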
Suggest we just immediately delete the 5516 agents who don't have any information and haven't done anything.
Please, no - these are very likely in the UWZM data that is very close to being bulkloaded. I know many of these will end up as verbatim, but I don't want to have to re-do all that work!
Also, I am sorting by remark length and # of collections and will add to the existing Google sheet so that everyone can work on this collaboratively. Please work in the "New list" tab.
Also, I am sorting by remark length and # of collections and will add to the existing Google sheet so that everyone can work on this collaboratively. Please work in the "New list" tab.
Thank you!!!
I also don't think we should get rid of any agents yet. We need to give collections the time to edit their agents, whether they have anything associated with them or not at the moment.
Also, the ones that haven't "done anything" aren't ones that are associated with loans/projects etc, right? As many of those agents won't be associated with individual records per se.
np @Jegelewicz
I just noticed that I didn't change CREATED_BY_AGENT_ID to a name - I can rebuild with that if it's useful, let me know.
@lin-fred if they're in this list they've (maybe) collected and that's it. If they have any association with any kind of transaction (or anything else) then they should not be here.
CREATED_BY_AGENT_ID to a name
I don't think so - it's more important which collection they are associated with.
@lin-fred if they're in this list they've (maybe) collected and that's it. If they have any association with any kind of transaction (or anything else) then they should not be here.
OK, thank you. I know we keep beating that bush, but I just have to double-check for my own sanity since I didn't run the query myself.
So in this query, the ones that have 0 records are well and truly just not associated with anything in Arctos, other than possibly who entered them to begin with? And so maybe the person who entered it is important to have?
Also, just putting two FYIs on this topic:
@Vicky Zhuang is working on an agent clean-up project through a CARES grant (I believe) and will talk about her project more at the October AWG meeting.
MVZ Archives has been cleaning up "mystery" agents (what I call them!) this semester with our archival records migration. I am interviewing more students this week to help with this ongoing work. So we can also help with the review of the agents. There may be lags between agents and their association with a record.
ANNOUNCEMENT
A LOT of agents were marked for merger at the end of last week. I have no idea who did this or why - but I want to be sure these were done by someone with first-hand knowledge, because once these merge, the verbatim agent will be entered as the merged agent if any of these agents lack data. But also - everyone may want to review the list of agents recently marked for merge to make sure they aren't losing something.
@ArctosDB/agents-committee
@Jegelewicz how do you search on these to review them?
Look in your notifications, then filter for agents marked for merge.
@dustymc can you post a list of agents that only have remarks that you'd like us to start looking through?