ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum
Apache License 2.0

Do we need agent guardrails? (was: funky agents) #7649

Open dustymc opened 3 months ago

dustymc commented 3 months ago

Do we need rules or guidance around agents?

I've noticed some not-great agent data being created, I don't know if @ArctosDB/agents-committee would care to attempt to establish any guardrails or if this is fine or ?? Please advise, or close if nobody cares.

Possible Actions

Examples and ponderings and such follow


Here are agents with nonunique preferred name:


select
    agent_id,
    agent_type,
    preferred_agent_name,
    getpreferredagentname(created_by_agent_id) creator,
    created_date
from 
    agent
where
    preferred_agent_name in (select preferred_agent_name from agent group by preferred_agent_name having count(*) > 1)
order by preferred_agent_name,created_date desc;

 agent_id |  agent_type  | preferred_agent_name |         creator          |        created_date        
----------+--------------+----------------------+--------------------------+----------------------------
 21352027 | person       | Allison Nelson       | Jonathan L. Dunnum       | 2024-03-25 08:55:51.725221
 21350942 | person       | Allison Nelson       | Katherine L. Anderson    | 2024-01-31 14:53:50.530417
 21301738 | person       | Ben D. Marks         | Charles M. Dardia        | 2016-06-28 13:32:51
 21248037 | person       | Ben D. Marks         | unknown                  | 2013-12-16 21:49:31
 21351392 | person       | Bruce B. Paige       | Derek S. Sikes           | 2024-03-05 13:38:17.074186
 21348083 | person       | Bruce B. Paige       | Teresa J. Mayfield-Meyer | 2023-04-17 16:58:23.684949
 21351197 | person       | David Johnson        | Derek S. Sikes           | 2024-02-26 19:05:23.736418
 21295057 | person       | David Johnson        | Dusty L. McDonald        | 2015-10-06 11:30:13
 21352097 | organization | DOI Foundation       | C. O. Webb               | 2024-04-03 20:28:58.545203
 21348956 | organization | DOI Foundation       | Dusty L. McDonald        | 2023-06-29 08:16:07.045192
 21351450 | person       | G. S. Tulloch        | Derek S. Sikes           | 2024-03-05 13:38:17.997551
 21351074 | person       | G. S. Tulloch        | Jayce Williamson         | 2024-02-12 13:45:24.589931
 21351805 | person       | Jared Hughey         | Derek S. Sikes           | 2024-03-05 13:38:29.414319
 21348137 | person       | Jared Hughey         | Justin Fulkerson         | 2023-04-21 18:07:15.093075
 21351445 | person       | J. Jacobs            | Derek S. Sikes           | 2024-03-05 13:38:17.907213
 21350919 | person       | J. Jacobs            | Jessica Weller           | 2024-01-27 11:25:38.328081
 21351651 | person       | Laura Lofgren        | Derek S. Sikes           | 2024-03-05 13:38:26.615875
 21349012 | person       | Laura Lofgren        | Jayce Williamson         | 2023-07-16 14:23:47.257662
 21333621 | person       | Lauren Wilson        | Zack Perry               | 2021-07-13 12:39:08.99178
 21300714 | person       | Lauren Wilson        | Erica Krimmel            | 2016-04-12 12:17:17
 21351943 | person       | Mary Ann Sundown     | Angela Linn              | 2024-03-09 20:57:52.329768
 21347114 | person       | Mary Ann Sundown     | Shealyn Golden           | 2023-01-27 02:25:06.394785
 21351324 | person       | R. Leiner            | Derek S. Sikes           | 2024-03-05 13:37:47.64515
 21349050 | person       | R. Leiner            | Jayce Williamson         | 2023-07-26 15:59:10.979065

and recent person-agents - many of which are clearly not persons - without a first or last name:

select
    agent.agent_id,
    preferred_agent_name,
    getpreferredagentname(agent.created_by_agent_id) creator,
    agent.created_date
from 
    agent
    left outer join agent_attribute on agent.agent_id=agent_attribute.agent_id and agent_attribute.attribute_type in ('first name','last name')
where
    agent.agent_type='person' and
    agent_attribute.attribute_id is null 
    and agent.created_date > current_date - interval '1 year' -- remove this line to list all; it's too much to paste here
order by agent.created_date desc
;
 agent_id |                  preferred_agent_name                  |     creator     |        created_date        
----------+--------------------------------------------------------+-----------------+----------------------------
 21352122 | Jack Spratt                                            | Jozef A. Slowik | 2024-04-05 12:12:44.141425
 21352119 | C. Stillman                                            | Jozef A. Slowik | 2024-04-04 16:05:10.937667
 21352117 | B. S. Blitz                                            | Jozef A. Slowik | 2024-04-04 15:34:50.688057
 21352116 | F. Sorensen                                            | Jozef A. Slowik | 2024-04-04 15:25:19.630444
 21352109 | M. Rosy                                                | Jozef A. Slowik | 2024-04-04 12:59:13.243463
 21352057 | unrecorded                                             | Angela Linn     | 2024-03-26 16:19:24.093752
 21351875 | Bundtzen                                               | Derek S. Sikes  | 2024-03-05 13:38:39.295728
 21351874 | Sid                                                    | Derek S. Sikes  | 2024-03-05 13:38:39.285538
 21351873 | Kenai Veterinary Clinic                                | Derek S. Sikes  | 2024-03-05 13:38:39.275253
 21351872 | Schmidt                                                | Derek S. Sikes  | 2024-03-05 13:38:39.262718
 21351870 | Chester                                                | Derek S. Sikes  | 2024-03-05 13:38:39.226274
 21351818 | Snarski                                                | Derek S. Sikes  | 2024-03-05 13:38:29.660891
 21351809 | Lucas                                                  | Derek S. Sikes  | 2024-03-05 13:38:29.487845
 21351798 | Buck                                                   | Derek S. Sikes  | 2024-03-05 13:38:29.300602
 21351784 | Galena Butterfly Festival Participants                 | Derek S. Sikes  | 2024-03-05 13:38:29.048438
 21351782 | Dick H. Bishop                                         | Derek S. Sikes  | 2024-03-05 13:38:29.023773
 21351746 | Femaida                                                | Derek S. Sikes  | 2024-03-05 13:38:28.354653
 21351744 | Challet                                                | Derek S. Sikes  | 2024-03-05 13:38:28.329927
 21351719 | Southside Animal Hospital                              | Derek S. Sikes  | 2024-03-05 13:38:27.869955
 21351701 | Gwichin Renewable Resources                            | Derek S. Sikes  | 2024-03-05 13:38:27.544315
 21351685 | v. Doesburg                                            | Derek S. Sikes  | 2024-03-05 13:38:27.270269
 21351611 | et al.                                                 | Derek S. Sikes  | 2024-03-05 13:38:25.915341
 21351598 | Sjodin                                                 | Derek S. Sikes  | 2024-03-05 13:38:25.657055
 21351592 | L. Shults                                              | Derek S. Sikes  | 2024-03-05 13:38:20.591105
 21351570 | Schuh & Gray                                           | Derek S. Sikes  | 2024-03-05 13:38:20.233703
 21351546 | R. Latta                                               | Derek S. Sikes  | 2024-03-05 13:38:19.846565
 21351543 | S. Craig                                               | Derek S. Sikes  | 2024-03-05 13:38:19.80164
 21351542 | Bio 116 students                                       | Derek S. Sikes  | 2024-03-05 13:38:19.790886
 21351511 | Waterways Vet Clinic                                   | Derek S. Sikes  | 2024-03-05 13:38:19.29779
 21351485 | Fran & Pete                                            | Derek S. Sikes  | 2024-03-05 13:38:18.724329
 21351479 | Smithhisher                                            | Derek S. Sikes  | 2024-03-05 13:38:18.612474
 21351478 | Unalakleet School students                             | Derek S. Sikes  | 2024-03-05 13:38:18.601517
 21351472 | Christian F. Weisser                                   | Derek S. Sikes  | 2024-03-05 13:38:18.499276
 21351451 | D. A. P.                                               | Derek S. Sikes  | 2024-03-05 13:38:18.021055
 21351438 | Stoneman                                               | Derek S. Sikes  | 2024-03-05 13:38:17.796991
 21351435 | Chriska Derr                                           | Derek S. Sikes  | 2024-03-05 13:38:17.748558
 21351434 | IAS                                                    | Derek S. Sikes  | 2024-03-05 13:38:17.738767
 21351433 | Lehman                                                 | Derek S. Sikes  | 2024-03-05 13:38:17.721695
 21351430 | Pam's pet grooming                                     | Derek S. Sikes  | 2024-03-05 13:38:17.680503
 21351418 | Calkins                                                | Derek S. Sikes  | 2024-03-05 13:38:17.505252
 21351396 | College Village Animal Clinic                          | Derek S. Sikes  | 2024-03-05 13:38:17.138685
 21351393 | Southeast Alaska Animal Medical Center                 | Derek S. Sikes  | 2024-03-05 13:38:17.094364
 21351381 | Roberts                                                | Derek S. Sikes  | 2024-03-05 13:38:16.894943
 21351347 | Stream Ecology Class UAF                               | Derek S. Sikes  | 2024-03-05 13:38:16.3071
 21351345 | Plant Protection Division Ministry of Agriculture USSR | Derek S. Sikes  | 2024-03-05 13:38:16.274119
 21351302 | Taiga                                                  | Derek S. Sikes  | 2024-03-05 13:37:47.281312
 21351297 | DHS                                                    | Derek S. Sikes  | 2024-03-05 13:37:47.19272
 21351294 | Wasilla Veterinary Clinic                              | Derek S. Sikes  | 2024-03-05 13:37:47.138723
 21351282 | McCarthy                                               | Derek S. Sikes  | 2024-03-05 13:34:13.480942
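
Several of the names above (clinics, classes, "et al.", lone surnames) might be caught at creation time. A minimal sketch of such a warning, assuming a hypothetical, hand-picked keyword list; it is not an existing Arctos check, and any real list would need Community agreement:

```python
import re

# Hypothetical keyword list for illustration only.
GROUP_TERMS = {
    "clinic", "hospital", "center", "centre", "division", "ministry",
    "students", "participants", "class", "festival", "resources",
    "school", "university", "museum", "grooming", "et", "al",
}
PLACEHOLDERS = {"unrecorded", "unknown"}


def person_name_warnings(name: str) -> list[str]:
    """Return reasons a name entered as a person agent looks suspicious."""
    warnings = []
    # Letter runs only, case-folded: "Pam's pet grooming" -> pam, s, pet, grooming
    words = re.findall(r"[^\W\d_]+", name.casefold())
    if any(w in GROUP_TERMS for w in words):
        warnings.append("contains organization/group term")
    if "&" in name or " and " in f" {name.casefold()} ":
        warnings.append("looks like multiple people")
    if len(name.split()) == 1:
        warnings.append("single token; no first/last name")
    if name.strip().casefold() in PLACEHOLDERS:
        warnings.append("placeholder value, not a name")
    return warnings


for n in ["Kenai Veterinary Clinic", "Fran & Pete", "Sid", "Dick H. Bishop"]:
    print(n, person_name_warnings(n))
```

A warning seems safer than a hard block: "Sid" may be all anyone knows about a real person, and the single-token rule also fires on mononymous people. It also misses cases such as "v. Doesburg".
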

dustymc commented 2 months ago

There is a bug (squashed for next release) on that pathway, but I can't make it skip the duplicate notification. There may also be some complications when the duplicate has no name components, just a preferred name; allowing really low-quality data definitely has some not-great influence on future actions. Anyway, things should be slightly better in the near future. Details on whatever led to any sort of problem are always appreciated, thanks!

need something a little more automated

It was a different system, and this isn't a "no," but way back when we did that, it mostly turned out to be a very useful way to make problems immortal. Somehow I think we'd need a 'careful person saving only the good stuff' filter in there, I'm not sure how we might do that.

Jegelewicz commented 2 months ago

a 'careful person saving only the good stuff' filter in there, I'm not sure how we might do that.

A download of attributes from the bad duplicate that could be uploaded to the good would be better than a ton of copy-pasta.
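
A minimal sketch of what such a download could compute, assuming attributes can be exported as (attribute_type, attribute_value) pairs. AgentAttribute and attributes_to_move are hypothetical names, and real attribute data carries more than two fields. It lists only what the good agent lacks, so a person still does the 'saving only the good stuff' review before anything is uploaded:

```python
from typing import NamedTuple


class AgentAttribute(NamedTuple):
    # Hypothetical, simplified shape of an agent attribute row.
    attribute_type: str
    attribute_value: str


def _key(a: AgentAttribute) -> tuple[str, str]:
    # Ignore case and runs of whitespace when comparing values.
    return a.attribute_type, " ".join(a.attribute_value.split()).casefold()


def attributes_to_move(bad: list[AgentAttribute],
                       good: list[AgentAttribute]) -> list[AgentAttribute]:
    """Attributes on the bad duplicate that the good agent does not already have."""
    existing = {_key(a) for a in good}
    out: list[AgentAttribute] = []
    for a in bad:
        k = _key(a)
        if k not in existing:
            existing.add(k)  # also drops repeats within `bad`
            out.append(a)
    return out
```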

AJLinn commented 2 months ago

@AJLinn any idea why this happened?

Those were created 2 minutes apart from one another. If I remember right, I might have clicked out of the agent creation pop-up to look at something else relating to the agent name, and the pop-up window closed but must have saved without hitting create agent. So I went back in and created the agent again.

dustymc commented 2 months ago

https://docs.google.com/spreadsheets/d/1sosC-w8xHpyXD0g_x1n2-ub37jEe_CK39cj-mtic9dk/edit#gid=384759468 is a spreadsheet of agents who share first and last name. The temp_agent_share_firstlast_mc tab (multiple character names only) is probably sufficiently overwhelming to decide if we'd like to do anything about any of this or not. There are definitely a few agents I picked up in a quickish skim which suggest https://github.com/ArctosDB/arctos/issues/7649#issuecomment-2102998471 (eg we are failing both operators and contributors with our lack of training-or-something).

One of these has been marked as a duplicate, but not by the creator. Disallowing shared email addresses would address some of this:

https://arctos.database.museum/agent/21352394 https://arctos.database.museum/agent/21352392 https://arctos.database.museum/agent/21352393
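
Finding addresses that are already shared is a simple grouping. A minimal sketch, assuming agent email addresses can be pulled as (agent_id, address) pairs. Lowercasing the whole address is a simplification (the local part is technically case-sensitive), and values that are not really email addresses would need separate handling:

```python
from collections import defaultdict


def shared_emails(agent_emails: list[tuple[int, str]]) -> dict[str, list[int]]:
    """Map each normalized address to the agent_ids using it,
    keeping only addresses attached to more than one agent."""
    by_email: dict[str, list[int]] = defaultdict(list)
    for agent_id, email in agent_emails:
        norm = email.strip().lower()
        if norm and agent_id not in by_email[norm]:
            by_email[norm].append(agent_id)
    return {e: ids for e, ids in by_email.items() if len(ids) > 1}
```

Running the same lookup when an address is entered on a new agent is one possible way to warn (or refuse) before a duplicate is created.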

Maybe John just really dislikes his last name, but it still led to a duplicate:

https://arctos.database.museum/agent/21352170 https://arctos.database.museum/agent/21352628

??????????? maybe something about the search UI ????????????????

https://arctos.database.museum/agent/21352600 https://arctos.database.museum/agent/21317611

These have the same name, same email address, created by the same person, and both have operator accounts! Even if we do nothing in the name of proper attribution, we should find some way to avoid this situation as a matter of security. And another case where the not-actually-emails are causing tangible problems.

https://arctos.database.museum/agent/21339858 https://arctos.database.museum/agent/21339906

dustymc commented 2 months ago

2 minutes apart

I think that was probably related to https://github.com/ArctosDB/arctos/issues/7738. I just clicked as fast as I could and got the expected warning.

[Screenshot 2024-05-14 at 15:34:07]

camwebb commented 2 months ago

@Jegelewicz

I think some of these may be your students?

Yes, Elena is working with us at UAM:Herb. She's doing a super job doing detective work on Russian collectors, of which we have hundreds of names with little or no metadata. But some dups may slip through - I'll check.

@dustymc

There are a lot of agents being created with various could-be-important data stuffed into remarks.

Where else should these comments go? If all we know is that a name was a collector in a place at a time there are no additional agent relationships that can be made, but having some info in the remarks can help point another user in the right direction.

dustymc commented 2 months ago

Remarks are expected to contain "I'm not sure how to spell 'pumpkin'" and "agent known to like tatertots" and EVERYTHING else - except the stuff which has typed fields, such as places and dates.

"There, then, doing that" in remarks is useful - to the maybe 1% of people who read remarks and are able to successfully figure out what it means.....

I messed with https://arctos.database.museum/agent/21352177 (and maybe made some bad assumptions, please review). Now that record contains....

[Screenshot 2024-05-22 at 07:33:54]

Same information (assuming I didn't muck it up or make bad assumptions), organized so that it's MUCH more useful for all sorts of things, including finding those maybe-inevitable duplicates (eg "find ALA agents reported as doing stuff in 1984" just became possible).

camwebb commented 2 months ago

Thanks @dustymc - that makes sense, and of course I agree that entering data into dedicated fields is always better than loose text. But... it's also a leap to say that the correspondence address of a person who collected in Russia is "..., ..., Russia". Maybe a new field 'active in' would be good. Or, better, an auto-generated list of countries and dates of records that the agent is associated with in Arctos.

Back to the larger, old issue: should we make agents at all? I reread the handbook and it's very clear that it's often better not to create agents at all. But... as @DerekSikes and others point out, verbatim agents don't play well with reports, labels. A single agent model (real or verbatim) is just easier to deal with and we're trying to use 'real' agents. That said, not making agents would speed up our transcription process hugely - I reckon ~50% of my own time spent on bulkloading data entered by assistants is spent on reconciling agents - if I pushed everything into verbatim agents it would be a doddle. I recently got Elena to start researching Russian agents, and she's doing great, but we simply don't have much info on the majority of names.

Perhaps a community-wide event is needed, as suggested above?

DerekSikes commented 2 months ago

Maybe a crazy idea but how about: all agents in catalog records are verbatim; all agents in the agent table are real; if they match 100% then a relationship exists, if they do not, then it doesn't (but nothing bad happens, that is, no regulations against having v-agents in catalog record with no real agent matching).

All reports from catalog records would use the verbatim agent field of the catalog record.
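
A minimal sketch of the 100% match rule, assuming a plain mapping of agent_id to preferred name (hypothetical; real matching would presumably also consider AKAs). The "ambiguous" branch is exactly what the duplicated preferred names listed at the top of the thread would hit:

```python
def link_verbatim(verbatim: str, agents: dict[int, str]) -> tuple[str, list[int]]:
    """Exact (100%) string match of a verbatim agent against preferred names.

    ("linked", [id])    exactly one agent has that preferred name
    ("unlinked", [])    no agent does; the verbatim string stands alone
    ("ambiguous", ids)  several agents share the name, so no link is safe
    """
    ids = sorted(aid for aid, name in agents.items() if name == verbatim)
    if len(ids) == 1:
        return "linked", ids
    if not ids:
        return "unlinked", []
    return "ambiguous", ids


agents = {21352027: "Allison Nelson", 21350942: "Allison Nelson", 21301738: "Ben D. Marks"}
print(link_verbatim("Ben D. Marks", agents))
print(link_verbatim("Allison Nelson", agents))
```

A "linked" result only says the strings are identical; it cannot tell whether this John Doe is that John Doe.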

dustymc commented 2 months ago

correspondence address

Yeah, there's an issue somewhere; I lost the argument that we need some less-addressey-address-thing. Feel free to start it again, I'll agree with you!

auto-generated list of countries and dates of records that the agent is associated with in Arctos

That very nearly always has negative value.

https://github.com/ArctosDB/arctos/issues/7796 - pulling live data in whatever form - is of course fine, anyone can go check it in context.

If that's what you're doing, then remarks are probably the best mechanism, noting the method would be very useful, and my "interpretation" likely vastly overplays the available hand.

not making agents would speed up our transcription process hugely

My position (which led to heavy verbatimization, which then somehow led here) has not changed: Make agents if they DO STUFF for you, don't if they don't. If you know "John Doe" then you're losing nothing by using verbatim; it can easily carry all you have. If you know it's that John Doe, then you need an agent object to carry that information.

match 100%

Multiple people named John Doe have existed.

verbatim agents don't play well with

Not sure I buy that, there were no actionable requests to remap or such.

reports

I'm always happy to help with them, they can use whatever you want them to use.

A single agent model (real or verbatim) is just easier to deal with

That is at least the point that made sense to me, and clearly some agents do have information that verbatim can't carry (shipping addresses, ORCIDs, etc.) so here we are. I'd still use verbatim if I had "verbatim-level" data, but I'm not going to push anyone in that direction very enthusiastically either (pending guidance from The Community here, of course).

~50% of my own time spent on bulkloading data entered by assistants is spent on reconciling agents

And I reckon that's probably not very productive, because you're probably dealing with out-of-context strings. Much of the idea of verbatim was to delay that investigation until AFTER entry, when you have the context to notice (and can request tools to help you notice) the two John Does spend a lot of time in the same places, or have a huge temporal gap, or WHATEVER thing that's not generally available from the string "John Doe." Don't think there's anything hindering that right now, but I'm also not sure what level of resources I could devote to helping.

community-wide event

I'm begging for guidance here, yes please. If we want to set some quality standards then I can probably help with tools, if we don't then I've got plenty of other things to do!

camwebb commented 1 month ago

verbatim agents don't play well with

Not sure I buy that, there were no actionable requests to remap or such.

Is there an existing SQL function (that you made) to concatenate agents and verbatim agents? ... for reports and labels?

Concatenation may be sufficient for many uses, but necessarily has to lose data about the order of collectors. If a specimen had three collectors: A, B and C (in that order) and its record has agents A and C, and verbatim agent B, then there is no way (other than remarks) to indicate the correct order of collectors - any concatenation will give A, C, and B.
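
A minimal sketch of why storing a position for every collector slot, real or verbatim, avoids the "A, C, and B" problem. CollectorSlot and its order field are hypothetical; the thread does not say how Arctos stores collector order for verbatim agents:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CollectorSlot:
    # Hypothetical record: either a real agent's preferred name or a
    # verbatim string, plus the collector's position on the label.
    order: int
    agent_name: Optional[str] = None
    verbatim: Optional[str] = None


def collector_string(slots: list[CollectorSlot]) -> str:
    """Join collectors in label order, whichever kind each slot is."""
    names = []
    for s in sorted(slots, key=lambda s: s.order):
        name = s.agent_name if s.agent_name is not None else s.verbatim
        if name:
            names.append(name)
    return ", ".join(names)


def naive_concat(slots: list[CollectorSlot]) -> str:
    """Real agents first, then verbatim ones: the order-losing concatenation."""
    real = [s.agent_name for s in slots if s.agent_name]
    verb = [s.verbatim for s in slots if s.verbatim]
    return ", ".join(real + verb)
```

With slots A (agent, 1), B (verbatim, 2) and C (agent, 3), collector_string gives "A, B, C" while naive_concat gives "A, C, B".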

I think we'll just push on with creating true agents, trying hard not to create duplicates or assign the wrong agent. It is time-consuming, but should create better overall information.

dustymc commented 1 month ago

SQL function

There's one in https://arctos.database.museum/Reports/reporter.cfm?action=edit&report_id=85, lots of possibilities....

order of collectors

"Bugs Bunny and Elmer Fudd" is a perfectly cromulent verbatim collector....

agents A and C, and verbatim agent B,

If you know A and C then you can probably figure out B (even if it's just that they were some ephemeral being who probably doesn't have field notes), but sure, there are innumerable fringe cases where strings start having trouble carrying the load.

push on with creating true agents

Nobody seems to be suggesting otherwise here, seems reasonable.

should create better overall information.

I don't think that's the trend, but there are definitely defensible reasons to do that so rock on!

dustymc commented 1 month ago

[Screenshot 2024-06-10 at 11:21:59]

??

Jegelewicz commented 1 month ago

@wellerjes see the comment above. Can you help us figure out why this happens?

wellerjes commented 1 month ago

What I think happened - volunteer could not find "Davidson Brothers Marble Co." because it was not an AKA of the original "Davidson Brothers Marble Company". I was reviewing her work and updated the "Co." to Company, then added the AKA without realizing that there was already an agent named that. I'll remind our volunteers working on agents to try different searches before creating an agent.

I think this is what's happening with duplicate agents: if someone doesn't search with a % or searches "first name+last name" when the agent is only entered as "first name+middle name+last name" (with no AKAs), then they're not finding the correct agent. I've done this before. If the agent's name appears differently throughout the records (J. Weller vs. Jessica Weller vs. J. L. Weller could all be me), it's not always obvious to someone that the agent is the same person, which is why they might ignore the big red box that says "this might be a duplicate".

Jegelewicz commented 1 month ago

I'm just not sure how to change this behavior. If people are only going to search one thing and give up, this will keep happening.

Can we somehow make the search less strict and find near matches?
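
One way to loosen matching without flooding the results is to score names by string similarity and cap the list. A minimal sketch using Python's difflib; the function name, the 0.75 cutoff, and the limit of 5 are arbitrary assumptions, not Arctos settings:

```python
from difflib import SequenceMatcher


def near_matches(query: str, names: list[str],
                 cutoff: float = 0.75, limit: int = 5) -> list[str]:
    """Names whose case-folded similarity to the query is at least `cutoff`, best first."""
    q = query.casefold()
    scored = []
    for name in names:
        score = SequenceMatcher(None, q, name.casefold()).ratio()
        if score >= cutoff:
            scored.append((score, name))
    # Stable sort: equal scores keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:limit]]


print(near_matches("Davidson Brothers Marble Co.",
                   ["Jessica Weller", "Davidson Brothers Marble Company"]))
```

Raising the cutoff or lowering the limit trades missed duplicates for shorter lists. Pure character similarity also does poorly on initials ("J. Weller" vs "Jessica Weller"), so it would probably need to be combined with something name-aware.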

dustymc commented 1 month ago

somehow make the search less strict and find near matches

There's a whole thread of me saying that would lead here and everyone insisting that they were getting too many matches somewhere....

Jegelewicz commented 1 month ago

There's a whole thread of me saying that would lead here and everyone insisting that they were getting too many matches somewhere....

Also fair because when there are too many, adds just get made. I don't think we can stop humans from being human, we can just keep asking everyone to try harder.

dustymc commented 1 month ago

when there are too many, adds just get made

Yeah, there's a whole 'nuther thread of me saying that low-quality data inspires low-quality data....

https://arctos.database.museum/agent.cfm?srch=Davidson%20Brothers%20Marble%20Co.&include_verbatim=false&include_bad_dup=true - "This is the search you're looking for." does not have the problem described, at least as I understand it. I don't know if that's a UI problem (something I might address) or a documentation/training problem (something The Community might address), or something else entirely.

[Screenshot 2024-06-12 at 06:51:23]

There is some relevant documentation regarding "J. Weller vs. Jessica Weller vs. J. L. Weller":

A generic search, such as only a last name is preferred. This form is searching Agent Preferred Names, so a search for John Smith will not return the agent John H. Smith, but a search for Smith will return both.

https://handbook.arctosdb.org/how_to/How-to-Search-Agents.html
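
The documented behavior follows from matching the query against the preferred name as a whole string. A word-by-word comparison is one alternative; a minimal sketch (hypothetical helpers, not how the Arctos form works) where every query word must match some name word, treating single letters as initials:

```python
import re


def tokens(name: str) -> list[str]:
    # Letter runs only, case-folded, so "J." -> "j" and "Smith," -> "smith".
    return re.findall(r"[^\W\d_]+", name.casefold())


def token_match(query: str, name: str) -> bool:
    """True if every query word matches some word of the name.

    A query word matches when it is a prefix of a name word ("j" vs "jessica"),
    or when the name word is a bare initial of it ("jessica" vs "j")."""
    name_toks = tokens(name)

    def hit(qt: str) -> bool:
        return any(nt.startswith(qt) or (len(nt) == 1 and qt.startswith(nt))
                   for nt in name_toks)

    q = tokens(query)
    return bool(q) and all(hit(qt) for qt in q)
```

Under this rule "John Smith" finds "John H. Smith" and "J. Weller" finds "Jessica Weller". It also returns more rows, so it does nothing for a return limit unless results are ranked.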

Jegelewicz commented 1 month ago

or a documentation/training problem (something The Community might address)

I think we have addressed it - the question is does anybody read or use documentation?

https://handbook.arctosdb.org/how_to/How-to-Create-Agents.html#before-creating-a-new-agent

dustymc commented 1 month ago

does anybody read or use documentation?

That's the part we haven't addressed, training. Arctos is very hippy-commune-ish about how roles are handed out, maybe we've outgrown that. I'm not sure what exactly the alternative might be, but lots of things require some sort of training/testing/whatever and there must be thousands of models we could explore.

Nicole-Ridgwell-NMMNHS commented 1 month ago

but a search for Smith will return both

A search for Smith will give you "CAUTION: Return limit exceeded, some data may be excluded. Please perform a more specific search to ensure accurate results."

I ran into this the other day searching for my volunteer Judy Miller. Searching for "Miller" under Agent Name, I just about added her again, but was stopped when the agent creator found the agent I was looking for.

dustymc commented 1 month ago

Yeah, that's the other juggle-ball: I've got limited resources, and I often don't have the capacity to send everything, even when you might not get overwhelmed by it. Some of that's potentially fixable: e.g., do I really need to be including [all of whatever I'm currently including] in the 'anything' search, is there a better sort that might get "us" (unverified us; I'm already sorting by that) closer to the top, etc., etc.?

dustymc commented 3 weeks ago

Duplicates:

 agent_id | agent_type | preferred_agent_name |         creator          |        created_date        
----------+------------+----------------------+--------------------------+----------------------------
 21346039 | person     | David C. Evans       | Joseph Hopkins           | 2022-10-03 08:42:01.420229
 21258378 | person     | David C. Evans       | unknown                  | 2013-12-16 21:49:31
 21334283 | person     | J. O. Sullivan       | Teresa J. Mayfield-Meyer | 2021-09-07 16:29:24.897153
     7604 | person     | J. O'Sullivan        | unknown                  | 2013-12-16 21:49:31
  1017329 | person     | LaRue                | unknown                  | 2013-12-16 21:49:31
 21253481 | person     | La Rue               | unknown                  | 2013-12-16 21:49:31
  1011480 | person     | L. VanHorn           | unknown                  | 2013-12-16 21:49:31
  1010287 | person     | L. Van Horn          | unknown                  | 2013-12-16 21:49:31
 21256873 | person     | Mary O'Donnel        | unknown                  | 2013-12-16 21:49:31
 21253957 | person     | Mary O’Donnel        | unknown                  | 2013-12-16 21:49:31
 21257004 | person     | Röner                | unknown                  | 2013-12-16 21:49:31
 21258795 | person     | Rößner               | unknown                  | 2013-12-16 21:49:31
 21352433 | person     | Tom Rickman          | Derek S. Sikes           | 2024-04-25 12:36:54.346303
 21352432 | person     | Tom Rickman          | Jozef A. Slowik          | 2024-04-25 12:09:25.013787
 21352431 | person     | Tom Rickman          | Jozef A. Slowik          | 2024-04-25 12:06:48.200331

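Most pairs above differ only in spacing, punctuation, apostrophe style, or case, so a normalized comparison key would group them. A minimal sketch with hypothetical helper names, using Unicode NFKD decomposition and case folding:

```python
import unicodedata
from collections import defaultdict


def name_key(name: str) -> str:
    """Drop diacritics, case, spacing and punctuation for comparison."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(ch for ch in no_marks.casefold() if ch.isalpha())


def candidate_duplicates(agents: dict[int, str]) -> list[list[int]]:
    """Groups of agent_ids whose preferred names share a key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for agent_id, name in agents.items():
        groups[name_key(name)].append(agent_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]
```

The key only produces candidates for a person to review: it deliberately collapses "J. O. Sullivan" and "J. O'Sullivan", which may or may not be one person, and it does not pair "Röner" with "Rößner" (case folding turns ß into "ss").
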
Information only in remarks:

 agent_id | preferred_agent_name |       creator       |        created_date        |                              attribute_value                               
----------+----------------------+---------------------+----------------------------+----------------------------------------------------------------------------
 21352952 | Izak Veals           | Paige Wilson Deibel | 2024-06-26 13:28:50.081707 | student at University of Washington, employee of Burke Museum
 21352951 | Christina Stuhl      | Paige Wilson Deibel | 2024-06-26 13:27:37.568765 | student at University of Washington, volunteer in Burke Museum Paleobotany
 21352950 | Ray Cagnetta         | Paige Wilson Deibel | 2024-06-26 13:25:58.536078 | employee of Burke Museum, museology student at University of Washington
 21352949 | Ana Gutierrez        | Paige Wilson Deibel | 2024-06-26 13:23:06.542398 | volunteer for Burke Museum Paleobotany
 21352948 | Amanda Godfrey       | Paige Wilson Deibel | 2024-06-26 13:22:10.115826 | paleobotany volunteer at Burke Museum
 21352947 | Elena Stiles         | Paige Wilson Deibel | 2024-06-26 12:44:56.034333 | paleobotanist, PhD student at University of Washington
 21352916 | Lulu Gaustad         | Angela Linn         | 2024-06-21 16:41:38.110722 | UAM Ethnology and History
 21352914 | Margen Burke Riley   | Angela Linn         | 2024-06-21 12:38:46.148758 | UAM Ethnology and History
 21352897 | David M. Evans       | Michelle S. Koo     | 2024-06-17 20:56:55.810041 | associated with University of Wyoming in 1970s
 21352897 | David M. Evans       | Michelle S. Koo     | 2024-06-17 20:56:55.810041 | UWYMV collector active in the 1970s
 21352864 | Judith Price         | Mariel L. Campbell  | 2024-06-07 16:36:57.565727 | CMN

Jegelewicz commented 3 weeks ago

Information only in remarks:

https://arctos.database.museum/edit_agent.cfm?agent_id=21352952

[image]

?

Jegelewicz commented 3 weeks ago

Duplicates

As for

J. O. Sullivan J. O'Sullivan

and don't forget

John O. Sullivan

It is hard for me to say if these are one person, two people, or three without more information from the collections.

It does feel like J. O'Sullivan is just a mis-transcription of John O. Sullivan but I have no definitive proof. The collecting locations differ for J. O. Sullivan and John O. Sullivan so again, I think I would need more information to decide if they are the same person. Maybe @mkoo can figure it out with whatever they have in the MVZ:Arch collection?

DerekSikes commented 3 weeks ago

Re: Tom Rickman - here's what Slowik emailed me back on Apr 25: "Additionally, when I try to enter the collector, Tom Rickman, I tried to create the person as before but it errors out no matter what. "

and "Tom is alive. It's really quirky. If I try to add any additional info on him then it just errors out. If I don't enter any of the fields I get the option to force create and then it errors out. "

and I replied: "Ok, I made an agent record for Tom Rickman. I also was presented with some arctos weirdness and asked to force create which I did and it worked. Arctos agents is being re-tooled so there's all sorts of buggy behavior I hope they iron out fast!"

And then: "Well I got the boxes to turn green but the errors still exist anywhere I put Tom's name. Ideas?

2024-4-25T10:45:30: FAIL: agent_1_name [ Tom Rickman ] is invalid; record_event_determiner [ Tom Rickman ] matches 0 agents; locality_attribute_1_determiner [ Tom Rickman ] matches 0 agents: {"message":"agent_1_name [ Tom Rickman ] is invalid; record_event_determiner [ Tom Rickman ] matches 0 agents; locality_attribute_1_determiner [ Tom Rickman ] matches 0 agents","status":"fail"}

So not user error. Just users trying to get Arctos to behave!

Jegelewicz commented 3 weeks ago

So not user error. Just users trying to get Arctos to behave!

That error exists because there is more than one Tom Rickman and Arctos doesn't know which one to choose.

DerekSikes commented 3 weeks ago

That might explain the later error, but not the earlier one, which happened before the agent was made (during the process of trying to make the first one).

AJLinn commented 3 weeks ago

Information only in remarks:

[Screenshot 2024-06-27 at 6:05:24 PM]

Can someone explain to me what this comment is about in terms of the issue of "agent guardrails" - I just added some additional information in both of those records but even prior to that they both had relationships with other established agents.

dustymc commented 3 weeks ago

explain

Bad timing, I was on the wrong server, I fired off the wrong script, my script is broken (please let me know if so)..... who knows, if the good stuff isn't solely in remarks then yay everybody.

And my primary purpose here is still https://github.com/ArctosDB/arctos/issues/7649#issue-2232072678, I'm just gathering some examples, seeing what might be possible, what The Community would like (and if we can figure out how to do that), etc. - if I'm questioning something that you think is OK, PLEASE let me know that too.

Why? https://github.com/ArctosDB/arctos/issues/7894, immediately. There's some data in there that I think possibly shouldn't be loaded (but HOW?), maybe it's fine, maybe my standards are weird, maybe I'm not being paranoid enough, who knows, none of those are decisions that any of us want to make alone, HELP! (And it's all complicated by a bunch of us simultaneously experiencing personal issues, we're not ignoring you @javanveldhuizen!)

Here's fresh data with something in remarks, no relationships, no events, created in the last month.

 agent_id | preferred_agent_name |       creator       |        created_date        |                              attribute_value                               
----------+----------------------+---------------------+----------------------------+----------------------------------------------------------------------------
 21352952 | Izak Veals           | Paige Wilson Deibel | 2024-06-26 13:28:50.081707 | student at University of Washington, employee of Burke Museum
 21352951 | Christina Stuhl      | Paige Wilson Deibel | 2024-06-26 13:27:37.568765 | student at University of Washington, volunteer in Burke Museum Paleobotany
 21352950 | Ray Cagnetta         | Paige Wilson Deibel | 2024-06-26 13:25:58.536078 | employee of Burke Museum, museology student at University of Washington
 21352949 | Ana Gutierrez        | Paige Wilson Deibel | 2024-06-26 13:23:06.542398 | volunteer for Burke Museum Paleobotany
 21352948 | Amanda Godfrey       | Paige Wilson Deibel | 2024-06-26 13:22:10.115826 | paleobotany volunteer at Burke Museum
 21352947 | Elena Stiles         | Paige Wilson Deibel | 2024-06-26 12:44:56.034333 | paleobotanist, PhD student at University of Washington
 21352897 | David M. Evans       | Michelle S. Koo     | 2024-06-17 20:56:55.810041 | associated with University of Wyoming in 1970s
 21352897 | David M. Evans       | Michelle S. Koo     | 2024-06-17 20:56:55.810041 | UWYMV collector active in the 1970s
 21352864 | Judith Price         | Mariel L. Campbell  | 2024-06-07 16:36:57.565727 | CMN
(9 rows)

Jegelewicz commented 3 weeks ago

@dustymc script still not working?

See https://github.com/ArctosDB/arctos/issues/7649#issuecomment-2195180711

Izak Veals definitely has stuff other than remarks.

dustymc commented 3 weeks ago

working

I'm trying to figure out what that means! I added relationships, data below. I think I was trying to avoid derived data, the take-home (if The Community wants to consider this in any way) is that I probably can't exclude low-value relationships. I can't see much way to separate "they actually hang around here, we know this person" and "a vaguely similar name is scribbled on something that once passed through here for some reason." (So https://github.com/ArctosDB/arctos/issues/7649#issuecomment-2163522019 still looks worth investigation.)

 agent_id | preferred_agent_name |      creator       |        created_date        |                attribute_value                 
----------+----------------------+--------------------+----------------------------+------------------------------------------------
 21352897 | David M. Evans       | Michelle S. Koo    | 2024-06-17 20:56:55.810041 | UWYMV collector active in the 1970s
 21352897 | David M. Evans       | Michelle S. Koo    | 2024-06-17 20:56:55.810041 | associated with University of Wyoming in 1970s
 21352864 | Judith Price         | Mariel L. Campbell | 2024-06-07 16:36:57.565727 | CMN
(3 rows)