stop low-information agents, do more with verbatim agents

dustymc commented 2 years ago

Is your feature request related to a problem? Please describe.

We have a lot of low-data agents, they make everything in agent land more difficult than it needs to be.

Describe what you're trying to accomplish

Better data, less work.

Describe the solution you'd like

Policy: don't make low-information agents, use verbatim collector instead. Require some information (address, relationship, status date) for agents that do more than 'collector' stuff.
Clean up existing agents to follow that policy
Tools
- report of verbatim collectors by collection (SQL below)
- report of verbatim collectors from catalog record results
  - tools to do whatever else is missing under this approach

Describe alternatives you've considered

Much work, bad data.

Additional context

First Step: report of low information agents who don't have addresses or relationships and don't extend beyond table collector.

Priority

High, the problem gets worse with every new collection.

EDIT: the promised SQL

select attribute_value, count(*) c from 
cataloged_item
inner join collection on cataloged_item.collection_id=collection.collection_id
inner join attributes on cataloged_item.collection_object_id=attributes.collection_object_id and attribute_type='verbatim collector'
where guid_prefix='CHAS:Mamm'
group by attribute_value order by attribute_value

Just change the CHAS:Mamm in where guid_prefix='CHAS:Mamm' to an appropriate value for other collections. Values can be found on https://arctos.database.museum/home.cfm.

dustymc commented 2 years ago

First pass: Attached are 1883 agents who have either one-word or initials preferred names, and who are not found outside of table collector.
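A minimal sketch of the "one-word or initials" name filter, assuming it works roughly like this; the actual query behind the CSV is not shown in this thread, and `is_string_only_name` and `is_initial` are hypothetical names. As noted later in the thread, the name format is only a convenient starting point, not the real criterion.

```python
import re

# A single letter (any script), optionally followed by a period, e.g. "T." or "K".
_INITIAL = re.compile(r"[^\W\d_]\.?")


def is_initial(token: str) -> bool:
    """True if the token is just one letter with an optional trailing period."""
    return _INITIAL.fullmatch(token) is not None


def is_string_only_name(name: str) -> bool:
    """True for one-word names and for names made up entirely of initials.

    This is a guess at the first-pass filter, not the real implementation.
    """
    tokens = name.split()
    if not tokens:
        return False
    if len(tokens) == 1:
        return True
    return all(is_initial(token) for token in tokens)
```

Under this sketch "Tovar" and "T. K." would be picked up, while "J. L. Sands" would not, because it carries a full surname. Single-name agents such as Indigenous creators are also matched by the name test alone, which is why the other criteria (usage outside table collector, addresses, relationships, status) matter.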

Proposal:

add these to each relevant catalog record as
- attribute: verbatim collector
- value: {preferred_agent_name from the attached CSV}
- method: {collector_role}
- remark: {for_attr_remarks from the attached CSV}
Remove them from table collector and delete the agent records
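The proposal above can be sketched as one transaction. This is an illustration only: it runs against a tiny in-memory SQLite database, the table layouts are simplified, and the columns `determination_method` and `attribute_remark` are assumed names. Only `collector`, `attributes`, `attribute_type`, `attribute_value`, `collector_role`, `preferred_agent_name`, and `for_attr_remarks` come from this thread.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with hypothetical, simplified tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE agent (agent_id INTEGER PRIMARY KEY);
        CREATE TABLE collector (
            collection_object_id INTEGER, agent_id INTEGER, collector_role TEXT);
        CREATE TABLE attributes (
            collection_object_id INTEGER, attribute_type TEXT, attribute_value TEXT,
            determination_method TEXT, attribute_remark TEXT);
        -- Stand-in for the attached CSV, loaded as a table.
        CREATE TABLE temp_agent_clean_first (
            agent_id INTEGER, preferred_agent_name TEXT, for_attr_remarks TEXT);

        INSERT INTO agent VALUES (1), (2);
        INSERT INTO collector VALUES
            (100, 1, 'collector'), (101, 1, 'preparator'), (100, 2, 'collector');
        INSERT INTO temp_agent_clean_first VALUES (1, 'T. K.', 'made-up remark');
    """)
    return conn


def verbatimize(conn: sqlite3.Connection) -> None:
    """Copy listed agents to 'verbatim collector' attributes, then remove them."""
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute("""
            INSERT INTO attributes (collection_object_id, attribute_type,
                                    attribute_value, determination_method,
                                    attribute_remark)
            SELECT c.collection_object_id, 'verbatim collector',
                   t.preferred_agent_name, c.collector_role, t.for_attr_remarks
            FROM collector c
            JOIN temp_agent_clean_first t ON t.agent_id = c.agent_id
        """)
        conn.execute("""
            DELETE FROM collector
            WHERE agent_id IN (SELECT agent_id FROM temp_agent_clean_first)
        """)
        conn.execute("""
            DELETE FROM agent
            WHERE agent_id IN (SELECT agent_id FROM temp_agent_clean_first)
        """)
```

Doing all three steps in one transaction means a failure part-way through leaves the collector rows and agent records in place rather than half-moved.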

temp_agent_clean_first.csv.zip

I'll proceed (using fresh data) ~~if there are no objections by 2022-04-27.~~ whenever the conversation draws down.

sharpphyl commented 2 years ago

Please retain 21263988 | Sanbornes

dustymc commented 2 years ago

Please retain

If we proceed with this, that would be a matter of data. Maybe we'll be able to see through the clutter enough to build better rules at some point, but for now just about anything would escape the filters I'm working with. Address=South Pacific, alive=1972, WHATEVER. We'd like to have a bar, but at least initially it'll be a very low bar!

Some remark suggests they should be involved in an accession - that would stop this, but hopefully only temporarily.

Agent remarks suggest a name that might lead somewhere and the activity suggests one person, why not just use that and put the uncertainty in the remarks? Maybe we also need some sort of Best Practices document (or the existing cleaned up or added to) - "when given X, we suggest doing Y...."

Unrelated to agents, some other remark makes me suspect this wasn't collected after 1973, and I'm absolutely positive it wasn't collected tomorrow - event dates could be tightened up a LOT (but not as much as they could have been yesterday...).

Jegelewicz commented 2 years ago

We need to make a pass through this because this one

21313587 | ᑭᒐᐃ | first name=ᑭᒐᐃ|aka=Kigai; Remark: Ethnology and History verbatim agent; carver

probably needs to be kept as is

AJLinn commented 2 years ago

OBJECTION! Please don't delete anything yet ... but I should be able to get my agents clarified by the time you proceed. That said, as I go thru the list (I'm ever so glad I put my collection in the agent remarks field!) most of my single name individuals fall into one of two categories:

An Indigenous artist (creator) who is known by only one name. Many of these artists made these items prior to converting to Christianity and therefore did not have a "surname" in the way modern people conceive of "proper" name format. (E.g. 21280156 / Qinaqtaq --> an extremely famous Iñupiaq artist who was the creator of baleen baskets in the first decade of the 1900s and is referenced in a number of peer reviewed publications and books). If we proceed with a blanket rule to eliminate these records or move them to verbatim agent we are being prejudiced against cultures who do not follow our same concepts of names, and thus we will cause people to miss discovering objects in our collection. Admittedly, I have been inconsistent with whether I enter their single name as a first name or a last name. I'd be happy to fix those so they are either first or last name consistently.
A manufacturer name that is a single name. Our protocol has been to have the preferred name written as the name physically appears on a manufactured item. (e.g., 21300649 / Spanjian -->a sports uniform manufacturer in the mid-20th century; I just added an aka with the Spanjian Sportswear).

I have argued in the past for both of these types of single named agents to not be deleted or flagged as somehow "less valid" (i.e., moved to verbatim collector) than a record with more than one name. I will fight all night long to defend the single name Indigenous creator record. I will also defend the use of the name that is printed on the label as the preferred name, but will encourage our staff (including myself) to do a better job of finding the full corporate name (if it exists online).

[I'll now get down from my soapbox...]

Jegelewicz commented 2 years ago

@AJLinn brings up a few good points

single term agents of type organization should be allowed without issue
single term agents of type person should be fine IF they include at least one relationship, address, or status (@AJLinn just create a relationship to your organization (associate of) instead of or along with remarks and that should cover it)

dustymc commented 2 years ago

The format of the agent name isn't in any way the problem, it's just a convenient place to start. This should eventually involve ALL agent names; they're still just strings, even (maybe especially!) if there are 17 "words" involved.

I should be able to get my agents clarified

Please let me know if there's any way I can help - pull data out, put it in, WHATEVER. If this comes down to one-by-one it may never get finished. (But it got started and we're thinking about this stuff and that's something!)

in the first decade of the 1900s

Great, add that (or the publications or whatever) and the agent easily clears this bar.

manufacturer

Ditto. (And bigger picture, it seems we're going to be forced past our unique preferred name restriction at some point, which would be a lot more approachable if we could tell the Nike in Oregon from the Nike from Greece.)

somehow "less valid"

See above, these are just a convenient place to start. I can drop this and grab a couple thousand random or something if the format is a distraction.

also defend the use of the name that is printed on the label as the preferred name

That is embedded in the "forced past unique restriction" mentioned above. Doing that and avoiding the absolute most disrespectful thing we could do - not properly attributing work to the creators - is the core of this; right now, if both Nikes show up and (reasonably) demand we use their name, we just can't. If we somehow allow two Nikes, we can't tell them apart (except maybe by digging through remarks, which isn't realistic) which leads to us attributing god-stuff to the shoe-folks. We need more data to move past our restrictions.

full corporate name

Please note that more names won't stop this (or that's how I hope it plays out, anyway). This is fundamentally a request for some sort of actual data beyond strings/names. The ideal form of that is something which leads to a lot more data - an ORCID/WikiData/LoC/whatever address - but the bar isn't that high (yet?? Probably never...) and a vague address (Canada) or status date (alive in 1905) will (we so hope) meet the foreseeable needs.

I was going to refer to documentation - much of the requested information exists, but not in such a way that machines (or humans, unless they're willing to dig) can find it, but the current documentation is not clear. @Jegelewicz the remarks section of https://handbook.arctosdb.org/best_practices/Agents.html#general-recommendations-for-creating-meaningful-agents should look more like https://github.com/ArctosDB/documentation-wiki/blob/ee9493ba951cb64639eb0e97fb51b5e909871c01/_documentation/agent.markdown - "Use remarks as a last resort" is the critical (and now missing) idea.

From the CSV:

Remark: UAM ethnology & history; sports uniform manufacturer in mid-20th century; moved from Pasadena to San Marcos, CA in 1971.

I copied some of that to appropriate places:

And now we have TWO non-name-based data points! There might be another 500 Spanjians out there, maybe even making Sportswear, and as long as they're not operating in San Marcos in 1971 they can't confuse anyone!

Now I'm gonna go file an issue about the values I had to use...

Jegelewicz commented 2 years ago

the remarks section of https://handbook.arctosdb.org/best_practices/Agents.html#general-recommendations-for-creating-meaningful-agents should look more like https://github.com/ArctosDB/documentation-wiki/blob/ee9493ba951cb64639eb0e97fb51b5e909871c01/_documentation/agent.markdown - "Use remarks as a last resort" is the critical (and now missing) idea.

moved remarks stuff to Don't

dustymc commented 2 years ago

21313587 | ᑭᒐᐃ | first name=ᑭᒐᐃ|aka=Kigai; Remark: Ethnology and History verbatim agent; carver probably needs to be kept as is

ᑭᒐᐃ is acting as a creator, I think it's safe to assume they were at the creation event which carries places and dates. I don't want to get into some tail wagging the dog situation so I'm (extremely) hesitant to just make those assertions, but I could round them up for human review (and help load anything which passes that).

The other viewpoint is that ᑭᒐᐃ is functionally nothing but a string stored in a complicated way at the moment, changing that to a string stored in a less-complicated structure doesn't change any meaning or function that I can identify. At some point hopefully someone will "elevate" some/many/most "simple string agents" to agent objects (because they want to do something that requires the complexity, not "just because" - I hope), and I'm happy to build tools to facilitate, I just need a use case. (I don't think we're missing any functionality now, but I can probably save some clicking.)

Note also that this approach would unavoidably allow what we're really trying to get rid of. If for some reason someone wants to scrounge up data for T. K. (who seems to be no more than a footnote in an obscure publication), then doing so would put them in the "safe pile" along with any other more-than-strings agent. I'm not sure if that's a feature or a bug, but it's probably unavoidable under this viewpoint.

ebraker commented 2 years ago

@dusty is it possible to get a csv or SQL for UCM records using values from temp_agent_clean_first.csv.zip? That way I can more easily take a pass at reviewing and adding more agent info when possible.

dustymc commented 2 years ago

I did this

select string_agg(guid,',') from (
    select concat(guid_prefix,':',cat_num) as guid from cataloged_item
    inner join collection on cataloged_item.collection_id=collection.collection_id
    inner join collector on cataloged_item.collection_object_id=collector.collection_object_id
    inner join  temp_agent_clean_first on  temp_agent_clean_first.agent_id=collector.agent_id
    where guid_prefix like 'UCM:%'
) x

but the result is a bit awkward to pass around so https://arctos.database.museum/archive/ucm_issue_4554 - let me know if you need something else.

AJLinn commented 2 years ago

"Use remarks as a last resort" is the critical (and now missing) idea.

I actually really disagree with this idea, unless we instead add a free text field called biographical profile or biographical summary. This is essential, useful data that helps distinguish one John Smith from another, it shows up in our agent summary, and is critical for understanding the context of our collections.

Compare our agent record for Robert Bloom to that of the UAF Archives (which is a short one also):

It's easier and more useful than creating a PDF of a biographical profile and attaching it as a media file to the agent record... more clicks and downloads.

We already allowed for markdown formatting for paragraphs of text, so the agent summary page looks better when there's more there.

just create a relationship to your organization (associate of) instead of or along with remarks and that should cover it

I'm not sure this is an appropriate way to "claim" that agent. I'd prefer to add some born/alive/died/dead data, some geographic information in an address field, or additional biographical info if it's able to be located. Sometimes it's an oral history recording or maybe a historical photo in an online digital archive. Would that help fulfill some data points you're looking for @dustymc ?

dustymc commented 2 years ago

essential, useful data that helps distinguish one John Smith from another

For anyone who reads it: sure. A date buried in there is also completely inaccessible to things like https://github.com/ArctosDB/arctos/issues/4551 (and probably most users). The current documentation says "Don’t use remarks when more formal data are possible." which I believe is correct - we do have an appropriate "more formal" field for places (address) and dates (status) so that doesn't belong (or only belong, I don't care what's replicated in remarks to be more readable or etc.) in remarks. We don't have a place for biographical profile so that does belong in remarks. Unless....

add a free text field called biographical profile

New issue, no objection from me (as long as it can be defined in such a way that it's not "remarks when someone felt like using that field").

create a relationship to your organization (associate of)

If they're working for you: Yes, absolutely.

If they tossed a dead rat (or motorcycle or whatever) at you at some point: Nope, over-using relationships will just result in those data not getting cleaned up when we get access to tools (or brains).

born/alive/died/dead data....geographic information in an address field...historical photo i....online digital archive

Any of that will get the agent over the (tentative) current bar. I'd of course like to have all of it and in great detail, but at this point any sort of structured data feels like a great leap forward.

dustymc commented 2 years ago

The conversation seems to have drawn down, OK to proceed per https://github.com/ArctosDB/arctos/issues/4554#issuecomment-1098531767?

AJLinn commented 2 years ago

If by proceed you mean nuking all the one-name agents, I'm still working on my mega-list to add "alive" info and "shipping" address so there are three points of data. Can you give me time to fix them? I can prioritize for the next couple of days.

dustymc commented 2 years ago

No hurry, I just don't want to lose whatever momentum we've got going.

Let me know if I can help with anything.

AJLinn commented 2 years ago

Looks like I have 50 agents to update, which unfortunately I don't think there are any automated wizard things we can do other than looking at their agent activity report and assessing each one individually. We'll see how long it takes!

dustymc commented 2 years ago

See https://github.com/ArctosDB/arctos/issues/4568 - we discussed rebuilding the activity page (somewhere...), let us know what would be useful to surface there.

AJLinn commented 2 years ago

I did it! In this Google Sheet, all the lines highlighted in blue were fixed in some way, mostly adding an alive date and adding a shipping address. Lines left yellow should be moved to verbatim collector as there was not enough information available to justify keeping the agent. Some of the names were corrected, marked as bad duplicates, or had full names found thru an Ancestry.com search, so their agent records no longer exist as shown on this sheet. Column D indicates what action I took on each agent. Thanks for your patience. Please let me know if you find a problem or need further clarification.

dustymc commented 2 years ago

That's awesome @AJLinn, and I think good evidence that these low-information string-only Agents are in fact leading to problems, both by being confounded with other agents, and by "good" agents being lost in amongst the truly low-information.

Here's another run of the initial query, except I also excluded agents created in the last year.

temp_agent_clean_fp.csv.zip

I'll add this to the AWG Agenda for increased visibility before shuffling anything, and - assuming this is a direction we are comfortable going in - we seem to need two more pieces of best practice documentation:

Some sort of "add more than strings if you wish to avoid verbatimization" point, and
Some sort of "periodic cleanup" section (perhaps even one that eventually leads to full automation)

lin-fred commented 2 years ago

AWG discussed at length, please see notes in the agenda under the Agents Committee section: Create flags when dates don't match (death date is before a collecting date example C. H. Townsend)
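A minimal sketch of the "dates don't match" flag mentioned above, assuming death dates and collecting dates are available as plain dates. The function name and the sample data in the usage note are hypothetical, and no real agent dates are used.

```python
from datetime import date


def flag_posthumous_collecting(death_dates, collecting_events):
    """Return the events whose collecting date falls after the agent's death date.

    death_dates: dict mapping agent name -> date of death
    collecting_events: iterable of (agent name, record id, collecting date)
    Agents without a known death date are never flagged.
    """
    flags = []
    for agent, record, collected_on in collecting_events:
        died = death_dates.get(agent)
        if died is not None and collected_on > died:
            flags.append((agent, record, collected_on))
    return flags
```

For example, with a made-up "Agent A" who died on 1950-01-01, a record collected in 1951 would be flagged and one collected in 1949 would not. A flag like this only works for agents that carry structured status data, which is part of the argument for getting dates out of remarks.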

Agent Committee looking into starting to clean up agents

mkoo commented 2 years ago

thanks for the bullet points @lin-fred To expound on a few points:

We're not saying we want low-data agents (!); we are saying that we will deal with them in stages as the priority should be to get data into Arctos where we have tools and often the data clues to figure out the bad duplicates, what the initials stand for, who is who, etc. We should use the data-aggregation power of Arctos to help us do the clean up and not introduce too many barriers to adding in agents (and thus records)
We should absolutely add more flags and clean-up tools. We liked the direction of the new Low Quality data tools for Dups Agents. (we have some new issues for that actually)
We should expand agent name type to create a concatenated name for display. (new issue)
We should develop best practices for agent names
We should continue seeking an external names authority that we can tag and link identifiers for people (beyond ORCID and wikidata)
We were not enthusiastic about doing more with verbatim agents unless we are missing some critical coolness about them but would rather invest more into agents (where we can link/ merge/ clean-up)
We recognize that this may require a shift and general community agreement on acceptable timeline for cleaning up data. This impacts new collection data migration work (large bulkloading of names) differently than cataloging new records from an established collection (smaller number of new agents) which has most of the agents cleaned-up too.

dustymc commented 2 years ago

We're not saying we want low-data agents (!)

If I'm reading this right you are, whether that's the intention or not!

we will deal with them in stages

I'm just proposing a better separation between those stages, which would let you make the decisions (most of them, anyway) in the context of everything in Arctos rather than the typical maybe-one-collection spreadsheet.

priority should be to get data into Arctos

Agreed - but that doesn't mean we need to make a mess of what should be formal data. I'm proposing to lower that bar, or delay much of the cleanup if you want to look at it that way.

too many barriers to adding in agents (and thus records)

Perhaps this is the point of divergence: Those things are not (entirely) related, you don't need to do anything with maybe-eventually-Agent Collectors to load records, and all evidence suggests you shouldn't bother trying - making that call in the context of other stuff (rather than eg in some spreadsheet) is less work for better data.

not enthusiastic about doing more with verbatim agents unless we are missing some critical coolness about them

I think/hope this is the same as above, the coolness is there and has been for a while, if all you have are strings (eg names) then these are entirely functionally identical and you're just making work (for you and everyone who will need to eventually sort through your low-data messes) with no benefit at all for anyone by forcing them to Agents.

may require a shift

Yes, but it's a simplification.

and general community agreement

I can see no losses, this looks all good/no bad, I'm not sure what would be required beyond the initial agreement to move string-only Agents to a string-only format.

This impacts new collection data migration work (large bulkloading of names)

What I'm proposing does indeed impact that, by mostly eliminating it. 90+% of any new collection is "collectors" (the node, not the role) - those could be removed, and the effort focused on the remaining ~10% (donors and identifiers and such). Still looks all good+no bad from here....

differently than cataloging new records

Perhaps, but I might be able to figure a UI solution out. ("This isn't an agent, {create} or {use attributes}." doesn't seem entirely unapproachable.)

which has most of the agents cleaned-up too.

I'm no longer advocating for "clean" (whatever that might mean), I'm advocating for "carries more information than strings can" as the bar.

I think (hope?!) I'm not being clear on something, so here's the proposal again:

Better documentation; direct users to not create Agents when they're not necessary. ("Necessary" ==> "not functionally identical to the Attribute.")
Periodic cleanup - move anything that can be handled by the Attribute without loss to the Attribute.

That's really it. Do less work when there's no reason to do more, give up nothing in the process. I'm not sure why that's controversial?

If "can be handled by the Attribute" needs elaboration, it's Agents which are referenced only by tables agent_name and collector. Agents which act as identifiers/donors/anything except collector and agents with any status, address, or relationship information would not be affected in any way, other than being surrounded by a lot less clutter (and I can't see any way that won't lead to better data, scroll up for lots of examples even from this tiny initials-only corner of our mess).
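The "can be handled by the Attribute" test can be sketched as a query. This is an assumption-heavy illustration against an in-memory SQLite database: the table names in `DISQUALIFYING_TABLES` are hypothetical stand-ins for the status, address, relationship, and "anything except collector" tables, and a real list would be much longer.

```python
import sqlite3

# Hypothetical subset of the tables that can reference an agent.
DISQUALIFYING_TABLES = (
    "address", "agent_relations", "agent_status", "identification_agent",
)


def string_only_agents(conn: sqlite3.Connection) -> list:
    """Return agent_ids used as collectors and referenced by no other table."""
    not_referenced = " AND ".join(
        f"NOT EXISTS (SELECT 1 FROM {table} t WHERE t.agent_id = agent.agent_id)"
        for table in DISQUALIFYING_TABLES
    )
    sql = f"""
        SELECT agent.agent_id FROM agent
        WHERE EXISTS (SELECT 1 FROM collector c WHERE c.agent_id = agent.agent_id)
          AND {not_referenced}
        ORDER BY agent.agent_id
    """
    return [row[0] for row in conn.execute(sql)]


def build_demo_db() -> sqlite3.Connection:
    """Four made-up agents: only agent 1 is a collector with no other data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE agent (agent_id INTEGER PRIMARY KEY)")
    for table in ("collector",) + DISQUALIFYING_TABLES:
        conn.execute(f"CREATE TABLE {table} (agent_id INTEGER)")
    conn.executemany("INSERT INTO agent VALUES (?)", [(1,), (2,), (3,), (4,)])
    conn.executemany("INSERT INTO collector VALUES (?)", [(1,), (2,), (4,)])
    conn.execute("INSERT INTO address VALUES (2)")               # has an address
    conn.execute("INSERT INTO identification_agent VALUES (3)")  # acts as identifier
    conn.execute("INSERT INTO agent_status VALUES (4)")          # has a status date
    return conn
```

In the demo data only agent 1 is returned; a single address, status, or non-collector role is enough to keep an agent out of the list.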

Nicole-Ridgwell-NMMNHS commented 2 years ago

It just seems like moving all these agents to verbatim will make cleanup more difficult. Say I have specimens collected by "firstname lastname". I don't know who this person is at the moment. Another collection also has specimens collected by firstname lastname. Their collector is in verbatim because they also don't know anything about them. Their verbatim agent actually collected very similar things to mine, at similar times and in similar places, but I don't know that because it is in verbatim and there is no agent activity report to clue me in. Maybe that link between my collector and their collector could have helped me figure something out about this person. So, I also add firstname lastname as a verbatim agent. Later, someone else comes along and adds some publication by firstname lastname, adding this person as an agent in the process. Great! More is known about this person. But since all our collecting records for firstname lastname are all in verbatim, no one will ever know. Especially if someone somewhere along the way misspelled this person's name as firstmame lastname and nobody realized because the verbatim field has no code table.

dustymc commented 2 years ago

@Nicole-Ridgwell-NMMNHS I don't think you're wrong about any of that, except that it demonstrably results in a whole bunch of variations of 'firstname lastname' that never get reconciled, and when the next person comes along they just throw up their hands and create one more because why not - and the pile gets a little more impenetrable.

I don't think rounding up all 500 firstname lastname variations in verbatim is any more work than in agents. Maybe it's even less work, because some of those agents tend to get misattributed to all sorts of unlikely things where the verbatim are more isolated, IDK.

I'm actually not sure why I'm not suggesting that now, I'll open an Issue.....

dustymc commented 2 years ago

Convenient example, here's what's cooking now. They're largely from the same project, I don't think anybody involved is careless in any way, they're investing a lot more time than most collections would, etc., etc. - I think this is about as good as it gets, and it still results in a lot of duplicates because (a) it's mostly just strings, that's always a bit of a guessing game, and (b) there are a LOT of existing strings to sort through.

The proposal that this has turned into would just isolate those string-only data; everything left in Agents would have some other bit of information available, and the strings wouldn't have any possibility of having been confounded with each other or anything else through an erroneous merge or etc.


 getpreferredagentname | agent_relationship | getpreferredagentname  
-----------------------+--------------------+------------------------
 Renn Tumlison         | bad duplicate of   | C. Renn Tumlison
 D. R. Herter          | bad duplicate of   | Dale R. Herter
 J. L. Sands           | bad duplicate of   | James L. Sands
 PREP STAFF            | bad duplicate of   | Prep. Staff
 R. E. Mumford         | bad duplicate of   | Russel E. Mumford
 Allison J. Schultz    | bad duplicate of   | Allison J. Shultz
 E. J. Larrison        | bad duplicate of   | Earl J. Larrison
 G. McLin              | bad duplicate of   | Glen McLin
 C. W. Richmond        | bad duplicate of   | Charles W. Richmond
 J. MacCracken         | bad duplicate of   | J. G. MacCracken
 J. T. Weir            | bad duplicate of   | Jason T. Weir
 M. A. Etnier          | bad duplicate of   | Michael A. Etnier
 C. Hrycko             | bad duplicate of   | Christopher Hrycko
 O. A. Willett         | bad duplicate of   | Ora A. Willett
 Frank Pitelka         | bad duplicate of   | Frank A. Pitelka
 C. G. Rinker          | bad duplicate of   | George C. Rinker
 George Rinker         | bad duplicate of   | George C. Rinker
 W. Wileman            | bad duplicate of   | W. C. Wileman
 Syd Anderson          | bad duplicate of   | Sydney Anderson
 D. E. Metter          | bad duplicate of   | Dean E. Metter
 G. Rinker             | bad duplicate of   | Gary Rinker
 Tovar                 | bad duplicate of   | unknown
 Craig Hilburn         | bad duplicate of   | David Craig Hilburn
 B. T. Ostenson        | bad duplicate of   | Burton T. Ostenson
 J. L. Reid            | bad duplicate of   | Julia L. Reid
 P. Clifton            | bad duplicate of   | Percy L. Clifton
 R. A. Campbell        | bad duplicate of   | Ronald A. Campbell
 V. Shafer             | bad duplicate of   | V. W. Shafer
 Z. Fry                | bad duplicate of   | Zerol Fry
 C. S. Thaeler         | bad duplicate of   | Charles S. Thaeler Jr.
 J. B. Bowles          | bad duplicate of   | John B. Bowles
 J. L. Hayward         | bad duplicate of   | Jim L. Hayward
 K. Estlund            | bad duplicate of   | Kevin Estlund
 J. Gurgel             | bad duplicate of   | Jo Gurgel
 N. Marr               | bad duplicate of   | N. Verne Marr
 S. Farag              | bad duplicate of   | Saleem Farag
 S. L. Lindsay         | bad duplicate of   | Steve L. Lindsay
 Dale Guthrie          | bad duplicate of   | Russell Dale Guthrie
 N. E. Dochuchaev      | bad duplicate of   | Nikolai E. Dokuchaev
(39 rows)

Jegelewicz commented 2 years ago

I am the destroyer of agents. Look upon me and despair.

lin-fred commented 2 years ago

I am trying to summarize all of the concerns/questions surrounding this issue and get it all together as one comment before this week's issues meeting. I am working out of our last Agent committee Google Doc, at the very bottom:

Here are my notes now that need some input:

Main issue: stop low-information agents, do more with verbatim agents #4554

Related:
- get all agent "names" in one place on the catalog record #4869
- Code Table Request - verbatim agent #4871
- Feature Request - try to match agents to verbatim whateverwecallems #4872
- Agent cleanup - agents of type "other agent" #4853

Please add any more related issues here

Notes:

What is the limit/definition at which an agent gets put into verbatim?
-- no dates, relationships, or addresses?
-- Is not connected to a loan/accession
-- Only for collectors/preparators? (What about "other agent" types?)
When searching agents on the main search page, agents and verbatim agents will show up in the results
There will be a tool in which you can check the "activity" (something similar to the agent activity page) of verbatim agents so that you can compare them to other verbatim agents (and agents??) -- For use when trying to convert a verbatim agent into an agent -- #4872
Definition of verbatim agent

dustymc commented 2 years ago

"Only use as collectors" is the proposal, but https://github.com/ArctosDB/arctos/issues/4871 does provide a mechanism by which a user/collection could choose to extend beyond that.

The agent_id appearing in loan/accession, addresses, relationships, or any of the other ~30 possible places would exclude the agent from any possible "verbatimizing."

The catalog record search does include verbatim; that's been out for some time.

Agents has 'activity' functionality - also not new.

https://github.com/ArctosDB/arctos/issues/4872 would add something to that - it will somehow try to tell you that 'B. Richards' (verbatim collector) maybe should be merged into https://arctos.database.museum/agent/21271606, but without being annoying when that's already been done. "Some tools exist, anything the data can support is possible" is the intention, we'll probably have to experiment a bit to know exactly what the data can support.
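As a rough idea of what a string-only version of that matching could look like, here is a sketch that proposes candidate agents for a verbatim collector by comparing first initial and last name. The function names are hypothetical, nothing here reflects how #4872 is or will be implemented, and it only ever produces candidates for human review.

```python
def name_key(name: str):
    """Reduce a name to (first initial, last token), ignoring periods and case."""
    tokens = name.replace(".", " ").split()
    if not tokens:
        return None
    return (tokens[0][0].casefold(), tokens[-1].casefold())


def candidate_agents(verbatim: str, agent_names):
    """Return agent names sharing first initial and last name with the verbatim string."""
    key = name_key(verbatim)
    if key is None:
        return []
    return [n for n in agent_names if n != verbatim and name_key(n) == key]
```

Using pairs from the bad-duplicate list earlier in this thread: "D. R. Herter" would surface "Dale R. Herter", but "G. Rinker" would surface both "Gary Rinker" and "George C. Rinker", and "C. G. Rinker" would find neither. That is the "guessing game" with strings; places and dates from the records would be needed to narrow it down.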

Jegelewicz commented 2 years ago

-- Is not connected to a loan/accession

I would also add publication and project or determiner of some thing (identification, attribute)

Jegelewicz commented 2 years ago

Agents has 'activity' functionality

I believe @lin-fred is looking for this functionality for VERBATIM AGENTS in order to help determine if they are the same as some existing agent.

dustymc commented 2 years ago

add publication and project or determiner of some thing (identification, attribute)

No real objection from me, but any list will be long (and probably incomplete, we change stuff all the time). https://arctos.database.museum/tblbrowse.cfm?tbl=agent (minus collector) is a close approximation.

this functionality for VERBATIM AGENTS

Click one of them - that's "activity." (Other formats/tools/whatever are always possible, but that is the comprehensive summary.)

Jegelewicz commented 2 years ago

Click one of them - that's "activity."

What do I click?

dustymc commented 2 years ago

from agent search. (But I could make those clicky, might be kinda cool....)

Jegelewicz commented 2 years ago

I still don't get it! If I search Dalquest in Agents - I don't see W. W. Dalquest

Is this only from the main search page?

dustymc commented 2 years ago

Jegelewicz commented 2 years ago

AH HA! DOH!

campmlc commented 2 years ago

Make default yes?

Jegelewicz commented 2 years ago

I wouldn't do that - the number of W. W. Dalquests is completely overwhelming....

It really is/should be a tool for agent cleanup. I think intentionally selecting it makes more sense - We just need to document the option.

lin-fred commented 2 years ago

from agent search. (But I could make those clicky, might be kinda cool....)

I think it would help if they are also clicky in the record itself

dustymc commented 2 years ago

AWG: Likes the direction (yay!!).

TODO

- communication - don't surprise anyone with this
  - need to be clear that not following the guidelines (largely "Don’t use remarks when more formal data are possible....") will result in automated deletion
  - clarify what's required to avoid automated deletion (usage outside of table 'collector')
- clarify schedule for deletion (proposal: one year after creation)
- clean up existing records - save what we can
- clarify data entry (esp. new collection) process: don't create Agents if you don't have to, take the easy path, sort it out in the context of records rather than hoping you guess right from strings
- continue to build tools to convert verbatim collectors to agents where possible
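
A minimal sketch of how the deletion rule in this TODO could be expressed, assuming the one-year proposal and "usage outside of table 'collector'" as the only exemption. The `Agent` class, its `used_in` field, and `purge_candidates` are hypothetical stand-ins for illustration, not Arctos code or schema:

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Agent:
    # Hypothetical in-memory stand-in for an agent record; not the Arctos schema.
    name: str
    created: date
    # Names of tables that reference this agent, e.g. {"collector", "loan"}.
    used_in: set = field(default_factory=set)


def purge_candidates(agents, today, max_age_days=365):
    """Return agents at least max_age_days old that are used nowhere except 'collector'."""
    cutoff = today - timedelta(days=max_age_days)
    # `used_in <= {"collector"}` is a subset test: true for {"collector"} and for
    # an agent that is not referenced anywhere at all.
    return [a for a in agents if a.created <= cutoff and a.used_in <= {"collector"}]
```

Anything used in a loan, accession, publication, etc. falls out of the list automatically, and so does anything younger than the cutoff, which is the grace period for adding real data.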

clicky in the record itself

"Dumb version" in next release, https://github.com/ArctosDB/arctos/issues/4872 might provide a mechanism to do more than string-match, revisit record once that's developed

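For illustration only, a "dumb" string-match could look something like the sketch below. It assumes matching means equality after lowercasing and collapsing whitespace; the function names, the `agent_names` mapping, and the IDs are hypothetical, not the released implementation:

```python
def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return " ".join(name.lower().split())


def match_verbatim(verbatim: str, agent_names: dict) -> list:
    """Return sorted IDs of agents that have any name equal to the verbatim string.

    agent_names maps a (hypothetical) agent ID to the list of that agent's names.
    """
    target = normalize(verbatim)
    return sorted(
        agent_id
        for agent_id, names in agent_names.items()
        if any(normalize(n) == target for n in names)
    )
```

Pure string equality will happily return several agents for one verbatim value, so the result is a list of suggestions to review, not a merge decision.
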
lin-fred commented 2 years ago

@dustymc what will happen when someone in the future creates an agent that meets this level of low quality? Is there going to be a script that runs every so often that merges it into verbatim?

Jegelewicz commented 2 years ago

Can we disallow that? Any agent MUST have at least one bit of information besides names/akas?
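
A rough sketch of what such a check could look like, assuming the "some information" list from the original proposal (address, relationship, status date). The dict keys and function names are hypothetical, not the Arctos data model or create-agent form:

```python
# Fields that would count as "information besides names" (assumed from the
# original proposal: address, relationship, status date).
REQUIRED_ANY = ("address", "relationship", "status_date")


def has_minimum_information(agent: dict) -> bool:
    """True if at least one of the REQUIRED_ANY fields is present and non-empty."""
    return any(agent.get(key) for key in REQUIRED_ANY)


def create_agent(agent: dict) -> dict:
    """Refuse to create an agent that carries only names and remarks."""
    if not has_minimum_information(agent):
        raise ValueError(
            "low-information agent: record as a verbatim collector instead"
        )
    return agent
```

Note that a remark alone does not pass here; whether it should is a policy question, not something the sketch settles.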

lin-fred commented 2 years ago

Can we disallow that? Any agent MUST have at least one bit of information besides names/akas?

But currently, when you create a single agent, you can only add their name and a remark on the creation page. As it is, no agents could be created, because we currently create the agent and then edit it to add all the fluff.

So do we want to change that? But then how many fields do we add to the "create agent" button?

dustymc commented 2 years ago

what will happen

If I get to choose, what @Jegelewicz said sounds great, just outright ban low-information agents. No confusion, no surprise scripts, no complications, no running automation, nobody showing up on my lawn with pitchforks because they didn't read the docs and all their agents vaporized, pleaseplease lets do this...

(We can figure out the UI/details.)

If we can't get that organized, then a monthly-or-whatever purge of year-or-whatever-old low-information agents could work.

lin-fred commented 2 years ago

If I get to choose, what @Jegelewicz said sounds great, just outright ban low-information agents. No confusion, no surprise scripts, no complications, no running automation, nobody showing up on my lawn with pitchforks because they didn't read the docs and all their agents vaporized, pleaseplease lets do this...

(We can figure out the UI/details.)

If we can't get that organized, then a monthly-or-whatever purge of year-or-whatever-old low-information agents could work.

ok cool

mkoo commented 2 years ago

If I get to choose, what @Jegelewicz said sounds great, just outright ban low-information agents. No confusion, no surprise scripts, no complications, no running automation, nobody showing up on my lawn with pitchforks because they didn't read the docs and all their agents vaporized, pleaseplease lets do this... (We can figure out the UI/details.) If we can't get that organized, then a monthly-or-whatever purge of year-or-whatever-old low-information agents could work.

ok cool

Minimum requirements for agents are fine as long as we show a clear method to use verbatim and a path to converting verbatim to full "agenthood" as data becomes available. I think the latter will be key to keeping pitchforks off your lawn.

lin-fred commented 2 years ago

@dustymc can you link me the table of agent types that have the possibility of being merged into verbatim agent? You said it was the collector table and not just collectors?

Sorry if you already posted this in the thread.

Jegelewicz commented 2 years ago

https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcollector_role

lin-fred commented 2 years ago

so what happens with the collector_role values when they merge into verbatim agent?
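
For reference, the proposal near the top of this thread carries collector_role into the attribute's method. A sketch of that mapping, assuming the CSV column names from that proposal; the output keys `method` and `remark` and the function name are hypothetical, and "preparator" below is just an example role:

```python
def collector_to_attribute(row: dict) -> dict:
    """Build a 'verbatim collector' attribute from one collector row.

    Follows the proposed mapping: value = preferred_agent_name,
    method = collector_role, remark = for_attr_remarks (empty if absent).
    """
    return {
        "attribute_type": "verbatim collector",
        "attribute_value": row["preferred_agent_name"],
        "method": row["collector_role"],
        "remark": row.get("for_attr_remarks", ""),
    }


attribute = collector_to_attribute(
    {"preferred_agent_name": "B. Richards", "collector_role": "preparator"}
)
```

Under that mapping the role is not lost, it just moves from a coded column to the attribute's method.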

stop low-information agents, do more with verbatim agents #4554