ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum
Apache License 2.0

new agent type / reciprocal relationships / APIs #4722

Closed dustymc closed 2 years ago

dustymc commented 2 years ago

https://github.com/ArctosDB/arctos/issues/4605 (displaying identifier metadata) will be heading to production in a bit, and once that's out I'll be comfortable magicking relationships.

https://github.com/ArctosDB/arctos/issues/4604#issuecomment-1124230155 (accepting data from an outside collaborator) has led me to the idea that accepting magical assertions should be entirely up to the collections. Here's a rough idea of how we can do that.

  1. New Agent Type. I'll propose "bot" but I'm very open to better terminology. Definition: Agent who writes a specific kind of data to relevant bulkloaders. Should be tightly scoped to specific tasks.
  2. Revised rules around the new agent type. They must be an operator, which means relaxing the rules around agent name type 'login.' They won't be a person (easier to track them), they don't need an Arctos account, should only be created by certain users for specific purposes, etc. The details can be worked out later if there are no giant holes in the general idea.
  3. Tools - the reciprocal relationship task can be repurposed to write data to the identifier bulkloader (as a 'bot' agent).
  4. Documentation - collections need to know these things exist and how to use them. (Short version: grant collection access to whatever bots you want writing data to your collection; don't grant access, or revoke it, if you don't want the offered service.)
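
A minimal sketch of the opt-in idea in the list above, not Arctos code. Every name here (`BotAgent`, `Collection`, `offer_rows`, the `entered_by` key) is a hypothetical stand-in, and a plain Python list stands in for a bulkloader table. It only illustrates that a bot's rows reach a collection's loader when, and only when, that collection has granted the bot access.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BotAgent:
    """Hypothetical 'bot' agent: not a person, scoped to one task."""
    name: str
    task: str        # e.g. "reciprocal relationships"
    created_by: str  # kept so collections can check who made the bot


@dataclass
class Collection:
    """Hypothetical collection that decides which bots may write to it."""
    guid_prefix: str
    granted_bots: set = field(default_factory=set)

    def grant(self, bot: BotAgent) -> None:
        self.granted_bots.add(bot.name)

    def revoke(self, bot: BotAgent) -> None:
        self.granted_bots.discard(bot.name)


def offer_rows(bot: BotAgent, collection: Collection, rows: list, bulkloader: list) -> int:
    """Append rows to the bulkloader only if the collection granted the bot access.

    Returns the number of rows written (0 when access was never granted
    or has been revoked).
    """
    if bot.name not in collection.granted_bots:
        return 0
    for row in rows:
        # Tag each row with the bot's name so its origin stays traceable.
        bulkloader.append({**row, "entered_by": bot.name})
    return len(rows)
```

Under these assumptions, a collection that never calls `grant` sees nothing from the bot, and calling `revoke` stops further writes without touching rows already offered.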

I don't have any immediate use beyond the relationship magicker in mind, but it seems like we'll inevitably find something so I'd like to work out a framework that can support any kind of incoming/derived/whatever data.

Better ideas, identifying potential problems, etc., etc. all greatly appreciated.

HELP!

Jegelewicz commented 2 years ago

I'll propose "bot" but I'm very open to better terminology. Definition: Agent who writes a specific kind of data to relevant bulkloaders. Should be tightly scoped to specific tasks.

I think that "bot" has some negative connotations, but "machine" doesn't really cover what's going on. Found these definitions of bot -

Google Results - an autonomous program on the internet or another network that can interact with systems or users. "you can program your bot to store data in the database of your choice"

Merriam Webster - a computer program that performs automatic repetitive tasks : agent sense 5

So I think "bot" is a good choice, but maybe the definition should be

Agent that represents a computer program that performs automatic repetitive tasks and writes a specific kind of data to relevant bulkloaders. Should be tightly scoped to specific tasks.

Jegelewicz commented 2 years ago

should only be created by certain users for specific purposes, etc.

I don't think this will become an issue, but it could - lots of people can add agents. It would only take one disgruntled person to add a "bot" agent that is malicious, but I think the additional step of loading things the bot writes to any component loader mitigates that a bit? At any rate, there probably won't be many of these to start with, so it should be easy to get a list of all the bot agents and make sure we know what they are doing.

Jegelewicz commented 2 years ago

Setting this up and getting Jorrit to use it seems like a really good test case.

dustymc commented 2 years ago

lots of people can add agents

See (2) above - I'd like to limit the creation of these, somehow....

one disgruntled person

... but just to make it easier to keep track of them. The agent itself can't DO anything, and if someone somehow managed to get a script loaded and the bot wired to it, it still couldn't do anything unless someone granted it access to their collection. (So for the sake of paranoia, "check who created the bot you're thinking about letting go in your collection" could be added to the instructions.)

dustymc commented 2 years ago

documentation

Jegelewicz commented 2 years ago

Data will be inserted with status=autoload, and processing will begin immediately.

Wouldn't it be better to let an Arctos operator set it to autoload?

dustymc commented 2 years ago

That would save a grand total of two clicks. Anyone who wants to make those two clicks (and then load whatever they want) can just not give the bot access to their collection(s).
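
A rough sketch of the trade-off being discussed, under stated assumptions: the `autoload` status value comes from the documentation quoted above, while `bot_insert`, `process`, and the list standing in for the bulkloader are hypothetical and not Arctos code. Bot rows arrive already marked `autoload` and are picked up on the next processing pass. A collection that would rather load the data itself simply never grants access, so no bot row is written for it.

```python
AUTOLOAD = "autoload"  # status value quoted from the documentation


def bot_insert(bulkloader: list, row: dict, bot_has_access: bool) -> bool:
    """Insert a bot row with status=autoload, but only when access was granted."""
    if not bot_has_access:
        return False  # nothing is written; operators can load their own data instead
    bulkloader.append({**row, "status": AUTOLOAD})
    return True


def process(bulkloader: list) -> list:
    """Load every row marked autoload and leave all other rows in place."""
    loaded = [r for r in bulkloader if r.get("status") == AUTOLOAD]
    # Keep only rows that still need a human decision.
    bulkloader[:] = [r for r in bulkloader if r.get("status") != AUTOLOAD]
    return loaded
```

In this sketch a row entered by a person with some other status is left untouched by `process`, which mirrors the "two clicks" an operator would otherwise make to set it to `autoload`.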

IDK, I keep hearing that it's impossible to deal with this stuff, and the only thing I can get to that's simpler than what we have is full (minus turning it on, I suppose) automation.

FYI https://arctos-test.tacc.utexas.edu/guid/KNWR:Ento:7171 is the current frontrunner (I'm manually running the loader because test is in a half-baked state) with 46 bot-created relationships.

Jegelewicz commented 2 years ago

Do we really need all those "collected with" relationships? Shouldn't they all share a collecting event? It doesn't seem that easy to see that fact in the events when I am not logged in....

dustymc commented 2 years ago

Do we really need all those "collected with" relationships?

Whoever added them apparently thought so (but not enough to add all the reciprocals...).

share a collecting event?

Nope. THESE things are whatever "collected with" means despite all having unique events (perhaps cubic-centimeter-at-second precision), THOSE things are not whatever "collected with" means even though they share an event (perhaps of "somewhere in the state last week" precision), etc. Trying to derive relationships from events just doesn't survive reality, even if some TDWG folks wish it did.

see ... events

It is if the events are named, and if they're not then they're ephemeral. That aside, this

(screenshot: Screen Shot 2022-06-02 at 1 11 33 PM)

will take you to Events.

Jegelewicz commented 2 years ago

That aside, this

(screenshot: Screen Shot 2022-06-02 at 1 11 33 PM)

will take you to Events.

Hmmmm - I'm thinking this isn't very intuitive for Arctos outsiders? Not sure how to make it that way except with text like "see everything collected at this location and time" which seems like too much.

Jegelewicz commented 2 years ago

I think a cool bot would be Bionomia. Could pick up ORCID and Wikidata identifiers for our agents....