data fields for agent names

heathercole commented 3 years ago

GIVEN I have accessed DINA as data manager

WHEN I add agents such as collectors or determiners to the data module

THEN the system should support structured fields for firstname/lastname/initials (other?)

AND they should be queryable, support creating groups, etc.

dshorthouse commented 3 years ago

A firstname/lastname/initials structure is culturally insensitive: it might force a user to create a lastname when there isn't one.

This ticket also conflates the identity of a person with the string representation of a name. It is the same as the dialog on #65, and indeed the same issue will arise wherever an agent comes into play or is used. If the Agent module is the home where unique identity is assumed (i.e. where there are no duplicate entries for what might later be discovered to be the same person), then we still require a home for the verbatim entries: the "version of record", as it were, for the collector or determiner as documented on the specimen label or an annotation. This is because the display of an agent string (or a composite of agent strings) is only loosely tied to identity. Display of an agent string requires context. And a declaration of identity for the person behind the agent string is an assertion that may in fact require additional evidence. Picking a name from the Agent module while entering data for the item at hand may seem innocuous when the database is still quite sparse, but it actually requires cognitive effort and considerable knowledge. For example, I picked "Michelle Smith" the person from the Agent module because I know that the bare "Smith" on the label is that person, AND I want to ensure that only "Smith" is recorded in this context because that is what is written on the label. Where and how "Michelle Smith" is represented might be different for other data items.

So...

Yes, it's fine to add additional (yet optional) fields to the Agent module insofar as they help with search & disambiguation. But the agent string entered for a data item must not be drawn from the Agent module. That is, linking an agent to a data item is a secondary process. There must still be a verbatim field for the agent string wherever people's names are written, so that a user can later verify that the link between that data item and the Person in the Agent module is correct.

This might seem academic, so I'll give an example: "Mrs. Jack Smith" as written on a label. Unfashionable today but common not long ago. (1) That's the version of record & needs a home in its context. (2) We know this is not Jack Smith (and we'd be erasing her contribution if we linked it to an Agent entry for Jack Smith). (3) We later discover her name and then need a home for it. So where do we put "Mrs. Jack Smith" in the interim? It cannot go in the Agent module because that's for people, and we do not yet know who this person is.

In practice, this ticket needs to be broken up into several sub-tickets so we do not risk baking an undercooked (or overcooked) pie.

Agents need more (optional) fields like firstname, lastname, initials, birth date (w/ precision), death date (w/ precision), honorific, salutation, suffix, etc.
Each module where an agent string needs entry must have a verbatim field
Wherever that verbatim field for an agent string is present, it must also offer a secondary option: an unequivocal link to the Agent entry where the identity of that person is stored. Making such a link is the user's declaration that "this is the Person I mean", even when the agent string is wildly different from whatever is used to display the Agent entry.
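A minimal Python sketch of the verbatim-plus-optional-link arrangement described above, assuming hypothetical `Agent`, `AgentString` and `Determination` classes (none of these names come from DINA). The verbatim string is required and never rewritten; linking to an Agent entry is a separate, optional step, so "Mrs. Jack Smith" can be recorded before anyone knows who she is.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Agent:
    """A person whose identity is established (hypothetical Agent module record)."""
    agent_id: str
    display_name: str


@dataclass
class AgentString:
    """An agent string exactly as written on a label or annotation.

    `verbatim` is the version of record and is always required. `agent_id` is
    an optional, secondary link meaning "this is the Person I mean".
    """
    verbatim: str
    agent_id: Optional[str] = None

    def link(self, agent: Agent) -> None:
        # Linking records an assertion of identity; the verbatim text is untouched.
        self.agent_id = agent.agent_id

    def unlink(self) -> None:
        # An identity assertion can be withdrawn without losing the label text.
        self.agent_id = None


@dataclass
class Determination:
    """Hypothetical data item that carries a determiner."""
    taxon: str
    determiner: AgentString


if __name__ == "__main__":
    det = Determination("Carex sp.", AgentString("Mrs. Jack Smith"))
    print(det.determiner)  # identity still unknown, but the string has a home
    det.determiner.link(Agent("agent-42", "Donna Smith"))  # identity discovered later
    print(det.determiner)
```

Because the link is stored beside the string rather than replacing it, a reviewer can always compare what the label says with whom it was asserted to mean.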

dshorthouse commented 3 years ago

Possible fields for Agents that relate to the namestring itself:

given name(s): first & multiple "middle" names included here
family name(s): compound family names included here
title (eg Dr., Prof.)
appellation (eg Mr., Mrs., Ms.)
particle (eg de, di, van, von)
suffix (eg Jr., Sr., III)
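One way these optional name-part fields might be held, sketched in Python. The class name `AgentName` and the `full_name` display order (title or appellation, given names, particle, family names, suffix) are assumptions for illustration; display of a name depends on context, so a real system would likely need more than one rendering. Every field is optional, so a mononym does not force anyone to invent a family name.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AgentName:
    """Optional name-part fields for an Agent; none of them is mandatory."""
    given_names: List[str] = field(default_factory=list)   # first and any "middle" names
    family_names: List[str] = field(default_factory=list)  # compound family names kept as parts
    title: Optional[str] = None        # e.g. "Dr.", "Prof."
    appellation: Optional[str] = None  # e.g. "Mr.", "Mrs.", "Ms."
    particle: Optional[str] = None     # e.g. "de", "van", "von"
    suffix: Optional[str] = None       # e.g. "Jr.", "Sr.", "III"

    def full_name(self) -> str:
        """Join the parts that are present in one assumed given-then-family order.

        If both a title and an appellation are set, only the title is shown
        (another assumption).
        """
        family = " ".join(self.family_names) or None
        parts = [self.title or self.appellation, *self.given_names,
                 self.particle, family, self.suffix]
        return " ".join(p for p in parts if p)
```

For example, `AgentName(given_names=["Ludwig"], particle="van", family_names=["Beethoven"]).full_name()` gives "Ludwig van Beethoven", while `AgentName(given_names=["Sukarno"]).full_name()` gives just "Sukarno".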

Possible fields for Agents that relate to some demographics of the Agent to help with disambiguation:

birth date (with year, month, day precision for storage and presentation)
death date (with year, month, day precision for storage and presentation)
taxa of specialty (free-form list)
country
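For the birth and death dates, storing precision explicitly keeps a year-only date from being presented as 1 January of that year. A minimal Python sketch, assuming a hypothetical `PartialDate` type that infers precision from which components are present and renders reduced-precision ISO 8601 strings (YYYY, YYYY-MM, YYYY-MM-DD):

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Precision(Enum):
    YEAR = "year"
    MONTH = "month"
    DAY = "day"


@dataclass(frozen=True)
class PartialDate:
    """A date whose precision is implied by which components are set."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None

    def __post_init__(self) -> None:
        if self.day is not None and self.month is None:
            raise ValueError("a day requires a month")
        if self.month is not None and not 1 <= self.month <= 12:
            raise ValueError("month out of range")
        if self.day is not None:
            # datetime.date checks the day against the month, including leap years.
            date(self.year, self.month, self.day)

    @property
    def precision(self) -> Precision:
        if self.day is not None:
            return Precision.DAY
        if self.month is not None:
            return Precision.MONTH
        return Precision.YEAR

    def iso(self) -> str:
        """Render only as many ISO 8601 components as the precision supports."""
        if self.precision is Precision.DAY:
            return f"{self.year:04d}-{self.month:02d}-{self.day:02d}"
        if self.precision is Precision.MONTH:
            return f"{self.year:04d}-{self.month:02d}"
        return f"{self.year:04d}"
```

Storage and presentation then agree: `PartialDate(1890).iso()` is "1890", never "1890-01-01".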

Other possible (& strongly recommended) fields for Agents:

external identifiers for display and resolution
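ORCID iDs are one widely used external identifier for people, and they carry their own checksum: the final character is an ISO 7064 MOD 11-2 check digit, with 10 written as "X". The Python sketch below validates an iD and builds its resolvable `https://orcid.org/` URL for display; the function names are hypothetical, and other identifier schemes would need their own rules.

```python
import re

# Four hyphen-separated groups of four characters; only the last may be "X".
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character over the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if `orcid` is well formed and its check character matches."""
    if ORCID_PATTERN.fullmatch(orcid) is None:
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:-1]) == digits[-1]


def orcid_url(orcid: str) -> str:
    """Return the resolvable https form of a valid ORCID iD."""
    if not is_valid_orcid(orcid):
        raise ValueError(f"not a valid ORCID iD: {orcid!r}")
    return f"https://orcid.org/{orcid}"
```

With the sample iD used in ORCID's documentation, `orcid_url("0000-0002-1825-0097")` returns "https://orcid.org/0000-0002-1825-0097"; changing the last digit to 8 makes `is_valid_orcid` return False, which catches most transcription errors at entry time.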

And finally, any number of aliases ("aka" names), which could be stored as arrays keyed by language:

language
aliases

Possible example for storage (relevant to #97): { en: [ "Mrs. Jack Smith", "Donna Smith", "D. Smith" ]}
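A minimal sketch of that storage shape in Python, assuming aliases live in a plain dict keyed by language code as in the example above. The helper names and agent ids are hypothetical. Matching here is exact apart from case, so a search for "Jack Smith" does not return the agent whose alias is "Mrs. Jack Smith", which keeps the two people distinct.

```python
from typing import Dict, List

Aliases = Dict[str, List[str]]  # language code -> alias strings


def add_alias(aliases: Aliases, lang: str, alias: str) -> None:
    """Append an alias under a language key, skipping exact duplicates."""
    bucket = aliases.setdefault(lang, [])
    if alias not in bucket:
        bucket.append(alias)


def find_agents_by_alias(agents: Dict[str, Aliases], query: str) -> List[str]:
    """Return sorted ids of agents with an alias equal to `query`, ignoring case and language."""
    q = query.casefold()
    return sorted(
        agent_id
        for agent_id, aliases in agents.items()
        if any(a.casefold() == q for names in aliases.values() for a in names)
    )


if __name__ == "__main__":
    agents: Dict[str, Aliases] = {
        "agent-1": {"en": ["Mrs. Jack Smith", "Donna Smith", "D. Smith"]},
        "agent-2": {"en": ["Jack Smith"]},
    }
    print(find_agents_by_alias(agents, "d. smith"))
    print(find_agents_by_alias(agents, "Jack Smith"))
```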

jmacklin commented 3 years ago

I thought a link to the Harvard Index of Botanists might be valuable here as food for thought. If you search for Macklin, you will see that there are a few of us, but the search can find me by several different representations (variant names or aliases) of my name and by my date of birth, etc. I also have a unique identifier. There is also a concept of collector and author teams, which likewise have unique identifiers and variant forms. If you want to see groups/team craziness, follow the link below to my colleague David Boufford!

https://kiki.huh.harvard.edu/databases/botanist_search.php?start=1&name=Macklin&id=James+Macklin&remarks=&specialty=&country=&individual=on

David Boufford: https://kiki.huh.harvard.edu/databases/botanist_search.php?mode=details&id=33

dshorthouse commented 3 years ago

Thanks, @jmacklin. What I need convincing of is the requirement for storage and maintenance of teams/groups, with all the concomitant problems of then requiring that these assemblages have static titles, ordering, mechanisms to display them in context, and maintenance by staff. If these concepts require linking to more than one class of entity (eg collecting events & publications?), then they absolutely become necessary structures that can be referred to. But all I've seen to date suggests that they are merely conveniences for data entry or search in a collecting event, which to a degree means they are a solution looking for a problem. Is it more efficient to make & maintain groups of people independently of any data object they are attached to, so that each group can later be drawn in wherever required as a single link? OR is it the responsibility of a data object (i.e. a collecting event) to circumscribe the group, where each collecting event has links out to each individual person listed (where possible) and additionally records the ordering of these individuals (accommodating gaps when it's not yet possible to make all the links)?

This might all be mind-numbingly unnecessary, but it all circles back to what we put in Agents (= People) & where/how we record (temporarily unknowable) strings vs. things. Maybe it helps to think about the requirement for a collector number in collecting events. We know this number refers to the primary collector's sequence of specimens. There might have been a team of collectors, but only one of them gets this "number" because it's their sequence and is not shared with other members of the team. So... what if that primary collector does not yet have an entry in Agent (we have no idea who it is), but we DO have entries in Agent for other members of the collecting team? If we were making Groups first-class entities to be used anywhere in the system, does it matter that some (or all) of a group's members are unknown strings with no link to an Agent entry? Even the primary collector who gets the collector number? The concept of a "group" could then devolve into nothing more than a string of characters, or a title, because we do not yet know who any of its members are. Does that still make it a group?
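The second alternative raised above, where each collecting event circumscribes its own group rather than linking to a standalone Group entity, might look like this minimal Python sketch. The class names are hypothetical, and treating the first-listed collector as the primary collector who owns the collector number is an assumption. Each position holds the verbatim string, so an unknown primary collector leaves a gap in the links, not in the ordering.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CollectorEntry:
    """One collector as written; agent_id stays None until identity is established."""
    verbatim: str
    agent_id: Optional[str] = None


@dataclass
class CollectingEvent:
    """The event records its own collectors in order; there is no Group entity."""
    collectors: List[CollectorEntry] = field(default_factory=list)
    collector_number: Optional[str] = None  # assumed to belong to the first-listed collector

    @property
    def primary_collector(self) -> Optional[CollectorEntry]:
        return self.collectors[0] if self.collectors else None

    def unlinked_positions(self) -> List[int]:
        """1-based positions whose collector has no Agent link yet."""
        return [i for i, c in enumerate(self.collectors, start=1) if c.agent_id is None]

    def display(self) -> str:
        """Render the verbatim strings in recorded order, then the collector number."""
        names = " & ".join(c.verbatim for c in self.collectors)
        return f"{names} {self.collector_number}" if self.collector_number else names
```

Here the "group" exists only as the ordered list on each event, so an event whose primary collector is still an unlinked string needs no special handling.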

cgendreau commented 3 years ago

With the addition of ORCID, the requirement is met.

AAFC-BICoE / dina-planning #98