FamilySearch / gedcomx

An open data model and an open serialization format for exchanging genealogical data.
http://www.gedcomx.org
Apache License 2.0

Proposal: Group #321

Closed stoicflame closed 5 years ago

stoicflame commented 5 years ago

The attached changes constitute a proposal for the introduction of a new data type, Group.

Comments are welcome.

Group Data Type

The Group data type is defined as a group of persons. The concept of a "group" captures institutional associations between persons that may or may not have direct familial relations between each other. Examples of a group could include plantations, orphanages, or military units.

properties

| name | description | data type | constraints |
|------|-------------|-----------|-------------|
| names | A list of names of the group. | List of http://gedcomx.org/v1/TextValue. Order is preserved. | REQUIRED. The list MUST contain at least one name. |
| date | The date of applicability of the group. | http://gedcomx.org/v1/Date | OPTIONAL. |
| place | A reference to the place applicable to this group. | http://gedcomx.org/v1/PlaceReference | OPTIONAL. |
| roles | Information about how persons were associated with the group. | List of http://gedcomx.org/v1/GroupRole. Order is preserved. | OPTIONAL. |

GroupRole Data Type

In order to support the association of a person to a group, a new data type GroupRole must be defined.

Persons can be associated with multiple groups. Groups can have multiple persons associated.

properties

| name | description | data type | constraints |
|------|-------------|-----------|-------------|
| person | Reference to the group participant. | URI | REQUIRED. MUST resolve to an instance of http://gedcomx.org/v1/Person. |
| type | Enumerated value identifying the participant's role. | Enumerated Value | OPTIONAL. If provided, MUST identify a role type. |
| details | Details about the role of the participant in the group. | string | OPTIONAL. |
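
For readers who think in code, the two property tables above can be sketched as Python dataclasses. This is only an illustration and not part of the proposal: the class layout, the use of plain strings in place of the gedcomx TextValue, Date and PlaceReference types, and the sample orphanage name and `#p1` reference are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GroupRole:
    person: str                    # REQUIRED: URI that resolves to a Person
    type: Optional[str] = None     # OPTIONAL: enumerated value naming a role type
    details: Optional[str] = None  # OPTIONAL: free-text details about the role

    def __post_init__(self) -> None:
        if not self.person:
            raise ValueError("a GroupRole MUST reference a person")


@dataclass
class Group:
    names: List[str]               # REQUIRED: at least one name, order preserved
    date: Optional[str] = None     # OPTIONAL: plain string standing in for Date
    place: Optional[str] = None    # OPTIONAL: plain string standing in for PlaceReference
    roles: List[GroupRole] = field(default_factory=list)  # OPTIONAL, order preserved

    def __post_init__(self) -> None:
        if not self.names:
            raise ValueError("a Group MUST have at least one name")


# Hypothetical sample data: one group with a single participant.
orphanage = Group(
    names=["St. Mary's Orphanage"],
    roles=[GroupRole(person="#p1", details="Resident")],
)
```

The only constraints enforced here are the two REQUIRED ones from the tables; everything marked OPTIONAL defaults to empty.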
jralls commented 5 years ago

ISTR that we'd proposed a group type 7 years ago and that it had been shot down by TPTB, but I can't find the issue now. IMO this is long overdue.

nractive commented 5 years ago

I am way behind in the conversations about Group, Family, and any other types that might be applicable to my genealogical domain (enslaved populations and the relationships formed through racial slavery). I am trying to catch up (and learn my way around GitHub in the process).

I was particularly held by the discussion from three years ago, Proposal: Family #281, where cogent arguments were made for the Group type. I am currently communicating with others in the genealogy community with similar interests so that I can offer a collected opinion for the project’s consideration here. Our input, not just mine, is important for the conversation. I’ll be back soon.

-Tim

brylie commented 5 years ago

It seems like the concept Relationship could be applied to group membership. In effect, people have relationships to individuals as well as groups. These relationships are basically graph edges that connect Person and Group entities, which can have attached metadata (such as the role).

stoicflame commented 5 years ago

Agreed, @brylie.

But I'm not sure what your comment means in practice. Are you suggesting using the `Relationship` data type instead of the proposed `GroupRole` data type? I'd be opposed to that idea because it unnecessarily confuses the notion of a simple relationship between two persons in the model. But if the notion of a `GroupRole` as proposed here isn't complete enough, I'd love to hear additional use cases that illustrate what else is needed. Is there a case, for example, for needing facts on a `GroupRole`?

jralls commented 5 years ago

Group role needs at least begin and end dates. Since over time a person might have several roles in a group you can either have a single arc connecting the person and the group with the roles and perhaps other facts or events hanging off of it or multiple arcs with the various roles and events.

Otherwise you could rename role to membership, drop the type field, and put all of the interesting bits as a text narrative in details. That would make some manipulations difficult, e.g. creating a timeline of a club's presidents.
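
To make the trade-off concrete, here is a minimal sketch of the "multiple arcs" option, where each role a person held is its own record with a type and begin/end dates. The `DatedRole` class, the integer years, the `timeline` helper and the sample club data are all assumptions for illustration; nothing here is defined by the proposal.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatedRole:
    person: str                  # reference to the participant
    type: Optional[str] = None   # role type, e.g. "President" (hypothetical value)
    start: Optional[int] = None  # begin year (simplified stand-in for a real date)
    end: Optional[int] = None    # end year


def timeline(roles: List[DatedRole], role_type: str) -> List[DatedRole]:
    """Return the roles of one type, ordered by begin year."""
    matching = [r for r in roles if r.type == role_type and r.start is not None]
    return sorted(matching, key=lambda r: r.start)


# Hypothetical club: one person holds two roles over time (two arcs).
club_roles = [
    DatedRole("#bob", "President", 1912, 1915),
    DatedRole("#alice", "President", 1908, 1912),
    DatedRole("#alice", "Treasurer", 1905, 1908),
    DatedRole("#carol"),  # untyped, undated membership: cannot be placed on a timeline
]

presidents = [(r.person, r.start, r.end) for r in timeline(club_roles, "President")]
# presidents == [("#alice", 1908, 1912), ("#bob", 1912, 1915)]
```

With only a narrative details string, as in the `#carol` record, the same query would require parsing free text.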

brylie commented 5 years ago

> Are you suggesting using the `Relationship` data type instead of the proposed `GroupRole` data type?

Well, my background involves working with the constituent relationship management software CiviCRM. In CiviCRM, and other CRMs, the concept of a relationship is fairly generic, and can be between

For some example,

Basically, the notion that a relationship can only exist between two people and be of the types "couple" and "parent/child" is quite a limited subset of the types of relationships we observe in reality.

stoicflame commented 5 years ago

@jralls I agree with your proposal that a date is needed on the role. Proposed change pushed at d3f53e6.

@brylie I don't object to modeling those "relationships" you identify; I only object to redefining the Relationship data type to do it. The idea of relationships between groups is the only concept you identify in your comment that seems to be missing from the current proposal. How do you propose relating Groups to each other? What are the use cases that capture that need?

nractive commented 5 years ago

My frame of reference, as I previously indicated, is with enslaved populations, so I respond with this example. Here is a real use case:

• Juliet, b. 1834, d. 1867, an enslaved woman, never married
• Simon, b. 1853, d. 1924, a biological child of Juliet
• Malinda, b. 1798, d. 1874, the legal slaveholder of Juliet and Simon from their births to Emancipation in 1865
• Frank, b. 1820, d. 1898, son of Malinda, overseer, biological father of Simon (DNA evidence), non-consensual relationship with Juliet (rape)

Malinda also enslaved others: Henry and Sarah, not known to be related to each other or Juliet.

So, which data structure best supports this set of facts and documents the profound, intimate relationships expressed therein? A “family” existed in this small enslaved population and another as part of the farmer’s “household.” Tracing family back in time might depend on first encoding all these relationships.

I can see the Group data type naming “Enslaved Population on Malinda Farm.” The GroupRole data type could list each member of this group, but the type property must have role types appropriate to this group. The current Known Relationship Types for the Relationship Data Type obviously don’t work. I do not understand the “type” property of GroupRole: why it is an enumerated value, and what it would express.

@brylie, to me your suggestion for a Relationship data type instead of GroupRole makes sense, both for a clearer understanding of terms and for greater flexibility. In the current model the interpretation of data facts for relationships is rigidly tied to couples and parent-child relationship fact types, which do not work here.

Could those existing fact types be modified as 2.2 Couple Relationship Fact Types, 2.3 Parent-Child Relationship Fact Types, and a new 2.4 Group Relationship Fact Types? Or perhaps, as @stoicflame says to not “confuse the notion of a simple relationship,” create new fact type sections: 2.1B Group Fact Types and 2.4 Group Relationship Fact Types? [Renumber sections as appropriate.]

stoicflame commented 5 years ago

Awesome. Thanks for the use case.

I'll suggest here how to model that using this proposal in its current state.

Note that there is another proposal (not formalized yet, pending conclusion of this discussion here) that will include some of the needed vocabulary elements and fact types for slavery cases. I think it's important to keep the two issues (modeling groups and modeling enslavement) distinct.

Note also that I'm using a "pseudo" data format for readability that reflects what the real XML or JSON would look like.

Person:
  name: Juliet
  gender: Female
  facts:
    birth:
      date: 1834
    death:
      date: 1867
    emancipation: #fact type to be defined in separate proposal
      date: 1865

Person:
  name: Simon
  gender: Male
  facts:
    birth:
      date: 1853
    death:
      date: 1924
    emancipation: #fact type to be defined in separate proposal
      date: 1865

Person:
  name: Malinda
  facts:
    birth:
      date: 1798
    death:
      date: 1874
    occupation: "Slaveholder"

Person:
  name: Frank
  gender: Male
  facts:
    birth:
      date: 1820
    death:
      date: 1898
    occupation: "Overseer"

Person:
  name: Henry

Person:
  name: Sarah

Relationship:
  type: ParentChild
  person1: Juliet
  person2: Simon
  facts:
    Biological

Relationship:
  type: ParentChild
  person1: Frank
  person2: Simon
  facts:
    Biological
      sources: (DNA evidence, etc.)

Relationship:
  type: Couple
  person1: Frank
  person2: Juliet
  facts:
    NonConsensual or Rape: #fact type to be defined in a separate proposal

Group:
  name: "Enslaved Population on Malinda Farm"
  date: 1830 - 1865
  role:
    person: Malinda
    details: "Slaveholder"
  role:
    person: Frank
    details: "Overseer"
  role:
    person: Juliet
    details: "Enslaved"
  role:
    person: Simon
    details: "Enslaved"
  role:
    person: Henry
    details: "Enslaved"
  role:
    person: Sarah
    details: "Enslaved"

So certainly there are fact types (noted in the data) that still need to be defined in a separate proposal.

What else is missing?

nractive commented 5 years ago

Thanks @stoicflame. Your pseudocode really helps.

Thinking about census data, “slaveholder” is not an occupation. Rather, it is a descriptive identification for an owner of slaves (“owner” now being an objectionable label). Some other social or legal descriptive identifications that are not occupations (and that, of course, have different meanings) include “freeholder,” “guardian,” “trustee,” and “enslaved,” among others. Where might any of these descriptors fit in the current list of 2.1 Person Fact Types? I think these need a distinctive fact type, maybe “GroupRoleDescription” or something better.

I see the role property of the Group object as tying the person to the group. What of the reverse: what in the Person object ties him/her to the specific Group? This is important to someone looking at their enslaved ancestor for determining to what group(s) they belonged and thus determining “sibling” members of that group, who might be kin. Would there be an ID property to link persons to a Group or groups? (Perhaps all this is in the implementation of an application and not in the specification. Yet the structure of the specification will surely influence the application and thus should be very clear.)

Similarly, what describes non-couple, non-parent/child (non-biological) relationships? Something like this might be needed:

Person:
   name: Henry
   ...
   GroupRoleDescription: enslaved

Relationship:
   type: Enslavement
   person1: Malinda
   person2: Henry
   facts:
      legal enslavement

[BTW, I am communicating with and through other genealogists, some of whom are bewildered by discussions of objects and data types, and trying to channel their important concerns to this conversation.]

stoicflame commented 5 years ago

> Thinking about census data, “slaveholder” is not an occupation. ...I think these need a distinctive fact type, maybe “GroupRoleDescription” or something better.

If "Slaveholder" isn't considered an occupation, then let's just leave it off the Person as a Fact. We'll just let the "Slaveholder" notion get carried in the details of the GroupRole. Does that work?

> What of the reverse, what in the Person object ties him/her to the specific Group? ... Perhaps all this is in the implementation of an application and not in the specification.

Yes, being able to show all the Groups of a Person is an implementation-specific notion. Since the data format defines a way to link Persons to Groups, it doesn't also need a way to link Groups to Persons. The application that imports the data will ensure that the links go both ways.

I agree that the feature is important, and I would expect the application would want to implement that feature.
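
The both-ways linking described above could look something like this in an importing application. It is a minimal sketch, not anything GEDCOM X defines: the plain-dict layout, the use of names as identifiers, and the helper names `index_groups_by_person` and `co_members` are assumptions made for illustration. The member list comes from the example group earlier in this thread.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Groups as imported: each group lists the persons named in its roles.
groups: Dict[str, List[str]] = {
    "Enslaved Population on Malinda Farm": [
        "Malinda", "Frank", "Juliet", "Simon", "Henry", "Sarah",
    ],
}


def index_groups_by_person(groups: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Build the reverse lookup: person -> names of the groups they appear in."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for group_name, members in groups.items():
        for person in members:
            index[person].add(group_name)
    return index


def co_members(person: str, groups: Dict[str, List[str]],
               index: Dict[str, Set[str]]) -> Set[str]:
    """Everyone who shares at least one group with the given person."""
    found: Set[str] = set()
    for group_name in index.get(person, ()):
        found.update(groups[group_name])
    found.discard(person)
    return found


index = index_groups_by_person(groups)
henry_co_members = co_members("Henry", groups, index)
```

Because the example group lists the slaveholder and overseer as participants too, a real application would probably also filter on the role details before presenting possible kin.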

> what describes non-couple, non-parent/child (non-biological) relationships?

I'd suggest just a generic (untyped) relationship. The notion of enslavement would be carried in the facts of the relationship:

Relationship: 
  person1: Malinda
  person2: Henry
  facts:
    enslavement: #fact type to be defined in a separate proposal

> BTW, I am communicating with and through other genealogists, some of whom are bewildered by discussions of objects and data types, and trying to channel their important concerns to this conversation.

Great!

nractive commented 5 years ago

> I'd suggest just a generic (untyped) relationship. The notion of enslavement would be carried in the facts of the relationship:

Relationship: 
  person1: Malinda
  person2: Henry
  facts:
    enslavement: #fact type to be defined in a separate proposal

I guess omitting the (optional) type property would be okay here, though I think some new fact type URIs would be needed; e.g., http://gedcomx.org/Enslaved. (See suggestions on the GEDCOM X forum.)

All in all, this looks promising. I do hope these suggestions are incorporated in the specification and then app developers will accommodate a broader view of relationships as discussed here.

As I explain this and seek input from other genealogists, I'll relay their comments.

Thanks!

brylie commented 5 years ago

In the nested example above, it is ambiguous what each person's role is in the relationship. Expressing this as a directed graph, with each person as a node and the relationship as a directed edge would make this clear. Nodes and edges can contain metadata.
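
A minimal sketch of that idea, assuming a plain edge list rather than any GEDCOM X data type. The `Edge` class, the edge labels, and the `outgoing`/`incoming` helpers are hypothetical; the persons are taken from the use case earlier in this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Edge:
    source: str   # node the edge starts from
    target: str   # node the edge points to
    label: str    # what the source is to the target (hypothetical labels)
    metadata: Dict[str, str] = field(default_factory=dict)  # optional extra data


# The direction of each edge states who holds which role.
edges: List[Edge] = [
    Edge("Malinda", "Henry", "enslaver of"),
    Edge("Juliet", "Simon", "parent of"),
    Edge("Frank", "Simon", "parent of", {"evidence": "DNA"}),
]


def outgoing(node: str, edges: List[Edge]) -> List[Edge]:
    """Edges that start at the given node."""
    return [e for e in edges if e.source == node]


def incoming(node: str, edges: List[Edge]) -> List[Edge]:
    """Edges that point at the given node."""
    return [e for e in edges if e.target == node]


parents_of_simon = [e.source for e in incoming("Simon", edges) if e.label == "parent of"]
# parents_of_simon == ["Juliet", "Frank"]
```

Swapping source and target changes the meaning of an edge, which is exactly the information that an untyped person1/person2 pair does not carry.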

nractive commented 5 years ago

Thanks @brylie. I had to do a quick lookup of directed graph, mathematics that is beyond my limited knowledge, but I think I understand the concept and how it applies here.

My aim—and I believe I speak for many genealogists—is to have the ability through the GEDCOM X specification to facilitate data categorization and organization for these non-traditional, family-like relationships, ultimately to enable application developers to build tools to capture data and represent the relationships in some visual manner.

Are you saying that Group and GroupRole and their properties are insufficient for expressing the relationship (directed edge)? How might you modify Group and GroupRole (or the pseudocode for the use case above) for a directed graph implementation? Are new fact types needed, as I suggested, and would these be helpful?

I’d be very interested in your thoughts.

stoicflame commented 5 years ago

I think @brylie is just talking about the generic (untyped) relationship example I gave. He's pointing out that if you remove the type of the relationship, there are no implicit roles, nor is there any "direction" to the relationship. So that's an argument for defining a specific relationship type for EnslavedBy or something.

That's fine, but it's not applicable to the scope of this particular proposal. We can hash out all those details when we conclude the other discussion and put together a separate proposal to meet those needs.

Are there any adjustments or enhancements needed for this specific proposal for the definition of Group and GroupRole?

brylie commented 5 years ago

> My aim—and I believe I speak for many genealogists—is to have the ability through the GEDCOM X specification to facilitate data categorization and organization for these non-traditional, family-like relationships, ultimately to enable application developers to build tools to capture data and represent the relationships in some visual manner.

@nractive I really appreciate your goals, as they will allow many types of relationships to be modeled. I highlighted the last sentence of your statement, because I think a graph model can easily lend itself to rich visualization and even point-and-click data manipulation:

[Image: graph model showing family relationships]

[Image: graph model for genealogy]

> That's fine, but it's not applicable to the scope of this particular proposal.

@stoicflame thanks for keeping this proposal focused.

> Are there any adjustments or enhancements needed for this specific proposal for the definition of Group and GroupRole?

I still maintain that we can try to re-use and extend existing concepts, like Relationship, to keep the model simple and (internally) consistent -- otherwise we might end up with quite a baroque standard.

nractive commented 5 years ago

These graph models are great at conceptualizing relationships. (Examples, I know, so I'll ignore Fido, rock music, and BMW as petty compared to the abject relationships created by slavery.) My thinking is that these models may provide the mathematical structure for the more familiar graphic visualization of a family tree in some application software. So, useful, indeed.

I guess I have reached the limit of my understanding of where Group and GroupRole will be sufficient or, on the other hand, unworkable. We who do African-American genealogy certainly need a way to express slaveholder/enslaved relationships in a clear and precise manner. To that end, I favor a bidirectional Relationship fact, if that is workable. But you perhaps see more clearly than I any disadvantages.

Great ideas have surfaced in these discussions. You architects and engineers have the knowledge and experience advantage here. I do trust that the best solution, the right solution, will be found. Keep it going.

stoicflame commented 5 years ago

It has been two weeks since the last comment. Unless there are any additional suggested edits, I plan to merge this within the next 24 hours.