FamilySearch / GEDCOM


Extend predefined Event, Attribute and Role types based on GedcomX and other suggestions #117

Open cdhorn opened 2 years ago

cdhorn commented 2 years ago

I notice that no attempt was made to even try to align the predefined Event, Attribute and Role types with those in the GedcomX specification. I think it would be beneficial to have a much richer set of predefined values to cover as many use cases as possible to help avoid the use of custom types unless absolutely necessary.

There doesn't seem to be a clear mapping between the two especially as GedcomX includes some items under both categories, but perhaps something like the following would make sense:

Suggested Individual Events taken from GedcomX:

Other suggested commonly used Individual Events:

Suggested Family Events taken from GedcomX:

Suggested Attributes taken from GedcomX:

Other suggested Attributes:

Some additional suggested Roles:

It would probably make sense to survey the more popular third party genealogy applications and websites to see what other non-Gedcom defined events and attributes they have added to fold in others that may be commonly used.

It also isn't clear to me if the GedcomX documents are being kept current. For example, I notice that on the FamilySearch website Affiliation, Religious Affiliation, Title of Nobility, and Occupation are events that can be added, but they do not appear in the GedcomX event types specification. Furthermore, Title of Nobility is an attribute in Gedcom but I do not see it under the GedcomX fact types either.

cdhorn commented 2 years ago

The thought just popped into my head that it would probably make sense to also add attributes for DNA related information and add a role type for shared DNA match as well.

tychonievich commented 2 years ago

Thanks for this list! A few that I think we already have:

I think we have these too, but am not sure I understand the nuance enough to be confident:

Given the current definition of MARR, the DomesticPartnership, CivilUnion, and CommonLawMarriage events would be subtypes of MARR; we could enumerate MARR types or create more "implies" type of relationships like we have for ROLE FATH implying ROLE PARENT
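To illustrate the "implies" idea, here is a minimal Python sketch, not anything from the specification. The `IMPLIES` table and both function names are hypothetical. Only ROLE FATH implying ROLE PARENT comes from the comment above; listing the three union types as implying MARR is the proposal under discussion.

```python
# Hypothetical "implies" table: each key implies the listed broader values.
# FATH -> PARENT is the existing relationship mentioned in the thread; the
# MARR entries are the proposal being discussed, not part of GEDCOM 7.0.
IMPLIES = {
    "FATH": ["PARENT"],
    "DomesticPartnership": ["MARR"],
    "CivilUnion": ["MARR"],
    "CommonLawMarriage": ["MARR"],
}


def implied_types(value):
    """Return the value itself plus everything it transitively implies."""
    seen = []
    stack = [value]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        stack.extend(IMPLIES.get(current, []))
    return seen


def matches(value, wanted):
    """True if a record typed `value` should be found when searching for `wanted`."""
    return wanted in implied_types(value)


print(matches("CivilUnion", "MARR"))
print(matches("PARENT", "FATH"))
```

With a table like this, an application that only knows MARR could still treat a CivilUnion record as a marriage, while the implication does not run in the other direction.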

DNA was discussed at length when designing 7.0, and the conclusion of our team at the time was that there was not enough consensus on what DNA evidence looked like to be able to propose adding structures to represent it. I'd be excited if that's changed; your proposal of a ROLE for "DNA match" might be one we could add without needing too much consensus on what constitutes a match, for example.

cdhorn commented 2 years ago

I mentioned Affiliation as an event on FamilySearch though it seems to me it should be an attribute. It should be added to Gedcom as one or the other. I think you also need Membership, which should be an attribute. This differs from Affiliation; for example, someone may be affiliated with a political party without being an actual member.

I think you also need Induction to cover the event when someone is accepted into an organization as a member. It is an event as others are present and perhaps multiple family members joined and were inducted at the same time.

Another suggested event would be NameChange which here refers to the actual legal event formalized in a court of law.

Another suggested attribute would be Hobby.

Another suggested role that comes to mind is Sponsor. A Godparent is a specific type of Sponsor, but the more general term is applicable to many situations such as a Confirmation Sponsor in the Catholic faith or a Sponsor who recommends someone for membership in a fraternal or other organization. And another suggested role is Mentor.

A few that I think we already have

Looking again quickly I mostly agree. I wish these were used consistently; why is Occupation an event in the FamilySearch FamilyTree when it is a fact / attribute elsewhere?

I think we have these too, but am not sure I understand the nuance enough to be confident

In my mind EMIG and IMMI imply a move or migration between countries. However one may want to record a move within a country, perhaps even as granular as from one side of a city to the opposite side, so MoveFrom and MoveTo would be more appropriate. In a similar vein a family might travel for a Vacation and you find some related passenger list records, so Arrival and Departure would be more appropriate.

Regarding the use of Living as an attribute this is the GedcomX definition:

A fact of a record of a person's living for a specific period. This is designed to include "flourish", defined to mean the time period in an adult's life where he was most productive, perhaps as a writer or member of the state assembly. It does not reflect the person's birth and death dates.

I am not sure anyone actually uses it. That is certainly not the same as RESN PRIVACY though.

GedcomX differentiates between MarriageBanns (aka MARB) and MarriageNotice and lumps both under Couple Relationship Fact Types. GedcomX also enumerates both BirthNotice and Obituary as facts. I think if you are going to do that then DeathNotice and possibly even FuneralNotice ought to also be added as both are generally very brief and not as extensive as an Obituary. While all of those and other newspaper articles are indeed sources I think the reasoning for inclusion of BirthNotice and Obituary in GedcomX is that they were published which in and of itself is a fact. If the argument is they should be treated only as pure sources they shouldn't have been added to GedcomX. Again the treatment of these things between the two standards should be as consistent as possible.

Regarding ChildNumber I think you need to consider a case such as one where all you know is the person was the 10th and last child of their family to die and you know nothing else about their parents or siblings yet.

I understand about the DNA evidence question and agree, but there should probably be PaternalHaplogroup and MaternalHaplogroup individual attributes.

atom888888 commented 2 years ago

I propose adding mtDNA and yDNA data support for GEDCOM 7.x (since they are MUCH smaller datasets). Programs like AncestralQuest 6.x already support this, so having this in the GEDCOM format would be a natural extension and support for what some vendors are already doing and allow for greater interoperability between programs and websites.

https://ancquest.com/index.htm

See here: https://github.com/FamilySearch/GEDCOM/discussions/118

tychonievich commented 2 years ago

Discussed in steering committee, 2022-04-12. At a high level, we want

We mentioned several approaches that might reach these goals, but did not converge on any one in particular.

cdhorn commented 2 years ago

So reference some public, centralized master event and attribute registry which would ideally be managed as a Github repository so anyone can contribute to it as needed.

Registry data could be stored as structured yaml perhaps. Maybe 1 file per event or attribute type to prevent any single document from getting too large.

In addition to allowing for hints you would certainly want to allow for the inclusion of translations as well for the short name/label and the longer description.
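As a rough sketch of what one registry entry and a translation lookup might look like, here is a small Python example. It shows the entry as a plain dictionary, as it could appear after loading a per-type YAML file. Every field name, the `localized` helper, and the German label are illustrative assumptions; no such registry exists yet.

```python
# One registry entry as it might look after loading a per-type YAML file.
# All field names are hypothetical, and the German label is sample data.
INDUCTION = {
    "id": "Induction",
    "kind": "event",
    "label": {"en": "Induction", "de": "Aufnahme"},
    "description": {
        "en": "Acceptance of a person into an organization as a member.",
    },
}


def localized(entry, field, lang, fallback="en"):
    """Pick the text for `lang`, falling back to `fallback`, then to the id."""
    texts = entry.get(field, {})
    return texts.get(lang) or texts.get(fallback) or entry["id"]


print(localized(INDUCTION, "label", "de"))
print(localized(INDUCTION, "description", "de"))
```

Falling back to a default language, and finally to the identifier, means an application can always display something even when a translation has not been contributed yet.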

cdhorn commented 2 years ago

And perhaps, if FamilySearch or someone else is willing, host a free online read-only REST API service to make the data available for applications that might prefer to consume it that way.

Norwegian-Sardines commented 1 year ago

In my mind EMIG and IMMI imply a move or migration between countries. However one may want to record a move within a country, perhaps even as granular as from one side of a city to the opposite side, so MoveFrom and MoveTo would be more appropriate. In a similar vein a family might travel for a Vacation and you find some related passenger list records, so Arrival and Departure would be more appropriate.

I would rather this information be deduced by code rather than an event. We have individual Residence dates then we can deduce they moved from point a to b.
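A minimal Python sketch of that deduction, assuming residences are available as simple (date, place) pairs. The function name, the data shape, and the sample places are hypothetical.

```python
from datetime import date


def deduce_moves(residences):
    """Derive (from_place, to_place, earliest, latest) from dated residences.

    `residences` is a list of (date, place) pairs. A move happened sometime
    between the last record at the old place and the first record at the
    new one, so a date window is returned rather than a single date.
    """
    ordered = sorted(residences)
    moves = []
    for (d1, p1), (d2, p2) in zip(ordered, ordered[1:]):
        if p1 != p2:
            moves.append((p1, p2, d1, d2))
    return moves


# Made-up sample data for illustration only.
residences = [
    (date(1900, 6, 1), "Bergen"),
    (date(1910, 4, 15), "Bergen"),
    (date(1920, 1, 5), "Chicago"),
]
print(deduce_moves(residences))
```

Note that the result is only a window bounded by two Residence records. A deduced move has no date, source, or participants of its own, which is the kind of detail an explicit event could carry.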

ungeahnt commented 1 year ago

To make sure that Citizenship will not be forgotten, I am attaching the related discussion here: #193

tychonievich commented 1 year ago

At RootsTech I spoke with several developers about the proliferation of event types. General observations:

  1. Standard is better than extension. We should add all that are used to 7.1
  2. Tools that speak with FamilySearch Family Tree already have to accept the GEDCOM-X "fact" types; hence every one of those should have a place in 7.1
  3. Some kind of hierarchy or supertype would be useful, for example to
    • generate reports and visualizations
    • search events
    • support research quality checks

I propose the following event and attribute hierarchy, in which I think I have included all 7.0, X, and other suggested values from this thread and from a private email I received. But it's a huge list so I likely omitted something inadvertently. If we like the organization then next steps would be to decide how to encode the organization in the spec, followed by a verification that nothing was inadvertently omitted.

I have not included DNA, which appears to require significantly more structure than any other attribute or event and for which we have competing proposals; see #118 for more.

cdhorn commented 1 year ago

I am glad to hear this. I like to think of events as experiences and attributes as characteristics, I think it makes the distinction clearer sometimes. So I'd tend to think of newspaper articles maybe falling into the attribute category. Anyway here are some more possibilities for consideration:

Possible partner events:

Possible child events:

Possible death / burial events:

Possible religious events:

Other possible events:

Possible attributes:

Possible roles:

For pedi:

ungeahnt commented 1 year ago

http://gedcomx.org/Race

The GEDCOMX-Page states:

"A fact of the declaration of a person's race, presumably in a historical document. "

(btw: You will find this text below the type http://gedcomx.org/Religion, which is listed twice. There is no http://gedcomx.org/Race)

In what kind of context should Race be used in future GEDCOM? As I understand it, there is only one reasonable value for it: Homo sapiens sapiens ... and for a single value no own tag is needed.

There are different international definitions and interpretations of the term race, which are intensively discussed in their respective environment and context.

Besides the historical inhuman race-ideological view, this term is nowadays also used for socio-cultural classifications in some countries. I personally find the latter a very awkward choice, however, others decide this.

For the above reasons, however, I think that in GEDCOM the term Race should be considered in a differentiated way, and if it's to be introduced at all, it must be possible to give the tag an exact meaning. For example, through different tag names or an additional type.

Personally I don't see any need for "Race", because I certainly won't spread any (historical) racist ideologies - for which there is no legitimation at all - any further. For a socio-cultural classification the term "Ethnicity" is better suited.

tychonievich commented 1 year ago

Personally I don't see any need for "Race"

"Race" is a common descriptor on historical documents in the colonial and post-colonial European diaspora. While some researchers share your attitude of recording only legitimate information, others wish to encode every historical descriptor found. If we chose to remove it from 7.1 then it will still be recorded by those researchers either with the generic FACT catch-all attribute or using an extension.

I believe this is why GEDCOM-X defines it as the declaration of race, rather than of race itself, a distinction it does not make for any other of its "fact" types.

(btw: [.…] There is no http://gedcomx.org/Race)

The defining document for this URI is https://github.com/FamilySearch/gedcomx/blob/master/specifications/fact-types-specification.md. The documents served at the various gedcomx.org URLs have several errors, including the duplicate religion and missing race, presumably because they are not automatically extracted from the spec the way those served on gedcom.io are.

cdhorn commented 1 year ago

Some thoughts on event and attribute categories.

Maybe for events something broken down like:

Vital (birth and death related), Family, Religious, Academic, Vocational, Legal, Medical, Travel, Military, Political, Leisure, Residency, Ownership

And then for attributes maybe something along the lines of:

Biological (sex, haplogroups, etc); Physical (description, height, weight); Identity (gender, ethnicity, etc); Descriptive (language, occupation, religion, education); Statistical (child number of mother, number spouses, number offspring); References (newspaper articles that sort of thing); Identifiers (ssn and the rest)
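To make the category idea concrete, here is a small Python sketch that groups event tags by category, for example for a report. The tags are standard GEDCOM 7.0 event tags, but the `CATEGORY` assignments and the function name are hypothetical. The thread does not settle on any such mapping.

```python
from collections import defaultdict

# Hypothetical assignment of a few standard 7.0 event tags to the event
# categories suggested above; no such mapping has been agreed on.
CATEGORY = {
    "BIRT": "Vital",
    "DEAT": "Vital",
    "MARR": "Family",
    "BAPM": "Religious",
    "GRAD": "Academic",
    "EMIG": "Travel",
}


def group_by_category(tags):
    """Group event tags by category; unknown tags land under 'Uncategorized'."""
    groups = defaultdict(list)
    for tag in tags:
        groups[CATEGORY.get(tag, "Uncategorized")].append(tag)
    return dict(groups)


print(group_by_category(["BIRT", "MARR", "DEAT", "_XYZ"]))
```

Extension tags such as the made-up `_XYZ` fall into a catch-all group, so a report can still show them instead of dropping them silently.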

Norwegian-Sardines commented 1 year ago

I’d like to know and understand how several of these facts are actually used. For example: MarriageNotice, NewspaperArticle, Autopsy.

To me these are not facts or attributes, they are sources.

I think we also need to be careful that “facts” are not just a different type of the same major concept. For example:

DomesticPartnership, CivilUnion, CommonLawMarriage

Are all variations on the concept of marriage or at least the overarching concept of a union between two people.

Currently I use the MARR tag for all of these concepts with the MARR.TYPE of Domestic Partnership, Civil Union, Common Law, Religious.

Another one would be MILITARY; it should have aspects of the military like Draft Registration, Induction, Service, Deployment, and Discharge, all as MILT.TYPE values.

This is what I believe is the best use of the TYPE tag.

cdhorn commented 1 year ago

Of course they can be considered sources. But it is also a fact that the article ran on a specific date and it referenced one or more people, or that the autopsy was performed on the remains by someone on such and such a date. Having those as enumerated types and enabling someone to use them to record things does not mean you have to choose to use it to record something. Perhaps some users want to see those on a timeline with everything else.

I follow your point on the TYPE tag. It is possible that not all programs may support having a type qualifier without major rework, but all will support having a base type. So I think there is something to be said for having more specific types and trying to provide a rich enough set to cover 80% of the use cases, though others I am sure may disagree.

With regards to military, it really is a category and not a specific type of event. I think it was perhaps not given enough thought years ago.

Marriage you have me thinking more about now. Personally I'd prefer separate event types for Civil Marriage and Religious Marriage, they are two different things. Indeed if going the TYPE qualifier route perhaps there should be no MARR tag, it should be UNION with the qualifiers being Civil Marriage, Religious Marriage, Common Law Marriage, Domestic Partnership, and Civil Union. And there should be a corresponding tag COUPLING with the qualifiers Cohabitants, Friends, Fling and I suppose Forced might be a less jarring term for the unfortunate fourth scenario.

Norwegian-Sardines commented 1 year ago

Personally I think that those programs that don’t support the GEDCOM TYPE tag should fix their programs! Why should programs that do support GEDCOM tags have to change the way they currently (and correctly) use GEDCOM so programs that don’t can get a free pass? It is also helpful to categorize some like items together for SQL queries.

While the DB designer in me would like the change of MARR to UNION or something more generic, I think that backward compatibility would overrule that change. We are not changing HUSB and WIFE in the Family_Record to support same sex marriage for the same reason, programs that may assume the sex of an individual from these tags will need to adjust!

If we are wanting more FACT and ATTR tags, I’d want additions like PAINTING, CARVING, MOVIE, PETS (maybe that would be DOG, CAT, FISH) to name a few that would be great for my family of painters, figure carvers, movie and stage actors, and people with dogs and cats as pets, so that their timelines can show when they created the objects, participated in the movie/play or when the family pet was around.

NOTE: I’m just kidding, but driving home a point about taking standard facts and attributes too far! I do use some of these but use FACT/EVEN and TYPE with great success!

Having more tags does have one advantage, they can be translated to other languages easier, but how many programs actually do translation of these tags?

Norwegian-Sardines commented 1 year ago

I would rather see GEDCOM move to supplying a list of enumerated values for TYPE as variations of MARR tag, rather than:

Coupling Event -- relates to the union of a couple
http://gedcomx.org/CommonLawMarriage
http://gedcomx.org/CivilUnion
http://gedcomx.org/DomesticPartnership
http://gedcomx.org/MarriageNotice

1 MARR
2 TYPE {Religious | Common Law | Domestic Partner | Civil Union | Announcement}

This way they can be easily added to the specification and translated by receiving programs. Additional concepts not known today can be added in the future as well!
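A receiving program could check TYPE values against such an enumeration roughly as in the following Python sketch. The `TYPE_VALUES` table reuses the values from the example line above. Treating them as a closed set, the function names, and the simplified parser are all assumptions. The parser handles only plain level-1 and level-2 lines, ignoring cross-reference identifiers and continuation lines.

```python
# Enumerated TYPE values per tag, taken from the example line in the thread.
# Treating them as a closed set is an assumption of this sketch.
TYPE_VALUES = {
    "MARR": {
        "Religious",
        "Common Law",
        "Domestic Partner",
        "Civil Union",
        "Announcement",
    },
}


def parse_line(line):
    """Split a simple GEDCOM line into (level, tag, value)."""
    parts = line.strip().split(" ", 2)
    level, tag = int(parts[0]), parts[1]
    value = parts[2] if len(parts) > 2 else ""
    return level, tag, value


def unknown_types(lines):
    """Return (parent_tag, type_value) pairs whose TYPE is not enumerated."""
    problems = []
    parent = None
    for line in lines:
        level, tag, value = parse_line(line)
        if level == 1:
            parent = tag
        elif level == 2 and tag == "TYPE" and parent in TYPE_VALUES:
            if value not in TYPE_VALUES[parent]:
                problems.append((parent, value))
    return problems


sample = ["1 MARR", "2 TYPE Civil Union", "1 MARR", "2 TYPE Handfasting"]
print(unknown_types(sample))
```

Values outside the enumeration, such as the made-up "Handfasting" here, are reported, not rejected, so free-text TYPE values can still pass through unchanged.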

This would also work for other concepts, eliminating variations of BIRT, BURI and others.

I have family members whose remains were never found and I currently use a BURI.TYPE to say never found. Should I ask for another GEDCOM Event or Fact?

cdhorn commented 1 year ago

I would rather see GEDCOM move to supplying a list of enumerated values for TYPE as variations of MARR tag

The question at hand here and in the discussion thread as well on this issue seems to be should the standard try to keep a flat enumerated type space, with one specific type per tag, or not.

I suppose in the end it does not matter, what matters is that the types are enumerated so the intended information content is identifiable when being transferred between applications.

I have family members whose remains were never found and I currently use a BURI.TYPE to say never found. Should I ask for another GEDCOM Event or Fact?

I would argue yes, probably should have a LOST or DISAPPEARED event. If they disappeared and were never found then you do not know if they were ever buried in the first place. To me you are really altering the meaning of burial in that case to be something other than a burial and mislabelling the information.

Norwegian-Sardines commented 1 year ago

To me you are really altering the meaning of burial in that case to be something other than a burial and mislabelling the information.

I don’t think that I am. A burial is a disposition of a decedent, and that disposition would be exactly one of a set of mutually exclusive types: Inhumation, Burial at Sea, Cremation, Donation to Research, Lost, Natural, Green, Burial Tree or Scaffolding, Cave, and probably many others depending on culture. If GEDCOM is to support multi-cultural habits of individuals and family groups then we need to expand the definition of items to be inclusive, and adding a new TYPE is far easier than adding a new FACT.

Norwegian-Sardines commented 1 year ago

I’ve added some comments to https://github.com/FamilySearch/GEDCOM/issues/301 regarding this proposal.

tychonievich commented 1 year ago

The steering committee is committed to resolving this issue, but the conversation here is getting too long to be effectively used. We've added a new directory of files to hopefully help guide this conversation and proposals: the attribute and event requests. We invite contributions to those files, as described in the README in that directory.

As adequate evidence of use, value, and absence of specific event and attribute types are documented there we'll add them to future minor versions of the specification. We are also discussing revising the entire event/attribute system to something easier to maintain for a future major version.

We did not copy all of the items listed here to those documents, but are open to others doing so if they wish.

Norwegian-Sardines commented 1 year ago

Luther,

I’m not familiar with “pull requests” in GitHub. Also I would not know how to suggest the use of the TYPE tag to support a specific suggested new Attribute. Specifically, GEDCOM 5.5.1 and 7.x have an attribute called NATI that is already defined to support Tribe and Clan; all future GEDCOMs should do what I already do for the NATI attribute, which is to indicate “folk, house, kindred, lineage, or tribal interest” using the NATI.TYPE tag with enumerated values. I use this convention extensively and also suggest its use to anyone that uses the same application I do when they ask “how do I enter …”.

For example:

1 NATI Irish
2 TYPE Nationality
1 NATI Apache Chiricahua
2 TYPE Tribe
1 NATI McDonald
2 TYPE Clan

tychonievich commented 1 year ago

I’m not familiar with “pull requests” in GitHub.

  1. Visit the page you want to edit
  2. Click the edit button on the top-right corner of the page
  3. Make any edits you wish
  4. At the bottom of the page, commit the change
  5. On the page that appears after the commit, press "Create pull request"

Note that pull requests are proposed changes that will be reviewed and possibly edited by the steering committee prior to merging. We appreciate fully formatted PRs, but can fix formatting and styling fairly easily if you have trouble there.

I would not know how to suggest the use of the TYPE tag to support a specific suggested new Attribute

There are a few examples of this, with different levels of assertion. In the table,

In addition to the table, add appropriate text in the Absence subsection of the corresponding section (see, e.g., Stillbirth).

If that level of editing is uncomfortable for you, you're also welcome to file an issue requesting one of the steering committee make the edit for you. A separate issue (rather than a comment here) will let us track progress on fulfilling that request.

cdhorn commented 2 months ago

While a few other issues have been opened on this topic, I wanted to note here as well that the CompGen group has compiled an extensive summary of GEDCOM custom tags, which also identifies the applications that make use of them and is worth reviewing.

dthaler commented 2 months ago

Discussion in GEDCOM Steering Committee meeting: Thanks for sharing those links, very valuable! Looks like we have some work to do to go through it.

Norwegian-Sardines commented 2 months ago

Discussion in GEDCOM Steering Committee meeting: Thanks for sharing those links, very valuable! Looks like we have some work to do to go through it.

This German group has done a lot of work regarding extending and using GEDCOM. If they were not on your radar before, you have missed out on a lot of information. I would note that some use cases are not complete but very close!