FamilySearch / GEDCOM

Managing the Proliferation of Event Types #518

Open tychonievich opened 3 months ago

tychonievich commented 3 months ago

This is not really a new issue; it is an effort to collect topics from several other issues and discussions and add context for those who haven't followed those conversations. Those other issues are scattered and I'm not confident that I found them all; if I've missed something, please add it here!

If you want to propose specific new events or attributes, see the event and attribute proposal tracker. If you want to see a large list of possible new attributes and events, see #117. There are also various issues discussing specific new events or substructures thereof. This issue is instead about reorganizing the entire event/attribute system.

The challenge/situation

GEDCOM currently has 47 event/attribute types (32 event types, 15 attribute types), not counting the generic EVEN and FACT structures. More than 200 additional types have been proposed in various issues here (notably #117).

Long lists of options make user interfaces challenging to create and user decisions hard to guide. For some applications they may also make code lengthy, with an increased chance of accidentally omitting or cross-coding some component.

The current set has some quite broad types, like DESC, which subsumes multiple extension types some applications support (such as _COLO, _EYEC, _HAIR, _HEIG, _WEIG), and some surprisingly narrow ones, like BARM and BASM being distinct. This inconsistency in specificity helps fuel discontent: those who like specificity wonder why the level of specificity they find in one place is not present in another, while those who like generic flexibility wonder the reverse.

Any type with a high degree of specificity makes translation and multicultural communication challenging.

The current set is not uniformly understood. Some structures have definitions that do not map well to non-English languages or non-Christian-European cultures. Some users apply the closest available structure to each situation not formally covered in the specification, such as using MARB for any marriage-announcement-like event even if it is informal and not a bann, while other users do not do this.

These topics are not restricted to events and attributes; calendars and name parts have both had similar discussions, but with fewer types in the 7.0 specification.

Five Proposed Solutions

  1. Add all the events and attributes that come to mind.

    This approach was rejected by the GEDCOM steering committee in April 2023, at which time the "valuable, absent, and used" criteria were introduced for discussing new event proposals. But that doesn't mean it couldn't be revisited.

  2. Add all the events and attributes that multiple applications support.

    This approach is implicitly the intent of the event and attribute proposal tracker and the "awaiting use" label in the issue tracker.

  3. Use only a small number of types; any additional clarification goes in a free-text field like TYPE or NOTE.

    While this option has been mentioned in passing, I'm not aware of any serious proposal along these lines.

  4. Create a type hierarchy. For example a Marriage Bann ⇒ Marriage Announcement ⇒ Pre-Marriage Event ⇒ Marriage Event ⇒ Family Event ⇒ Event, where "⇒" means "is a subtype of" or "implies" or "is subsumed by".

    This approach is used in some peer specifications, notably schema.org, but has not received much discussion here.

  5. Create a smaller set of broad types, with optional enumerated-value subtypes in a KIND substructure.

    This has two parts:

    1. defining the smaller set of broader types. Several have been proposed:

    2. adding the KIND substructure. A concrete proposal can be found in #322.

    An additional open question is whether the enumerations would be singular or plural. We could do any of the following:

    1. One broad type, one specific type.
    2. All of the types in a subtype inheritance path.
    3. Functional tags: a value for any kind of announcement, a value for any kind of pre-event event, a value for any type of religious or church-sponsored event, and so on.

    This proposal has received the most discussion, but also has the most open questions.
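To make options 4 and 5 a little more concrete, here is a minimal Python sketch. It encodes only the example chain from option 4 and shows what each of the three enumeration styles might carry for a marriage bann. The function names, the `style` labels, and the functional tag values are assumptions made for illustration; none of them come from #322 or any other proposal.

```python
# Hypothetical hierarchy, copied from the example chain in option 4.
# Each key "is a subtype of" its value; None marks the root.
PARENT = {
    "Marriage Bann": "Marriage Announcement",
    "Marriage Announcement": "Pre-Marriage Event",
    "Pre-Marriage Event": "Marriage Event",
    "Marriage Event": "Family Event",
    "Family Event": "Event",
    "Event": None,
}

# Hypothetical functional tags (enumeration style 3). The labels are
# invented here purely to illustrate the idea.
FUNCTIONAL_TAGS = {
    "Marriage Bann": {"announcement", "pre-event", "religious"},
}


def inheritance_path(kind):
    """Return the type followed by every broader type, most specific first."""
    path = []
    while kind is not None:
        path.append(kind)
        kind = PARENT[kind]
    return path


def is_subtype(kind, ancestor):
    """True if `kind` is `ancestor` or is subsumed by it."""
    return ancestor in inheritance_path(kind)


def kind_values(kind, style):
    """Values a KIND-like substructure might carry under each style."""
    if style == "single":        # 1. one specific type only
        return [kind]
    if style == "path":          # 2. every type in the inheritance path
        return inheritance_path(kind)
    if style == "functional":    # 3. independent functional tags
        return sorted(FUNCTIONAL_TAGS[kind])
    raise ValueError(f"unknown style: {style}")


print(kind_values("Marriage Bann", "single"))
print(kind_values("Marriage Bann", "path"))
print(kind_values("Marriage Bann", "functional"))
```

With a hierarchy, an application that only understands "Family Event" can still recognize a marriage bann via `is_subtype`. With functional tags, it would instead test for membership in a set of labels.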

Solution implementation options

Assuming we converge on a solution that we like, we could do any of the following:

In 7.1 we could

In 8.0 we could replace and refactor as much as we wish.

No matter what we do, it is likely that applications will wish to support exporting new data in 7.0 and earlier formats. For example, if we deprecate or remove MARB in favor of some broader structure with a KIND, we should be clear about which KIND values imply that the structure can be exported as a MARB.
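One way an exporter might handle this is with an explicit table from KIND values to 7.0 tags, falling back to the generic EVEN structure with a TYPE when no specific tag applies. The sketch below is only an illustration: the KIND values `BANN` and `BETROTHAL_NOTICE`, the table, and the function name are all hypothetical, since no KIND enumeration has been agreed on.

```python
# Hypothetical mapping from KIND values to existing 7.0 event tags.
# "BANN" is an invented KIND value used only for this example.
KIND_TO_7_0_TAG = {
    "BANN": "MARB",
}


def export_as_7_0(kind, date=None, level=1):
    """Return GEDCOM 7.0 lines for an event with the given KIND value.

    A KIND with a known 7.0 equivalent is written with that tag; anything
    else falls back to the generic EVEN structure with a TYPE substructure.
    """
    tag = KIND_TO_7_0_TAG.get(kind)
    lines = []
    if tag is not None:
        # As I read 7.0, a specific event with no DATE or PLAC needs the
        # payload "Y" to assert that it occurred (PLAC is omitted here).
        lines.append(f"{level} {tag}" if date else f"{level} {tag} Y")
    else:
        lines.append(f"{level} EVEN")
        lines.append(f"{level + 1} TYPE {kind}")
    if date:
        lines.append(f"{level + 1} DATE {date}")
    return lines


print("\n".join(export_as_7_0("BANN", "12 MAR 1850")))
print("\n".join(export_as_7_0("BETROTHAL_NOTICE", "1 FEB 1850")))
```

Keeping the mapping in a single table makes it easy to state in the specification exactly which KIND values round-trip to MARB and which degrade to EVEN.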

Norwegian-Sardines commented 3 months ago

Thanks Luther for creating this issue. I think I am the one that may have started the conversation to move to the use of <fact>.TYPE to begin managing the proliferation of new event types. With the addition of the KIND tag to enumerate various “like” events as an alternative to using TYPE I personally think we have a winner!

I’ll let others weigh in with their comments, but I’m on board to discuss this option above others going forward!

cdhorn commented 2 months ago

When I brought up adding more event and attribute types in #117 the primary reason for doing so was related to information context. Having a far richer set of specific enumerated types helps preserve context when data is shared between people from different countries.

Think about it in terms of tagging data for machine learning: you want and need the tags to be applied consistently across languages and cultures. The finer and more detailed the tagging, the more context is preserved and the more value can be extracted.

Should that be accomplished with a flat namespace or a hierarchical namespace? Both have benefits and drawbacks. After some consideration I think the latter, if well thought out, will have more long-term benefits. However, as almost every genealogical site and application today uses a flat namespace, I think changing that should be an 8.0 item. Ideally I would like to see shared events and groups in 8.0 as well, but I know that is wishful thinking.

In the end, the primary responsibility of GEDCOM is to serve as a data transmission envelope. It will always be a lossy envelope, but each iteration should strive to further improve fidelity.