FamilySearch / GEDCOM

Apache License 2.0

Add enumerated-value event-TYPE #303

Open tychonievich opened 1 year ago

tychonievich commented 1 year ago

(Copied from conversation about PR #301)

# Introducing New Attribute/Event Tags vs Better Typification

I have a few points that need to be outlined and investigated as part of this discussion: 1) GEDCOM files “in the wild”; 2) typification of base Attributes/Events; 3) internationalization, cultural sensitivity, and subject expertise.

In the Wild GEDCOM

This body seems to rely heavily on GEDCOMs from the wild to drive its decision-making processes. I agree that having a collection of sample/example files is an important part of the data gathering process, specifically as a starting point for understanding what tags are used. However, have we done any analysis on these files to determine their validity and usefulness? In statistical analysis we must be sure that the members of the sample represent the entire population. The questions I have are: 1) What is the “origin application” for each of these files? 2) Are they a good cross section of all applications? 3) Have the files been checked for origin date from the producing software? 4) Are they culturally/regionally diverse? 5) Do these files produce “custom tags” that have valid GEDCOM alternatives?

Typification of Attributes and Events

In data design we struggle to contain the breadth and width of the data structure. A serious look at the types of data being represented is needed to limit the breadth of the Attributes and Events required for inclusion in any future Standard. Part of the job of considering any new Attribute/Event is checking whether the addition is similar to, or a continuation of, another Attribute/Event. Typification of Events/Attributes would allow a single query to group similar events together. There are several examples of this in the suggested additions from GEDCOM-X.

1 DNA H
2 TYPE mtDNA
1 DNA R1
2 TYPE Y-DNA
1 DSCR Brown
2 TYPE Hair
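
To illustrate the grouping idea, here is a minimal Python sketch that reads the six lines above and groups the level-1 structures by tag, keeping each TYPE next to its value. The function name `group_by_tag` and the dictionary layout are assumptions made for this example, and `DNA` is the tag proposed in this comment rather than one from the published standard.

```python
from collections import defaultdict

# The sample lines from the comment above, copied verbatim.
SAMPLE = """\
1 DNA H
2 TYPE mtDNA
1 DNA R1
2 TYPE Y-DNA
1 DSCR Brown
2 TYPE Hair
"""


def group_by_tag(text):
    """Group level-1 structures by tag, pairing each payload with its TYPE.

    Hypothetical helper for illustration only; it handles just the
    two-level shape shown in SAMPLE, not full GEDCOM syntax.
    """
    groups = defaultdict(list)
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        parts = line.split(" ", 2)
        level, tag = int(parts[0]), parts[1]
        payload = parts[2] if len(parts) > 2 else ""
        if level == 1:
            # Start a new structure under its tag.
            current = {"value": payload, "type": None}
            groups[tag].append(current)
        elif level == 2 and tag == "TYPE" and current is not None:
            # Attach the TYPE substructure to the enclosing structure.
            current["type"] = payload
    return dict(groups)


grouped = group_by_tag(SAMPLE)
```

With this shape, a single lookup such as `grouped["DNA"]` returns both the mtDNA and Y-DNA entries, which is the kind of one-query grouping the paragraph describes.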

Originally posted by @Norwegian-Sardines in https://github.com/FamilySearch/GEDCOM/issues/301#issuecomment-1513403034

chronoplexsoftware commented 1 year ago

A combination of better typification and new event / attribute tags would be a more appropriate move in our view. It would allow maximal reuse of existing code (making adoption easier), whilst providing an opportunity to address concerns with localization.

Typification

The problems with the current TYPE tag mainly relate to it being freeform text. It causes significant issues with localization (both different languages and different ways of describing the same event / attribute in the same language). Moreover, it makes it virtually impossible to interpret generic EVEN and FACT, as there is no deterministic way for an application to interpret the TYPE. We can see no other way to address these concerns but to enumerate the TYPE.
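
As a rough illustration of the difference, the Python sketch below contrasts freeform TYPE payloads with an enumerated set. The enumeration `DscrType` and its values are invented for this example and are not part of any GEDCOM version; the point is only that an exact, closed set gives an application something deterministic to match against.

```python
from enum import Enum


class DscrType(Enum):
    # Hypothetical enumeration values, invented for this sketch only.
    HAIR = "HAIR"
    EYES = "EYES"
    HEIGHT = "HEIGHT"


def interpret_type(payload):
    """Return the enum member for an exact match, or None for free text."""
    try:
        return DscrType(payload)
    except ValueError:
        return None


# Freeform payloads: the same concept spelled three ways, none of which
# an application can recognise without guessing.
free_text = ["Hair", "hair colour", "Haarfarbe"]
free_results = [interpret_type(p) for p in free_text]

# Enumerated payload: one spelling, one meaning, regardless of UI language.
enum_result = interpret_type("HAIR")
```

Here every freeform spelling falls through to `None`, while the enumerated value resolves to a single member that each application can then display in its own language.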

New event / attribute tags

We see very limited benefits in over-abstracting the events / attributes into the smallest possible number of event / attribute tags (see https://github.com/FamilySearch/GEDCOM/discussions/290#discussion-4934125). Whilst we understand the desire to keep the specification concise, such a step would require significant rewrites of all event and attribute code in applications and would hinder adoption.

Yet, we also agree it would be cumbersome to add tags for every possible suggested event or attribute where TYPE could be used. Surely the extensive list of known extension event / attribute tags could be intelligently abstracted into a smaller subset of new or existing event / attribute tags with an enumerated TYPE e.g. MILT for military related data, DSCR for physical description related data.
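
One way to picture this abstraction is a lookup table from extension tags to a base tag plus an enumerated TYPE, sketched below in Python. The extension tags (`_HAIR`, `_EYES`, `_RANK`, `_UNIT`), the TYPE values, and the `abstract` helper are all hypothetical and chosen purely for illustration; `MILT` is the tag suggested in this comment, not an existing standard tag.

```python
# Hypothetical mapping: extension tag -> (base tag, enumerated TYPE).
# None of these entries come from a real application or specification.
ABSTRACTION = {
    "_HAIR": ("DSCR", "HAIR"),
    "_EYES": ("DSCR", "EYES"),
    "_RANK": ("MILT", "RANK"),
    "_UNIT": ("MILT", "UNIT"),
}


def abstract(line):
    """Rewrite one extension-tag line as a base tag plus a TYPE substructure.

    Lines whose tag is not in ABSTRACTION are returned unchanged.
    """
    parts = line.split(" ", 2)
    level, tag = parts[0], parts[1]
    payload = parts[2] if len(parts) > 2 else ""
    if tag not in ABSTRACTION:
        return [line]
    base, type_value = ABSTRACTION[tag]
    head = f"{level} {base} {payload}".rstrip()
    # The TYPE substructure sits one level below the rewritten structure.
    return [head, f"{int(level) + 1} TYPE {type_value}"]


hair = abstract("1 _HAIR Brown")
rank = abstract("1 _RANK Sergeant")
untouched = abstract("1 BIRT")
```

Under these assumptions, four extension tags collapse into two base tags, and existing code that already handles a tag with a TYPE substructure would need little change.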

cdhorn commented 5 months ago

@tychonievich the CompGen group has compiled an extensive summary of GEDCOM custom tags that also identifies the applications that make use of them; it is certainly worth reviewing.