Open tychonievich opened 1 year ago
A combination of better typification and new event / attribute tags would be a more appropriate move in our view. It would allow maximal reuse of existing code (making adoption easier), whilst providing an opportunity to address concerns with localization.
Typification
The problems with the current TYPE tag mainly relate to it being freeform text. It causes significant issues with localization (that is, both different languages and different ways of describing the same event / attribute in the same language). Moreover, it makes it virtually impossible to interpret generic EVEN and FACT, as there is no deterministic way for an application to interpret the TYPE. We can see no other way to address these concerns but to enumerate the TYPE.
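To illustrate the difference, here is a minimal Python sketch. The enumeration members and the label table are hypothetical and are not taken from any GEDCOM version; they only show why an enumerated TYPE can be interpreted deterministically while freeform text cannot.

```python
from enum import Enum


class MarriageType(Enum):
    """Hypothetical enumerated TYPE values for MARR (not from any GEDCOM spec)."""
    RELIGIOUS = "RELIGIOUS"
    CIVIL = "CIVIL"


# Freeform TYPE payloads that all describe the same kind of event.
freeform_values = ["Civil", "civil ceremony", "Ziviltrauung", "mariage civil"]

# With freeform text an application can only compare strings, so it sees
# four distinct values instead of one concept.
distinct_freeform = {value.lower() for value in freeform_values}

# With an enumeration, localization becomes a lookup keyed on the value.
LABELS = {
    "en": {MarriageType.CIVIL: "Civil marriage",
           MarriageType.RELIGIOUS: "Religious marriage"},
    "de": {MarriageType.CIVIL: "Ziviltrauung",
           MarriageType.RELIGIOUS: "Kirchliche Trauung"},
}


def display_label(payload: str, language: str) -> str:
    """Interpret an enumerated TYPE payload and return a localized label."""
    # MarriageType(payload) raises ValueError for anything outside the enumeration.
    return LABELS[language][MarriageType(payload)]


print(len(distinct_freeform))        # 4
print(display_label("CIVIL", "de"))  # Ziviltrauung
```

An unknown payload such as "Civil" raises a ValueError here, which is the point: the application can tell reliably whether it understands the value.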
New event / attribute tags
We see very limited benefits in over-abstracting the events / attributes into the smallest possible number of event / attribute tags (see https://github.com/FamilySearch/GEDCOM/discussions/290#discussion-4934125). Whilst we understand the desire to keep the specification concise, such a step would require significant rewrites of all event and attribute code in applications and would hinder adoption.
Yet, we also agree it would be cumbersome to add tags for every possible suggested event or attribute where TYPE could be used. Surely the extensive list of known extension event / attribute tags could be intelligently abstracted into a smaller subset of new or existing event / attribute tags with an enumerated TYPE, e.g. MILT for military related data, DSCR for physical description related data.
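A rough Python sketch of that abstraction follows. The extension tags and TYPE values in the table are invented for illustration (they are not taken from the CompGen summary or from any specification); only MILT and DSCR come from the suggestion above.

```python
# Hypothetical mapping from extension tags to (tag, enumerated TYPE).
# Every key and every TYPE value here is illustrative only.
ABSTRACTION = {
    "_HEIG": ("DSCR", "HEIGHT"),
    "_WEIG": ("DSCR", "WEIGHT"),
    "_EYES": ("DSCR", "EYE_COLOR"),
    "_ENLI": ("MILT", "INDUCTION"),
    "_DISC": ("MILT", "DISCHARGE"),
}


def abstract(tag: str, payload: str = "") -> list[str]:
    """Rewrite one level-1 extension structure as tag + enumerated TYPE.

    Tags that are not in the table are passed through unchanged.
    """
    if tag not in ABSTRACTION:
        return [f"1 {tag} {payload}".rstrip()]
    new_tag, type_value = ABSTRACTION[tag]
    return [f"1 {new_tag} {payload}".rstrip(), f"2 TYPE {type_value}"]


for line in abstract("_HEIG", "180 cm") + abstract("_DISC"):
    print(line)
# 1 DSCR 180 cm
# 2 TYPE HEIGHT
# 1 MILT
# 2 TYPE DISCHARGE
```

Five extension tags collapse into two tags here, and an application that already handles DSCR or MILT only needs to learn the new TYPE values.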
@tychonievich the CompGen group has compiled an extensive summary of GEDCOM custom tags that also identifies the applications that make use of them; it is certainly worth reviewing.
(Copied from conversation about PR #301)
I have a few points that need to be outlined and investigated as part of this discussion: 1) GEDCOM files “in the wild”, 2) Typification of base Attributes/Events, 3) Internationalization, cultural sensitivity, subject expertise.
In the Wild GEDCOM
This body seems to rely heavily on GEDCOMs from the wild to drive its decision-making processes. I agree that having a collection of sample/example files is an important part of the data gathering process, specifically as a starting point for understanding what tags are used. However, have we done any analysis on these files to determine their validity and usefulness? In statistical analysis we must be sure that the members of the sample represent the entire population. The questions I have are: 1) What is the “origin application” for each of these files? 2) Are they a good cross section of all applications? 3) Have the files been checked for origin date from the producing software? 4) Are they culturally/regionally diverse? 5) Do these files produce “custom tags” that have valid GEDCOM alternatives?
Typification of Attributes and Events
In data design we struggle with containing the breadth and width of the data structure. A serious look at the types of data being represented is needed to limit the breadth of the Attributes and Events included in any future Standard. Part of the job of considering any new Attribute/Event is checking whether the addition is similar to, or a continuation of, another Attribute/Event. Typification of Events/Attributes would allow similar events to be grouped together in a single query. There are several examples of this in the suggested additions from GEDCOM-X.
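The grouping benefit can be sketched in a few lines of Python. This is a deliberately simplified reader, not a full GEDCOM parser, and the sample data and TYPE values (CREMATION, INHUMATION, DISCHARGE) are assumptions made up for the illustration.

```python
from collections import defaultdict

# Made-up sample data; the enumerated TYPE values are hypothetical.
SAMPLE = """\
0 @I1@ INDI
1 BURI
2 TYPE CREMATION
2 DATE 3 MAR 1950
0 @I2@ INDI
1 BURI
2 TYPE INHUMATION
0 @I3@ INDI
1 BURI
2 TYPE CREMATION
1 MILT
2 TYPE DISCHARGE
"""


def events_by_type(text: str) -> dict[tuple[str, str], list[str]]:
    """Map each (event tag, TYPE value) pair to the records that contain it."""
    groups = defaultdict(list)
    record = None
    event = None
    for line in text.splitlines():
        parts = line.split(" ", 2)
        level = int(parts[0])
        if level == 0:
            # "0 @I1@ INDI": the second field is the record identifier.
            record, event = parts[1], None
        elif level == 1:
            event = parts[1]
        elif level == 2 and parts[1] == "TYPE" and event is not None:
            groups[(event, parts[2])].append(record)
    return dict(groups)


groups = events_by_type(SAMPLE)
print(groups[("BURI", "CREMATION")])  # ['@I1@', '@I3@']

# Every burial, whatever its TYPE, is still reachable through the one tag.
all_burials = sorted(r for (tag, _), records in groups.items()
                     if tag == "BURI" for r in records)
print(all_burials)  # ['@I1@', '@I2@', '@I3@']
```

Because the TYPE values are enumerated, the query is an exact match on a key rather than a guess about what a freeform string means.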
- MARR can be Typified as either a) Religious, b) Civil, c) Common Law, d) Partnership, e) other/phrase
- MILT with Typification of a) induction, b) commission, c) discharge, d) engagement, e) registration, f) deployment, g) other/phrase (NOTE: GEDCOM X has a dozen or more of these, and some are subsets of others; there are over 300 separation codes as part of a “discharge”, AWOL and Desertion among them)
- BURI event can be used in several ways: a) Inhumation, b) Burial at Sea, c) Cremation, d) Donation to Research, e) Lost, f) Natural, g) Green, h) Burial Tree or Scaffolding, i) cave, j) other/phrase
- HELT event with Typification of a) surgery, b) illness, c) hospitalization, d) other/phrase, with a set of definitions that explain the use cases, could work to reduce the breadth of new events
- IMMI and EMIG events: rather than these alone (I’ve had need to document the “travel” between these events), I would like to see a Typification of moving events that encompasses more variety in scope. I’m not sure whether the event names Travel (TRVL), Movement (MOVE), or Transit (TRANS) could encompass the concept. Typification could be a) Immigration, b) Emigration, c) Vacation, d) Visitation, e) other/phrase
- Identification Number (IDNO) and Social Security Number (SSN) are both “types” from the same concept and should be combined using Typification!
- Physical Description (DSCR) would be a great Attribute to use Typification to identify all values of an individual’s physical description. The TYPE enumeration could include {height, weight, eye color, tattoos, skin color, hair color, scars, lost limbs, other/phrase}. Doing this would eliminate the need for adding a dozen attributes used by the military and other recording entities, as well as the cumbersome descriptive list suggested by v5.5.1 GEDCOM. For example:
- Nationality (NATI) would benefit from Typification. The definition includes: national origin or other folk, house, kindred, lineage, or tribal interest. Use example:
Internationalization, cultural sensitivity, subject expertise
I think that an important goal for this discussion is the inclusion of more data points in the area of “Internationalization, cultural sensitivity, and subject expertise”. I’ve already indicated that I’m not an expert in DNA, so my concepts above may not be 100% correct. Genealogy is a science that requires some level of expertise from various cultural and historical perspectives. My interpretation of some of the constructs of past GEDCOM Standards is that cultural concepts in recording genealogical data were outside the scope and expertise of the original designers. This is not to condemn them for any wrongdoing. I’ve been doing genealogical research for over 40 years in various European locations, but I can’t say that I know or understand all concepts, let alone those of Asian, Middle Eastern, South American or Native cultures. While researching information for others I’ve come across historical uses for terms that are not covered by the current GEDCOM and may be outside the understanding of the “steering committee”. One example is in Switzerland, the so-called "Bürgerort" (Place of origin). We know through historical context that many cultures (Asian, Middle Eastern) also used “place of origin” to record census information and collect taxes. Individuals/Families were required to return to their ancestral home to be counted. These cultures will use different names for this concept, and Typification can be used to provide a single GEDCOM “Fact” to record the location while also providing a value to support each culture’s term. For example:
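As an illustrative sketch only (not the example from the original comment), the idea could look like this in Python. The extension tag _ORIG, the TYPE values, and the placement of a PHRASE line under TYPE are all assumptions made for this sketch; only the Bürgerort concept comes from the text above.

```python
from dataclasses import dataclass

# Hypothetical enumeration for one shared "place of origin" fact.
ORIGIN_TYPES = {"BUERGERORT", "OTHER"}


@dataclass
class OriginFact:
    place: str
    type_value: str
    phrase: str = ""  # free text, intended for use with OTHER

    def to_gedcom(self) -> list[str]:
        """Serialize as a hypothetical _ORIG structure with an enumerated TYPE."""
        if self.type_value not in ORIGIN_TYPES:
            raise ValueError(f"unknown TYPE value: {self.type_value}")
        lines = [f"1 _ORIG {self.place}", f"2 TYPE {self.type_value}"]
        if self.phrase:
            # Assumed placement of PHRASE; not confirmed by any specification.
            lines.append(f"3 PHRASE {self.phrase}")
        return lines


for fact in (OriginFact("Zürich", "BUERGERORT"),
             OriginFact("Some ancestral village", "OTHER", "ancestral home")):
    print("\n".join(fact.to_gedcom()))
```

Each culture's term becomes one enumerated value of the same fact, so applications can query "place of origin" once and still display the culturally correct name.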
Individuals with more expertise in other cultures should be found to investigate and bring together concepts that are found in multiple cultures but are called different things. Enumeration of TYPE values will also help in the translation effort that many applications could use to expand their customer base. The application I use does a lot of tag translations to support multiple languages from around the world, and an enumerated TYPE would be a step forward.
Originally posted by @Norwegian-Sardines in https://github.com/FamilySearch/GEDCOM/issues/301#issuecomment-1513403034