FamilySearch / GEDCOM

What is the meaning when a structure's name/label does not equal its definition/description? #505

Open tychonievich opened 3 months ago

tychonievich commented 3 months ago

Most structures in the current spec have both a short word-or-two name or label (the "name" column in sections 3.3.1–3; the section headers in section 3.3.4) and a description (the "description" column in sections 3.3.1–3; the text beneath the section headers in 3.3.4). Some also have a different label and different description for the structure and its substructures in section 3.2.

In some cases these do not fully align. As a non-exhaustive selection of recent examples,

How should these and other similar issues be handled?

Some options I've considered:

  1. Define "and" semantics. The only correct use of a structure agrees with all of its various labels, names, and descriptions. Any other use will appear incorrect in some context.
  2. Define "or" semantics. Data might have been entered by someone seeing any one of the definitions in isolation, so asserting which one is being followed is misleading.
  3. Define "description supersedes label." A few-word label cannot provide the nuance and clarity that longer text can. If the label and description appear to be at odds, the description is correct.
  4. Treat each such case as an ambiguity to be patched. The resolution of the ambiguity could align with any of the three approaches noted above.
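
To make the difference between the first three options concrete, here is a minimal Python sketch. It assumes, as a simplification, that a given use of a structure can be judged separately against the label and against the description; `Policy`, `use_is_correct`, and the two boolean flags are hypothetical names for illustration, not anything defined by the spec. Option 4 is a case-by-case process and is not modeled.

```python
from enum import Enum


class Policy(Enum):
    AND = 1               # option 1: must fit label and description
    OR = 2                # option 2: fitting either one is enough
    DESCRIPTION_WINS = 3  # option 3: only the description counts


def use_is_correct(policy: Policy, fits_label: bool, fits_description: bool) -> bool:
    """Return whether a use of a structure counts as correct under a policy."""
    if policy is Policy.AND:
        return fits_label and fits_description
    if policy is Policy.OR:
        return fits_label or fits_description
    if policy is Policy.DESCRIPTION_WINS:
        return fits_description
    raise ValueError(f"unknown policy: {policy!r}")


# A use that fits the longer description but not the short label.
for policy in Policy:
    print(policy.name, use_is_correct(policy, fits_label=False, fits_description=True))
```

In this simplified model the three policies give different answers only when the label and the description disagree, which is exactly the situation this issue asks about.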

tychonievich commented 3 months ago

Resolved PRs:

Others I think might be problems:

Norwegian-Sardines commented 3 months ago

I need some time to digest these, and unfortunately I’m not at home and have a very limited internet connection. All of these tags are examples of my continued frustration with the original GEDCOM bias toward a very narrow view of genealogical data recording. GEDCOM is at a crossroads: do we continue on the current path, based on prior bias, and simply add new tags to become more inclusive of newly “discovered” terms and conditions that will forever pop up? Or do we step back, analyze what “kind” of items we are recording, bring together similar terms with an eye toward the fact that every culture/religion/government has similar, but not exactly the same, functions, and create recording tags that define a general concept with attributes that more specifically define the concept?

For those of you who understand the basic concept of object-oriented programming: a general “class” does not define an instance until the attributes of that class are set. We know what a car class is, but it is not instantiated until we set the attributes to indicate its make, year, and color! The same can be true for any of the current tags that have narrow definitions but broader interpretations when culture/language/religion/government get involved! Right now our classes are not named “car”; rather they are called “yellow 1995 chevy”!
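
As a rough Python sketch of the analogy (the `Car` class and `describe` helper are purely illustrative and have nothing to do with any GEDCOM API): one general class covers every car, and the specifics live in attributes rather than in the class name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Car:
    """A general concept; an instance exists only once attributes are set."""
    make: str
    year: int
    color: str


def describe(car: Car) -> str:
    """Build the specific description from the attributes."""
    return f"{car.color} {car.year} {car.make}"


# The "yellow 1995 chevy" is one instance of Car, not a class of its own.
chevy = Car(make="chevy", year=1995, color="yellow")
volvo = Car(make="volvo", year=2003, color="red")
print(describe(chevy))
print(describe(volvo))
```

A tag named after one narrow case is like defining a separate class per car; the general class with attributes handles the second car without any new definition.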

Throw out the old tag that is too restrictive, develop a new tag not based on a word that carries a definition of its own, and replace it with a “class tag” containing a series of similar concepts and strong definitions. If Adoption has multiple interpretations, create a tag called PIP205 with like-kind subterms that are specific to a particular culture. But I’m sure this is too radical!

Norwegian-Sardines commented 3 months ago

Why do we have two tags, one for social security number and another for national identity number? They are similar, just used by different governments!

tychonievich commented 3 months ago

I agree with the direction you suggest, @Norwegian-Sardines. I am also mindful that we should continue to maintain 7.0 even after replacing part (or all) of it with better systems, so even if an improved system bypasses this question in the future, I think we'll still need to answer it for 7.0.

tychonievich commented 3 months ago

More examples:

tychonievich commented 3 months ago

Discussed in steering committee 9 JUL 2024

Norwegian-Sardines commented 3 months ago

In cases where the tag means several things but a strict interpretation of the word is normally taken, we could use the strict definition, but add the caveat that others may interpret the tag more loosely (“for example …”), which can be used here and will be addressed more completely at a later time!

Norwegian-Sardines commented 3 months ago

If you and “the committee” like the idea of fact.KIND, one place to incorporate the concept without too much hand wringing would be when adding a new tag.

For instance a “military” tag: We have seen requests for multiple new tags based around military enlistment, military war action/participation.

We could develop a list of military-related events and facts (I think we have that already and I probably commented on it as well) and put the concept into v7.1 with the idea that others will follow as we find them.

Since I’ve already used fact.TYPE for years to add meaning to the current set of tags (for example, MARR.TYPE values of “common-law”, “civil”, and “religious”), we would need to discuss how users like myself get from using TYPE to KIND!

Norwegian-Sardines commented 3 months ago

One of the things that I thought we were doing by expanding the use case of the current tags with expanded definitions was to provide applications a means to continue to use the old design of GEDCOM by bringing together the various “custom tags” back into the mainstream GEDCOM tag set!

If this was not the intention of these expanded definitions, and including them causes “hand wringing” amongst the committee or the software industry, then we should rethink expanding definitions too far away from the definition of the actual tag!

Personally, I would be against changing back to a more precise definition of the tag word, but if the majority wants that then it should be so.

We should however be mindful that this is a genealogical implementation and any definition of the tag must only include concepts that are genealogical and not include outside influences not genealogical in nature.

For example: while I agree that “nickname” has definitions used outside of genealogy such as those used in IT for signon or screen name identification, that definition should not “color” the definition we use in genealogy to define what GEDCOM understands a nickname to be.

If a value for a tag can include any alphanumeric and special characters, then we need to remove the word “number” from the definition and say “value”. The IDNO tag could be a driver's license, and these could include letters and dashes. Same with CALN: library call numbers can include letters. Why do we librarians call them call numbers even when they include a letter? Old term, new use case. We don’t let it bother us!

jkr-wrk commented 1 month ago

I understand that GEDCOM originated from an American church to standardize historical American records, so all the use cases for conservative American structures are well supported. This results in many specific tags that could easily be classified as one.

For example ANGA, MARB, MARC, MARL, MARS, DIV, DIVF, MARR and finally an EVEN to have a placeholder for all exceptions. All to answer: is there a bond between two people (or has it ended)? Or BAPL, CONL, INIL, ENDL, SLGC and SLGS, which are specific to a single church, excluding similar events from other religions.

I think it would be good to deprecate specific tags and make them more general. But I don't mind if the general tag does not match its new purpose, as long as it matches intention. Use INDI for a person, FAM for a group and MARR for a bond. SPSE for the top of the hierarchy and CHIL for the bottom.
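
One way to picture this, as a sketch only: fold some of the specific family-event tags listed above into the general MARR "bond" tag plus a qualifier. The `GENERAL_FORM` table, the qualifier strings, and the `generalize` function below are hypothetical illustrations in Python, not part of any GEDCOM version or proposal.

```python
from typing import Optional, Tuple

# Hypothetical mapping: specific tag -> (general tag, qualifier).
# The qualifier wording is an assumption made up for this example.
GENERAL_FORM = {
    "MARB": ("MARR", "banns"),
    "MARC": ("MARR", "contract"),
    "MARL": ("MARR", "license"),
    "MARS": ("MARR", "settlement"),
    "DIV": ("MARR", "divorce"),
    "DIVF": ("MARR", "divorce filed"),
}


def generalize(tag: str) -> Tuple[str, Optional[str]]:
    """Return (general tag, qualifier); tags not in the table pass through unchanged."""
    return GENERAL_FORM.get(tag, (tag, None))


for tag in ("MARL", "DIV", "MARR"):
    print(tag, "->", generalize(tag))
```

An application module for a particular culture could then present the qualifier however it likes, while the file itself carries only the general tag.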

And let application-builders build modules to accommodate cultural differences, but add tags that help cultural structures. And try to look at function, not at definition first. NICK is fine for daily names.