FamilySearch / GEDCOM

Apache License 2.0

SURN tag does not allow comma separated surname list. #169

Closed Norwegian-Sardines closed 2 years ago

Norwegian-Sardines commented 2 years ago

GEDCOM v7.0.9 does not indicate that a comma-separated list of surnames is allowed in the SURN tag, as it was for the same tag under GEDCOM v5.5.1, which was defined as:

NAME_PIECE_SURNAME:= [ <NAME_PIECE> | <NAME_PIECE_SURNAME>, <NAME_PIECE> ] Surname or family name. Different surnames are separated by a comma.

Individuals with multiple surnames, for example:

1 NAME Juan /Hernandez Martinez/
2 GIVN Juan
2 SURN Hernandez, Martinez

The SURN tag should have a comma between surnames for indexing purposes AND to maintain consistency and backward compatibility with GEDCOM v5.5.1.

tychonievich commented 2 years ago

This was an intentional change and is listed as an ambiguity correction in the 7.0.0 changelog. It was one of several ambiguities we identified that were caused not by a poorly written spec but by prevalent use that did not align with the spec. In particular, we identified many applications that did not respect the comma, and some researchers who incorrectly typed commas as a substitute for letters or diacritics they did not know how to type properly. This led to enough 5.5.1-invalid data in the wild that some careful application developers were choosing to violate the 5.5.1 spec to conform to common practice instead.

In version 7.0, the mechanism provided to indicate that there are several name parts of the same type is to use multiple substructures of that type, as in:

1 NAME Juan /Hernandez Martinez/
2 GIVN Juan
2 SURN Hernandez
2 SURN Martinez

A spec-conforming 5.5.1 / 7.0 converter could switch between comma-separated payloads in 5.5.1 and multiple structures in 7.0, though again as noted above a file claiming to be 5.5.1 may have violated the 5.5.1 spec in this regard.
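
A minimal Python sketch of that conversion step, for illustration only. The function names are hypothetical, and it assumes the 5.5.1 payload really does use the comma as a surname separator, which, as noted above, is not guaranteed for files found in the wild.

```python
from typing import List


def surn_551_to_70(payload: str) -> List[str]:
    """Split one 5.5.1 SURN payload into one value per 7.0 SURN substructure."""
    # Assumption: every comma separates surnames (true only for spec-conforming 5.5.1 data).
    return [part.strip() for part in payload.split(",") if part.strip()]


def surn_70_to_551(surnames: List[str]) -> str:
    """Join several 7.0 SURN payloads into a single 5.5.1 comma-separated payload."""
    return ", ".join(s.strip() for s in surnames if s.strip())


def emit_70_lines(surnames: List[str], level: int = 2) -> List[str]:
    """Render one GEDCOM line per surname, e.g. '2 SURN Hernandez'."""
    return [f"{level} SURN {s}" for s in surnames]


lines = emit_70_lines(surn_551_to_70("Hernandez, Martinez"))
```

A surname containing a space but no comma, such as "St John", passes through as a single value in both directions.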

dthaler commented 2 years ago

Discussion 12 JUL 2022: Add entry to the migration FAQ on gedcom.io with this answer

Norwegian-Sardines commented 2 years ago

~~As of GEDCOM v7.0.8 the specification does not provide for multiple SURN tags specifically stating {0:1} for this tag.

Therefore the above solution and proposed changes to the converter would not generate valid v7.0 GEDCOM. Without a comma separated list of specific surnames, GEDCOM v7.0 currently provides no way for indicating a difference between individuals with two or more family surnames vs an individual with a single multipart family surname.~~

Sorry, GEDCOM v7.0.9 puts back the {0:M} notation for the SURN tag.

albertemmerich commented 2 years ago

We have a new ambiguity in GEDCOM 7.0, as we allow space-separated payloads in the NAME_PIECES too. The example above may be exported the way most applications exported it in 5.5.1, without modification:


1 NAME Juan /Hernandez Martinez/
2 GIVN Juan
2 SURN Hernandez Martinez

Norwegian-Sardines commented 2 years ago

The primary issue not totally resolved with this "fix" is that GEDCOM does not differentiate between surname customs where 1) two or more family surnames are used to identify an individual, and 2) an individual has a two-part (but singular) surname. Not all individuals follow the custom of combining two names into a surname by placing a dash/hyphen between the two parts.

A singular surname of "Johnson-Allen" may be customary in one region, where "Johnson Allen" may be a surname customary in another region but they are not considered a multiple surname custom such as Hernandez Martinez.

My take would be that GEDCOM should do the following:

Single Surname Custom:

1 NAME John /Johnson-Allen/
2 GIVN John
2 SURN Johnson-Allen

-or-

1 NAME John /Johnson Allen/
2 GIVN John
2 SURN Johnson Allen

Multiple Surname Custom:

1 NAME Juan /Hernandez/ /Martinez/
2 GIVN Juan
2 SURN Hernandez
2 SURN Martinez

I realize that the double sets of "/" were shot down earlier, but they are valuable in these situations. It is very important for genealogy software that manages statistical information and large numbers of family groups to identify the family group each individual participates in. For example, the Hernandez family (or the Martinez family) would want to know the number of members in that family group. If the SURN tags were not separated, statistics would be lost and inaccurate for one or both of the family groups.
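
To illustrate the statistics point, here is a rough Python sketch that counts family-group membership from SURN payloads. The `surname_counts` helper and the sample people (including the name Lopez) are made up for illustration; it assumes each person's SURN payloads have already been read into a list.

```python
from collections import Counter
from typing import Iterable, List


def surname_counts(individuals: Iterable[List[str]]) -> Counter:
    """Count how many individuals carry each SURN payload."""
    counts: Counter = Counter()
    for surnames in individuals:
        # set() so a person listing the same surname twice is counted once
        counts.update(set(surnames))
    return counts


# Hypothetical sample data: one list of SURN payloads per individual.
people = [
    ["Hernandez", "Martinez"],   # two separate SURN substructures
    ["Hernandez"],
    ["Martinez", "Lopez"],
    ["Hernandez Martinez"],      # one unseparated SURN payload
]
counts = surname_counts(people)
```

With this data the unseparated payload forms its own group, "Hernandez Martinez", and is missing from both the Hernandez and the Martinez totals, which is the loss described above.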

ghost commented 2 years ago

You cannot have distinct surnames separated by spaces inside a single /.../ element.

There are many surnames that include punctuation characters (e.g. hyphen or apostrophe), but also ones that contain a space. In principle, this should be a non-breaking space, but that would be difficult to enforce and so ordinary spaces must be acceptable.

Examples include:

/St John/ or /St. John/ (see St John (name), https://en.wikipedia.org/wiki/St_John_(name)), pronounced "Sinjin" or "Sinjun",

and Irish names including name particles, such as

/Ó Dónaill/ or /Ó Conchobhair/

Tony

Norwegian-Sardines commented 2 years ago

You cannot have distinct surnames separated by spaces inside a single /.../ element. There are many surnames that include punctuation characters (e.g. hyphen or apostrophe), but also ones that contain a space. In principle, this should be a non-breaking space, but that would be difficult to enforce and so ordinary spaces must be acceptable. Examples include: /St John/ or /St. John/ (see St John (name), https://en.wikipedia.org/wiki/St_John_(name)), pronounced "Sinjin" or "Sinjun", and Irish names including name particles, such as /Ó Dónaill/ or /Ó Conchobhair/.

Is your assertion that a NAME tag must therefore allow for multiple sets of “/” when an individual has two distinct surname parts? As in my example:

Multiple Surname Custom:

1 NAME Juan /Hernandez/ /Martinez/
2 GIVN Juan
2 SURN Hernandez
2 SURN Martinez

But this NAME tag was specifically indicated as not valid in issue #136, which eventually changed the spec to allow multiple SURN tags while still requiring the multiple surnames in the NAME tag to be enclosed within one set of “/”.

tychonievich commented 2 years ago

You cannot have distinct surnames separated by spaces inside a single /.../ element.

This is not quite correct. The PersonalName datatype cannot tell if the slashes delimit one or several surname-like components, but with the addition of one or several SURN that distinction can be added.

Additionally, it is worth noting that SURN does not require that its payload be part of the slash-delimited part of the PersonalName value.

Hence, as I read the spec I believe that all three of the following are permitted:

0 @I1@ INDI
1 NAME Juan /Hernandez Martinez/
2 SURN Hernandez
2 SURN Martinez
0 @I2@ INDI
1 NAME Martin /St John/
2 SURN St John
0 @I3@ INDI
1 NAME Joaquin /Perez/ Quinones
2 SURN Perez
2 SURN Quinones

Norwegian-Sardines commented 2 years ago

Based on the above from tychonievich, would this also be valid?

0 @I3@ INDI
1 NAME Joaquin Perez Quinones
2 SURN Perez
2 SURN Quinones

tychonievich commented 2 years ago

Would this also be valid?

0 @I3@ INDI
1 NAME Joaquin Perez Quinones
2 SURN Perez
2 SURN Quinones

I don't think so. The PersonalName datatype says the slashes are

used to delimit the portion of the name that most closely matches the concept of a surname, family name, or the like.

If there is something close enough to the concept of a surname to deserve the SURN tag, I read the above as implying it should also use slashes.
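
A small Python sketch of how a tool might apply that reading, flagging a name that has SURN substructures but no slash-delimited part. This reflects the interpretation given here rather than a normative rule, so it produces warnings instead of errors; `slash_surname` and `check_name` are hypothetical helper names.

```python
from typing import List, Optional


def slash_surname(personal_name: str) -> Optional[str]:
    """Return the text between the first pair of slashes, or None if there is no pair."""
    first = personal_name.find("/")
    if first == -1:
        return None
    second = personal_name.find("/", first + 1)
    if second == -1:
        return None
    return personal_name[first + 1:second].strip()


def check_name(personal_name: str, surn_payloads: List[str]) -> List[str]:
    """Warn when SURN substructures exist but the NAME payload has no slashes."""
    warnings = []
    if surn_payloads and slash_surname(personal_name) is None:
        warnings.append("SURN present but NAME payload has no slash-delimited part")
    return warnings


ok = check_name("Joaquin /Perez/ Quinones", ["Perez", "Quinones"])
flagged = check_name("Joaquin Perez Quinones", ["Perez", "Quinones"])
```

The check deliberately does not require every SURN payload to appear inside the slashes, since SURN does not require its payload to be part of the slash-delimited portion.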

The reason I had an example with more SURN than the slashes cover is based on how names are given in many Spanish-speaking societies: people are given two surnames, where the first is more significant. Joaquin in this example might have been called "Mr. Perez" or "Mr. Perez Quinones" but never "Mr. Quinones", and will give his children the surname Perez but not Quinones. Because of that, I could see arguments both ways on how to put slashes around the part that "most closely matches" a family name: either Joaquin /Perez/ Quinones or Joaquin /Perez Quinones/.

Norwegian-Sardines commented 2 years ago

The reason I had an example with more SURN than the slashes cover is based on how names are given in many Spanish-speaking societies: people are given two surnames, where the first is more significant.

In societies that don’t have surnames but may have a clan, family group, or other grouping, the idea of “most closely matches the concept of a surname, family name” does not apply, BUT their members still need to be grouped together statistically and conceptually.

This occurs in Asian families, where there is a very limited and short list of names that the West might call family names; these are not used to group families together, but a village or clan name would be.

It also occurs in patronymic naming customs, where each generation gets a name based on the father’s given name. Because in most cases they always lived in a specific house, farm, or town, they can be grouped together by those names, even though these are not surnames or family names.

For example:

1 NAME Ole Jansen
2 GIVN Ole
2 SURN Bruflott

Where Bruflott is the family farm but not a surname or family name. His sister would be known as Maria Jansdotter, mom would be Freya Monsdotter, dad as Jan Martinson. Everyone lived on this family farm.

dthaler commented 2 years ago

Discussion 16 AUG 2022: committee discussion was that the farm/house name should not be put into the SURN payload since it is not a surname or family name. It may be a gap in the standard tags, but could have a TYPE for farm name, and/or an extension tag for the farm name, like we discussed having an extension tag for _RUFNAME for the rufname purpose. A similar issue has arisen around royalty, where that would be in the TITL tag on an individual.

The following are in use today (correctly or incorrectly):

1 NAME John /England/
1 TITL King

1 NAME John /England/
2 SURN England
1 TITL King

1 NAME John
1 TITL King of England

The last one would be the most correct.

One could have

1 NAME Ole Jansen
2 GIVN Ole
2 _FARMNAME Bruflott

But it could also be modeled as a place, rather than a name, e.g.

1 NAME Ole Jansen
2 GIVN Ole
1 RESI
2 PLAC Bruflott

Norwegian-Sardines commented 2 years ago

I already model it as a PLAC, but the intention of associating “Bruflott” as part of a name is that: a) family units both residing at the farm and residing at a different farm still associate themselves with the Bruflott farm name as a family (not as part of their public identity), b) after a law was put in place in 1924 in Norway (and at other times in Europe) some people (not all) took the farm name as their required surname.

This association is valuable for outlining a genealogy for a family group over the 300 to 400 years of their existence. The name “Bruflott” was not a surname in 1701, but could have been used as a surname in 2001.

This has some of the same aspects of family grouping as the “House of Tudor” or “House of Windsor”, where individuals did not really have a surname but associated themselves with other family members by saying they belonged to “The House Of”.

I would like to have entered:

1 NAME John
2 SURN Windsor
1 TITL King
2 TYPE England

Or, though not my favorite:

1 NAME John
1 NATI Windsor
2 TYPE House
1 TITL King
2 TYPE England

The last being correct for the committee, but not supported in family lists, charting, or any other grouping mechanisms by any of the major or minor genealogy software, so it becomes useless in practice!

Entering the TITL with TITL:TYPE and NATI with NATI:TYPE would allow a smart genealogy program to print a name with title and house correctly as:

King John of England, House of Windsor

But of course no major genealogy program is that smart! ;-)
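
For what it is worth, the formatting itself is simple. A rough Python sketch, assuming the values have already been read from TITL, TITL.TYPE, NATI, and NATI.TYPE as laid out above; `display_name` is a hypothetical helper and the "of" phrasing is an English-only assumption.

```python
from typing import Optional


def display_name(
    given: str,
    title: Optional[str] = None,       # TITL payload, e.g. "King"
    title_type: Optional[str] = None,  # TITL.TYPE payload, e.g. "England"
    house: Optional[str] = None,       # NATI payload, e.g. "Windsor"
    house_type: Optional[str] = None,  # NATI.TYPE payload, e.g. "House"
) -> str:
    """Build a printable name such as 'King John of England, House of Windsor'."""
    head = " ".join(p for p in (title, given) if p)
    if title_type:
        head += f" of {title_type}"
    parts = [head]
    if house:
        parts.append(f"{house_type} of {house}" if house_type else house)
    return ", ".join(parts)


printed = display_name("John", "King", "England", "Windsor", "House")
```

Missing pieces are simply skipped, so a bare `display_name("John")` gives just the given name.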

tychonievich commented 2 years ago

Discussed by steering committee

  1. By merging #187 we can now use the NATI solution given in the last comment
  2. We know that the current NAME structure does not capture every use case, but cannot change it significantly in 7.x because doing so would be backwards-incompatible. We anticipate introducing a much more flexible name structure in 8.0

We are closing this issue, but if additional problems come up or if there is a recommendation for a new name part type to introduce in 7.1, please re-open the issue (or file another one)

Norwegian-Sardines commented 2 years ago

What is the time frame for v7.1 recommendations regarding new name parts?

Obviously, a person’s identity begins with their name, and I would include identity items like:

Rufname, Patronymic, Farm/Location, Clan, House, Tribe, to name a few.

Rufname has been discussed here by the German delegation. Patronymic is common in many cultures and still used in Iceland and probably others. Farm/Location is also historically common. Clan, House, and Tribe are similar, are used in European and Asian cultures (probably all cultures), and represent individuals who participate in a grouping that is part of their name and identity.

An example of both Patronymic and Location in one famous name is Leonardo di ser Piero da Vinci (aka Leonardo da Vinci). Leonardo is his given name, di ser Piero is a patronymic reference to his father, Ser Piero, and da Vinci is an indicator of birthplace.

Korea has approximately 288 family names and almost 50% of individuals have one of 3 family names. To differentiate between unrelated individuals with the same family names they use a Korean clan village name (called Jipseongchon) to differentiate origin of the family name by village. This has some parallels in ancient times for use when doing census based on town of birth.

tychonievich commented 2 years ago

@Norwegian-Sardines asked

What is the time frame for v7.1 recommendations regarding new name parts?

While not set in stone, our current working estimate is that we'll have an open meeting associated with RootsTech 2023 to get input from voices that might not be active on GitHub, then use that input as well as the various next-minor tagged issues here to draft 7.1. I expect that will take us a few months, and then I expect a few months of a comments period prior to release; if everyone is happy with what we draft, 7.1 might be released around this time next year.

Things could go faster if we decide to draft 7.1 prior to the RootsTech meeting, or slower if we run into difficulty writing it or disagreement with what we write.

tychonievich commented 2 years ago

EDIT: moved to #190, where I had meant to post it originally.

dthaler commented 2 years ago

  1. Sort -- partially covered by the / in the name payload, but not in all cases; for example, it will not help sort by root name instead of name form. Proposal: add INDI.NAME.SORT_AS in 7.1

I will observe that for dates, GEDCOM introduced a parallel structure (SDATE) rather than a substructure like DATE.SORT_AS. As such, it would seem inconsistent for names to use a substructure instead of a parallel structure for indicating a sort form.

I think this discussion should move to issue #190 rather than here though.

ghost commented 2 years ago

That's right!

I am also strongly in favour of sticking to the original V7 cardinality for GIVN and SURN so that specific properties (e.g. call-name) can be applied to each instance. NB: a call-name is not specific to German, and even in US, UK, and Ireland, the preferred given-name is not necessarily the first.

Tony
