FamilySearch / gedcomx

An open data model and an open serialization format for exchanging genealogical data.
http://www.gedcomx.org
Apache License 2.0

Limitations of Current Name Parts #330

Open. clarkegj opened this issue 3 years ago.

clarkegj commented 3 years ago
stoicflame commented 3 years ago

Thanks for the report.

difficult to represent names where the usual presentation is not the western style of prefix, given, surname, suffix.

Need more information. Is this asserted because we need more name part types to designate non-western parts? Or is there some assumption that a name part has to have a type?

difficult to represent names with multiple instances of the same part.

Again, need more information. Just have multiple parts as needed, no?

difficult to ensure that name parts are available because the name part pieces are optional.

I think this works as designed, because we never had a requirement for such an assurance. Isn't that assurance an application-specific domain requirement, outside the scope of a model definition?

German researchers want to record RUFNAME, the "appellation name" or "call name" by which a person is usually known

Does the http://gedcomx.org/Familiar name part qualifier not work for this use case?

French-Canadian researchers want to record “dit names”,

So do none of the name part qualifiers work? Do we need to add another qualifier to the CV?

cannot handle name parts that contain the slash character.

Why not?

cannot handle [Mononymous names]

Why not? Just don't assign a name part type, no?
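A minimal sketch of that suggestion, using Python dicts shaped like the GEDCOM X JSON serialization (`nameForms`, `fullText`, `parts`). The helper name `mononym` is hypothetical. The single part simply carries no `type`, so it is neither a given name nor a surname:

```python
import json


def mononym(value):
    """Hypothetical helper: a GEDCOM X-style name made of one untyped part."""
    part = {"value": value}  # no "type" key: the part is neither Given nor Surname
    return {"nameForms": [{"fullText": value, "parts": [part]}]}


name = mononym("Sukarno")
print(json.dumps(name, indent=2))
```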

cannot handle [Spanish compound surnames]

Why not? Just identify which parts are the surname with the name part type, no? The issue of which is used for indexing is (again) an application-specific concern and outside the scope of the model spec, no?
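A sketch of what "identify which parts are the surname" could look like, again with dicts modeled on the GEDCOM X JSON format. The helper `parts_of_type` is hypothetical. Both surnames are typed `http://gedcomx.org/Surname`, and their order is preserved:

```python
GIVEN = "http://gedcomx.org/Given"
SURNAME = "http://gedcomx.org/Surname"

name_form = {
    "fullText": "Gabriel García Márquez",
    "parts": [
        {"type": GIVEN, "value": "Gabriel"},
        {"type": SURNAME, "value": "García"},   # first (paternal) surname
        {"type": SURNAME, "value": "Márquez"},  # second (maternal) surname
    ],
}


def parts_of_type(form, part_type):
    """Return the values of all parts with the given type, in document order."""
    return [p["value"] for p in form["parts"] if p.get("type") == part_type]


print(parts_of_type(name_form, SURNAME))  # ['García', 'Márquez']
```

An application that indexes only on the first surname could take `parts_of_type(name_form, SURNAME)[0]`. That choice lives in the application, not in the model.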

cannot handle [Patronymics]

Still unsure of the use case described here, but we can identify patronymics using name part qualifiers. If there are deficiencies in the qualifiers, it's easy to enhance the CV.
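As a sketch of identifying a patronymic with a name part qualifier, assuming the `http://gedcomx.org/Patronymic` qualifier from the current name part qualifier CV and the GEDCOM X JSON shape for qualifiers (`{"name": <URI>}`). Leaving the part type unset is a choice made here for illustration; `has_qualifier` is a hypothetical helper:

```python
PATRONYMIC = "http://gedcomx.org/Patronymic"


def has_qualifier(part, qualifier_uri):
    """True if the name part carries a qualifier with the given URI."""
    return any(q.get("name") == qualifier_uri for q in part.get("qualifiers", []))


parts = [
    {"type": "http://gedcomx.org/Given", "value": "Jón"},
    # "Son of Sigurður": marked as a patronymic; the part type is left unset.
    {"value": "Sigurðsson", "qualifiers": [{"name": PATRONYMIC}]},
]

patronymics = [p["value"] for p in parts if has_qualifier(p, PATRONYMIC)]
print(patronymics)  # ['Sigurðsson']
```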

Enslaved people on slave rolls or probate accounts often have only one name.

Right. So just put in that one name, yes?

albertemmerich commented 3 years ago

German researchers want to record RUFNAME, the "appellation name" or "call name" by which a person is usually known

Does the http://gedcomx.org/Familiar name part qualifier not work for this use case?

No, it does not. The "Rufname" is one of the given names of a person, marked by underlining it in official German documents. So if a person has the given names Johann Wilhelm Friedrich, any one of these, or none, may be the Rufname. German researchers need a way to mark which given name is the Rufname. The term "Rufname" must not be used when there is no official marking and the name is only what family or friends call the person; that would be a nickname. Johann Wilhelm Friedrich could be nicknamed Willi, but Willi cannot be his Rufname, because it is not one of his given names.
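The rule described here (a Rufname must be one of the given names, and there may be none) can be sketched as a simple check. The function `is_valid_rufname` is hypothetical and assumes given names are separated by spaces:

```python
def is_valid_rufname(given_names, rufname):
    """Hypothetical check: a Rufname, if present, must be one of the given names."""
    if rufname is None:  # having no Rufname at all is allowed
        return True
    return rufname in given_names.split()


given = "Johann Wilhelm Friedrich"
print(is_valid_rufname(given, "Wilhelm"))  # True
print(is_valid_rufname(given, "Willi"))    # False: a nickname, not a Rufname
print(is_valid_rufname(given, None))       # True
```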

stoicflame commented 3 years ago

Sounds good, fair enough. So let's add another name part qualifier CV element? Does that work?

jamestanner45 commented 3 years ago

Here is my response to Ryan.docx

ghost commented 3 years ago

I suggest some realignment to match the NAME_PARTS structure of GEDCOM v7. That structure already copes with names that do not follow the Western style (à la old GEDCOM capabilities); it copes with multiple instances of the same part (e.g. multiple family names, as in Spanish names); it copes with different name-part orderings (for cultures where, say, the family name comes first); and it allows extensions to the set of name-part types.

One weakness of the GEDCOM v7 specification (as it stands in the current draft) is that it is not clear on the distinction between a part being unknown (commonly written using the horrible and ambiguous "FNU" or "LNU", i.e. "first name unknown" or "last name unknown") and a part being inapplicable in a specific case. This was discussed in the associated meetings but was left for a future revision.

Another weakness is that it has no concept of a sort order. No software should assume a sort order based on numeric character codes, because ranges of characters (e.g. accented letters) are sorted differently in different cultures. More importantly, the sorting of Japanese names often depends on the pronunciation, which is not strictly determined by the spelling.
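To illustrate the point about numeric character codes: Python's default string sort compares code points, so "Ö" (U+00D6) lands after every ASCII letter. The `german_key` function is a toy that folds umlauts onto their base vowels, roughly as German dictionaries file them; it is not a real collation algorithm, and production code would use a proper locale-aware collation library instead:

```python
names = ["Zimmermann", "Özdemir", "Adler", "Müller"]

# Naive sort by code point: "Ö" (U+00D6) sorts after "Z" (U+005A).
codepoint_order = sorted(names)
print(codepoint_order)  # ['Adler', 'Müller', 'Zimmermann', 'Özdemir']

# Toy German-style key: treat Ä/Ö/Ü like A/O/U when comparing.
_FOLD = str.maketrans("ÄÖÜäöü", "AOUaou")


def german_key(name):
    return name.translate(_FOLD)


german_order = sorted(names, key=german_key)
print(german_order)  # ['Adler', 'Müller', 'Özdemir', 'Zimmermann']
```

A Swedish reader would expect the first ordering instead, since Ö is the last letter of the Swedish alphabet. The same data needs different orders in different locales, which is why the order belongs to the consumer rather than to the stored names.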

ghost commented 3 years ago

Albert, regarding the Rufname, do you remember any discussion in the GEDCOM meetings about whether it could be applied in other cultures? All English-speaking countries have some concept of a preferred given name that isn't the first one, and some families do this as a tradition. It may not have the same legal recognition as the German equivalent but the case exists nonetheless. We have no guidelines for how this should be addressed generally, and whether Rufname is applicable.

albertemmerich commented 3 years ago

We did not discuss that. However, I can say that in Germany users do not follow the strict definition of Rufname: they also use it to mark a given name as the Rufname in cases where later records (e.g. marriage records in church books) mention only one of the given names shown in the baptism record. This is a different situation from the underlining of one given name in documents, as defined by law.

stoicflame commented 3 years ago

@jamestanner45 thanks so much for writing up a response. My response:

Traditional Western European genealogy as shown by the “standard” format family group record and pedigree charts assumes a name of essentially four parts... Outside of this Western European and primarily English language-based format, there are few areas where this name pattern adequately represents the cultural naming pattern. Non-English, Non-European genealogists struggle to “fit” their naming patterns into the Western European mold.

Agreed. I'm not trying to deny this, I'm just trying to push back on the assertion that the GEDCOM X specification dictates conformance to this pattern. For example, we were careful to make name part types both optional and extensible. Where the existing controlled vocabulary (CV) of name part types is insufficient, let's get it enhanced to accommodate all cultures.

There is no real reason for assuming that a name has to have a certain definition or type.

Agreed. Hence GEDCOM X makes the name part type both optional and extensible.

The four field limitation with two of the name parts designated as prefix and suffix are insufficient to represent even some fairly common naming conventions such as those used in the Spanish speaking areas of the world that have not adopted English naming patterns.

Agreed. Again, I'm asserting GEDCOM X does not limit name definitions to four fields.

Albert has responded to the use of the German “Rufname.” This is just one of the many possible naming patterns that need to be accommodated.

Agreed. My proposal is to define a new name part qualifier CV element called http://gedcomx.org/Rufname to accommodate this use case. I'm suggesting this might be sufficient to cover the use case and looking for confirmation of that.
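A sketch of how the proposed qualifier might be used, with Albert's own name from later in this thread. Note that `http://gedcomx.org/Rufname` is only the URI proposed here and is not (yet) in the published CV; `german_name` is a hypothetical helper. Because qualifiers attach to individual parts, each given name has to be its own `Given` part; a single "Albert Wilhelm" part could not carry the marker:

```python
import json

GIVEN = "http://gedcomx.org/Given"
SURNAME = "http://gedcomx.org/Surname"
# Proposed in this thread; NOT part of the published GEDCOM X vocabulary.
RUFNAME = "http://gedcomx.org/Rufname"


def german_name(given_names, rufname, surname):
    """Build a GEDCOM X-style name form with one Given part per given name,
    flagging the Rufname part with the proposed qualifier."""
    parts = []
    for given in given_names:
        part = {"type": GIVEN, "value": given}
        if given == rufname:
            part["qualifiers"] = [{"name": RUFNAME}]
        parts.append(part)
    parts.append({"type": SURNAME, "value": surname})
    full_text = " ".join([*given_names, surname])
    return {"nameForms": [{"fullText": full_text, "parts": parts}]}


name = german_name(["Albert", "Wilhelm"], "Albert", "Emmerich")
print(json.dumps(name, indent=2, ensure_ascii=False))
```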

The GEDCOM X name part qualifiers follow a pattern I call “dictionary” usage, that is trying to define every possible field.

Just to bolster clarity, I use the term "controlled vocabulary" or "CV" to refer to your concept of a "dictionary".

I think the dictionary approach merely reinforces the Western European bias hence my example of the Shoshone and Navajo cultural names that do not fit any of the listed dictionary definitions and categories.

I agree that the current CV is biased to Western European cultures, but I'm suggesting that's only because we haven't received proposals to enhance it. Again, the current specification does not require that a CV be used, and it allows for the CV to be arbitrarily extended. To address your specific use cases, an application that supports Shoshone and Navajo names can add the names without a name part type, or the application can define its own custom (unregistered) name part type as needed.
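Both options mentioned here (no type at all, or a custom unregistered type) can be sketched from the reader's side. The custom URI `https://example.org/gedcomx/ClanName` and the placeholder values are hypothetical, used only for illustration; the four registered URIs are the current GEDCOM X name part types. A reader that does not recognize a custom URI can still keep the part and its value; it simply cannot interpret the type:

```python
# The four name part types in the current GEDCOM X controlled vocabulary.
KNOWN_PART_TYPES = {
    "http://gedcomx.org/Prefix",
    "http://gedcomx.org/Suffix",
    "http://gedcomx.org/Given",
    "http://gedcomx.org/Surname",
}

# Hypothetical application-defined (unregistered) type URI.
CLAN_NAME = "https://example.org/gedcomx/ClanName"


def describe(part):
    """Describe how a reader might treat a name part's type."""
    part_type = part.get("type")
    if part_type is None:
        return "untyped"
    if part_type in KNOWN_PART_TYPES:
        return "registered: " + part_type.rsplit("/", 1)[1]
    # Unknown URI: the part is preserved, but its meaning is opaque to this reader.
    return "custom: " + part_type


parts = [
    {"value": "name-part-1"},  # placeholder values, not real names
    {"type": CLAN_NAME, "value": "name-part-2"},
    {"type": "http://gedcomx.org/Given", "value": "name-part-3"},
]
for p in parts:
    print(p["value"], "->", describe(p))
```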

Adding more name part qualifiers is a no-win solution. You cannot possibly define every category of name parts in every culture around the world.

I guess I beg to differ. I mean... I know I can't personally define every name part type in every culture around the world, but the spec could certainly be enhanced to add name part types and name part type qualifiers anytime an application developer needs one.

And (again), the application developer could always just not use the CV, or define its own custom (unregistered) CV element as needed. The disadvantage of this approach would be that other application developers wouldn't know what it meant semantically until it was officially defined by the spec.

If the answer is to just add another “name part type” then I will simply come up with more name parts not in the current list.

Yes. Correct.

There are currently 7,111 languages spoken around the world and probably many times that number of differently designated name parts.

Well, the goal of a CV is to be independent of language. The work of translating a particular CV element to a user-displayable text term is the job of the application developer. Just because the current CV uses English to define its elements doesn't mean that it expects English to be used for display purposes to end-users.

I guess that we could have used opaque identifiers for each name part type, e.g. http://gedcomx.org/name-parts/00001, http://gedcomx.org/name-parts/00002, etc. But we decided to go with e.g. http://gedcomx.org/Suffix, http://gedcomx.org/Given, etc. to make life easier for developers, so they wouldn't have to keep identifier mappings in their heads.

I believe that GEDCOM should be a conduit for exchanging genealogical data between platforms, not a reflection of one or a few culturally determined categories.

Agreed. Again, where GEDCOM X is a "reflection of one or a few culturally determined categories," I'd like to enhance it.

Why should I have to “identify my name part types?” Why can’t I just enter the name of my ancestor as it appears in the record and have the computer index it by a full-word search?

Well I can think of a number of application-specific reasons. FamilySearch, for example, requires identification of a "first given name" under some circumstances for reasons you can probably deduce.

But anyway it doesn't really matter because the GEDCOM X spec shouldn't dictate application-specific requirements. If GEDCOM X is dictating application-specific requirements, we'd like to know where we need to make adjustments and/or enhancements to fix that.

Why do I have to have a “surname” when, in my culture (speaking as a non-English, non-Indo-European language speaker) there is no concept of a “surname” as such?

Agreed. Again, GEDCOM X doesn't require that a name part be designated as a surname, or any other name part type defined in the currently-limited CV.

How do you know you have “identified a patronymic?”

The user would presumably say whether a name part is patronymic.

For example, take the common English surnames Berry or Perry, both of these names were historically from either “ap harry” or “ab harry.” If you were doing Welsh research would you recognize these names as patronymics?

I'm not a professional researcher, so I'm certain I wouldn't personally recognize them as patronymics. But I assume there are some researchers who can recognize the patronymic and GEDCOM X wanted to support application developers who wanted to support this kind of research where Patronymic needed to be modeled.

Lastly, about enslaved people from Africa with a name such as “Tom” or “Mary” of course they had a real name in Africa but do we use these names as given names?

I'm not sure how the application wants to implement this use case, but we definitely want to make sure GEDCOM X can model it as needed. I guess the way I'd use the model in this case is to add "Tom" or "Mary" as an alternate name, and add the African name as the birth name.
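A sketch of that modeling choice, assuming the GEDCOM X name type `http://gedcomx.org/AlsoKnownAs` is the closest match to what this comment calls an "alternate name", and `http://gedcomx.org/BirthName` for the African name. The helper `person_names` is hypothetical. Since the African name is often unknown, it is added only when supplied:

```python
BIRTH_NAME = "http://gedcomx.org/BirthName"
ALSO_KNOWN_AS = "http://gedcomx.org/AlsoKnownAs"


def typed_name(name_type, full_text):
    return {"type": name_type, "nameForms": [{"fullText": full_text}]}


def person_names(recorded_name, birth_name=None):
    """Hypothetical helper: always keep the recorded name; add the birth name if known."""
    names = [typed_name(ALSO_KNOWN_AS, recorded_name)]
    if birth_name is not None:
        names.insert(0, typed_name(BIRTH_NAME, birth_name))
    return {"names": names}


person = person_names("Tom")
print([n["type"] for n in person["names"]])
```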

Is there some concept of "enslaved name" that needs to be captured as a CV element somewhere?

The implications of creating a system that classifies people by a Western European culturally determined naming pattern must by its nature exclude anyone who do not fit the pattern.

Agreed. Let's not do that.

albertemmerich commented 3 years ago

Again, on the German "Rufname": GEDCOM is plain text, but in the sources the Rufname is marked by underlining. So if we want to transfer the data as given in the sources, we can use name pieces (or parts, as we call them now) or we can use some sort of markup. We already know the GEDCOM markup for surnames: /.../. Before we introduced _RUFNAME as a GEDCOM 5.5.1 extension in our German GEDCOM-L group, many researchers were using different markups, like:

    1 NAME ALBERT Wilhelm /Emmerich/
    1 NAME _Albert_ Wilhelm /Emmerich/
    1 NAME A L B E R T Wilhelm /Emmerich/

and so on. All these solutions have one problem: they do not represent what the source says. The source says that Albert Wilhelm are my given names, with Albert being the Rufname marked by underlining in the document, and that Emmerich is my surname.

The problem is that many sources DO show the type of the name parts, and we need a way to carry this into the representation of the data. There is a big need for Western-style name parts, and we cannot ignore that just because other cultural areas call for other systems.

In GEDCOM-L we decided to do it this way:

    1 NAME Albert Wilhelm /Emmerich/
    2 GIVN Albert Wilhelm
    2 _RUFNAME Albert
    2 SURN Emmerich

So dropping all name part information would not be a good idea!
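For readers coming from the GEDCOM X side, a toy Python reader for the GEDCOM-L NAME structure described in this comment shows how the `_RUFNAME` extension can be checked against `GIVN`. The function `parse_name_block` is hypothetical. It handles only a single NAME block with level-2 pieces: repeated tags overwrite each other and CONC/CONT continuation lines are not handled, so it is an illustration, not a GEDCOM parser:

```python
def parse_name_block(lines):
    """Toy reader for one GEDCOM NAME block: a level-1 NAME plus level-2 pieces."""
    record = {}
    for line in lines:
        level, tag, *rest = line.split(" ", 2)
        value = rest[0] if rest else ""
        if level == "1" and tag == "NAME":
            record["NAME"] = value
        elif level == "2":
            record[tag] = value  # a repeated tag would overwrite the earlier one
    return record


block = """1 NAME Albert Wilhelm /Emmerich/
2 GIVN Albert Wilhelm
2 _RUFNAME Albert
2 SURN Emmerich"""

record = parse_name_block(block.splitlines())
rufname = record.get("_RUFNAME")
# The extension only makes sense if the Rufname is one of the GIVN names.
print(rufname, rufname in record["GIVN"].split())  # Albert True
```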

stoicflame commented 3 years ago

@albertemmerich thanks for your input. Unfortunately, I don't have much of a response because I'm not really involved (at least directly) in the new GEDCOM specification. My comments above are all within the scope of the GEDCOM X project.