FamilySearch / gedcomx

An open data model and an open serialization format for exchanging genealogical data.
http://www.gedcomx.org
Apache License 2.0

clarify model guidelines for marriage fact without a spouse #104

Closed stoicflame closed 12 years ago

stoicflame commented 12 years ago

A request for how to model the following use case was posted by @thomast73:

The above record has a “Marriage” Event. However, the record contains only one Person, and it does not contain any Relationships.

Our initial thought was that we should manufacture an additional Persona, then create a “couple” Relationship between the existing Persona and the manufactured Persona, then convert the Event to a Fact and associate the Fact with the newly created Relationship. This strategy ensures that any “marriage-like” events are associated with a Relationship, but it does produce a Person with no real genealogically-significant data.

Alternatively, if we encounter a record that contains a “marriage-like” Event but the record does not have the requisite CoupleRelationship and/or two Person participants, we could just convert the Event to a Fact and associate it with the one Persona that we did find in the record. Using this strategy means that “marriage-like” events can appear in more than one place in the record; the converted record will still not have any Relationship. This feels a bit funny to me.

What would you suggest we do with such a record?

stoicflame commented 12 years ago

@carpentermp posted the following in response:

This is an old dilemma and, given that all approaches have drawbacks, one that has always had people on different sides. There are actually 3 possibilities:

  1. Create a "dummy person", create a couple relationship between the person and the "dummy person", put the marriage on it.
  2. Create a couple relationship with only 1 person involved (leaving the "other side" null), put the marriage on it.
  3. Put the marriage on the lone person.
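
To make the three possibilities concrete, here is a minimal sketch of each one. The `Fact`, `Person`, and `Relationship` classes and the `placeholder` flag are hypothetical stand-ins for illustration only; they are not the GEDCOM X model classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Fact:
    type: str
    date: Optional[str] = None
    place: Optional[str] = None


@dataclass
class Person:
    id: str
    facts: List[Fact] = field(default_factory=list)
    placeholder: bool = False  # hypothetical flag marking a "dummy person"


@dataclass
class Relationship:
    type: str
    person1: Person
    person2: Optional[Person]  # None models the "relationship to nowhere"
    facts: List[Fact] = field(default_factory=list)


marriage = Fact("Marriage", date="1820")

# Option 1: manufacture a placeholder spouse and put the fact on the relationship.
p1 = Person("p1")
dummy = Person("unknown-spouse", placeholder=True)
option1 = Relationship("Couple", p1, dummy, [marriage])

# Option 2: one-sided relationship; the other side is left empty.
p2 = Person("p2")
option2 = Relationship("Couple", p2, None, [marriage])

# Option 3: no relationship at all; the fact lives on the lone person.
p3 = Person("p3", facts=[marriage])
```

Only option 3 avoids creating an object (a person or a relationship) that exists purely to hold the marriage.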

I personally hate "dummy persons". This is what they did in CP and it created lots of little messy problems. I also hate "relationships to nowhere"--they seem like a logical contradiction and they are probably more problematic than dummy persons. Because of this, I favor putting the marriage event on the lone person. Yes, it means that you have to look in two places for marriage events, but that is really the only downside. The existence of a marriage on a person expresses exactly what is intended--that such-and-such a person was married on a given date and in a given place--without any knowledge of to whom. (If you stop and think about it, neither of the other two options expresses exactly this.)

As I said, there are differing opinions on this, but that is mine. I believe this decision really belongs to GedcomX in general, and not just to the Record profile. The Conclusion profile has the exact same dilemma, and it would seem like a good idea to be consistent. Since all implementers will be faced with this dilemma, it also seems like something that has to be documented, at the very least.

stoicflame commented 12 years ago

@dkohlert posted the following in response:

I actually would favor option 2) a couple relationship with only 1 person.

stoicflame commented 12 years ago

To which I responded:

I'm okay with either the "relationship to nowhere" approach or the "marriage on a person" approach.

stoicflame commented 12 years ago

To which @ranbo responded:

Me, too. I tend to want to avoid using "dummy" persons except in situations where you need to distinguish one from another. For example, given the sentence: "Fred Jones married in 1820, had two sons, Bob and Jim. He remarried in 1840 and had a daughter Sally."

In that case, there are two unknown spouses, and we know something about each one, namely, the marriage date and which children went with that spouse. In that case, you're starting to get enough information that it's worth creating a dummy person to represent "the spouse Fred Jones married in 1820 who was also the mother of Bob and Jim", since two "nulls" would be ambiguous.

But when all you've got is a marriage date, I wouldn't create a dummy person yet.

As for #2 or #3, it depends on whether you want to think "He was married to someone on that date, but we're not sure who yet", or if you want to think "He was married on that date" [and we're not saying anything about the relationship]. Either way is fine with me.

My guess is that even with documentation, the standard can't avoid people putting marriage events on a person, so our software may have to look both places anyway, at least on import. We could decide, though, that we'll force it onto a relationship on import so that in OUR system there are no couple events on persons, which would let our client avoid having to look.

Again, I'm fine either way.

stoicflame commented 12 years ago

To which @jeffph responded:

@carpentermp has convinced me. I’m in favor of option 3. I would rather enforce relationships having two personas than marriage facts only existing on relationships. Other fact types will be in all three places anyway.

Also, we already have helper methods in our RecordNavigator class to traverse all three locations and return Facts of a given type, such as Marriage. However, if we allow this, we should probably at least add a check/warning on marriage records when two people are in a couple relationship and both have the same marriage Fact data (i.e. same date/place) on their respective personas, and not on the relationship.
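
As a rough illustration of the two ideas above (collecting Facts of a given type from every location, and warning when both members of a couple carry the same marriage Fact), here is a minimal sketch. It does not reproduce the RecordNavigator API, which is not shown in this thread; `facts_of_type`, `duplicated_couple_facts`, and the small data classes are hypothetical names, and the sketch assumes Facts live only on persons and relationships.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class Fact:
    type: str
    date: Optional[str] = None
    place: Optional[str] = None


@dataclass
class Person:
    id: str
    facts: List[Fact] = field(default_factory=list)


@dataclass
class Relationship:
    type: str
    person1: Person
    person2: Person
    facts: List[Fact] = field(default_factory=list)


@dataclass
class Record:
    persons: List[Person]
    relationships: List[Relationship]


def facts_of_type(record: Record, fact_type: str) -> Iterator[Tuple[str, Fact]]:
    """Yield (location, fact) for every matching fact on a person or a relationship."""
    for person in record.persons:
        for fact in person.facts:
            if fact.type == fact_type:
                yield (f"person:{person.id}", fact)
    for rel in record.relationships:
        for fact in rel.facts:
            if fact.type == fact_type:
                yield (f"relationship:{rel.person1.id}-{rel.person2.id}", fact)


def duplicated_couple_facts(record: Record, fact_type: str = "Marriage") -> List[str]:
    """Warn when both members of a couple carry an equal fact that the relationship lacks."""
    warnings = []
    for rel in record.relationships:
        if rel.type != "Couple":
            continue
        for fact in rel.person1.facts:
            # Facts compare by value (type, date, place) because Fact is a dataclass.
            if (fact.type == fact_type
                    and fact in rel.person2.facts
                    and fact not in rel.facts):
                warnings.append(
                    f"{rel.person1.id} and {rel.person2.id} both carry {fact.type} "
                    f"({fact.date}, {fact.place}); consider moving it to the relationship"
                )
    return warnings
```

A caller would run `duplicated_couple_facts(record)` after conversion and log whatever it returns; whether that should be a warning or a hard error is a policy choice this sketch does not make.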

stoicflame commented 12 years ago

It seems like most people are in favor of making the official recommendation be to create a marriage fact on the person over creating a "one-sided" relationship or "dummy person". I'd like to get comments from the following:

And I'd like @dkohlert to elaborate as to why (and how strongly) he prefers a one-sided relationship. If we're all okay with recommending a marriage fact on a person, then I'll make the call and write it up as an official recommendation.

carpentermp commented 12 years ago

I would also point out that, while we have been talking as though this is a question only for "couple facts" it is actually common to all "relationship facts", for example, the parent-child fact "Adoption". We may know who adopted the child, or we may not.

carpentermp commented 12 years ago

Allow me to try to make a stronger case against "relationship to nowhere". The obvious downside to this approach is that, whenever you are dealing with a relationship you have to check for null. The advantage over "dummy person" would seem to be that you don't have to create the dummy person. However, you have, in effect, created a "dummy relationship."

An additional disadvantage comes to light when you consider the "Relationship uniqueness constraint." Not everyone believes in this constraint, and some believe that it ought to be left to the implementer whether or not to enforce it, but let me define it and then I'll show where that leads in any system that has it. Basically, the "Relationship uniqueness constraint" (RUC) states that there can only be 1 Relationship of the same type between the same two people. For example, suppose Joe married and divorced Sue three times. In an RUC system, there would only be 1 relationship between Joe and Sue and all the marriage and divorce facts would reside on it. This is fundamental to how Relationship is defined. In RUC systems, the existence of a Relationship between two people indicates only that such a relationship existed--it deliberately says nothing about how it came to be, or how long it lasted--Relationship Facts serve that purpose. In RUC systems, a Relationship is a binary condition--it either existed or it didn't. Thus, to violate the constraint makes no logical sense.

So why have a "Relationship uniqueness constraint"? This is tangential, but in order to appreciate the additional drawback to "relationship to nowhere", it is probably necessary to at least understand where RUC advocates (of which I am one) are coming from. RUC has its roots in the "mental model." When users think of relationships, do they think of them as independent entities, or do they tend to define them in terms of the people involved? When drawing lines between people in a pedigree, would it surprise a user to see more than 1 line between the same two people? When listing the children of a person, would it surprise the user to see the same child listed twice?

The RUC naturally produces system behavior that is in keeping with the "principle of least astonishment" to our users. For example, suppose person "A" has a couple relationship with persons "B" and "C." Suppose the user realizes that "B" and "C" are actually the same real person, and merges them. In an RUC system, the couple relationships would automatically merge--no other behavior is possible. Without the constraint, both relationships would continue, or the user would be asked if they should be merged. The question will probably be confusing. Suppose the user says "no", thus leaving both relationships. Will the next user be confused to see two spouses that are really the same person?

Anyway, there's lots more I could say on this subject but it would be too much of a digression from my original point--the additional drawback to "relationship to nowhere" in RUC systems. If the RUC is strictly enforced, then there can be only one "relationship to nowhere" for a given person and relationship type. (You could relax the constraint for "dummy relationships" but that has another set of problems.) For example, "Bob" can only have 1 "unknown spouse." Suppose we have 2 marriage dates for Bob but no other information about the marriages. We would be forced to put both marriage facts on the same "dummy" relationship. This would seem to imply that both marriages were to the same person, which is probably not the case. The alternative would be to create a "dummy person" for the second marriage, but then we find ourselves in the "dummy person" world after all.

The problem began when we created the "dummy relationship" in the first place.
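
The argument above can be sketched in a few lines. This is only an illustration under stated assumptions: `RucStore` is a hypothetical class, people are plain string ids, facts are plain strings, and an unknown person is represented by `None`. It is not how any particular system implements the constraint.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Under the RUC, a relationship is identified by its type plus the unordered pair of people.
Key = Tuple[str, FrozenSet[Optional[str]]]


class RucStore:
    def __init__(self) -> None:
        self.facts: Dict[Key, List[str]] = {}

    def add_fact(self, rel_type: str, a: str, b: Optional[str], fact: str) -> Key:
        """Attach a fact to the single relationship identified by (type, {a, b})."""
        key = (rel_type, frozenset({a, b}))
        self.facts.setdefault(key, []).append(fact)
        return key

    def merge_persons(self, keep: str, drop: str) -> None:
        """Replace `drop` with `keep`; relationships that now share a key collapse into one."""
        merged: Dict[Key, List[str]] = {}
        for (rel_type, people), facts in self.facts.items():
            new_people = frozenset(keep if p == drop else p for p in people)
            merged.setdefault((rel_type, new_people), []).extend(facts)
        self.facts = merged


store = RucStore()

# "A" has couple relationships with "B" and "C"; merging B and C merges the relationships.
store.add_fact("Couple", "A", "B", "Marriage 1820")
store.add_fact("Couple", "A", "C", "Divorce 1840")
store.merge_persons(keep="B", drop="C")

# Two marriages to unknown spouses land on the same "relationship to nowhere".
first = store.add_fact("Couple", "Bob", None, "Marriage 1820")
second = store.add_fact("Couple", "Bob", None, "Marriage 1840")
```

Because the key is derived from the people involved, `first` and `second` are the same key, so both of Bob's marriage facts end up on one relationship, which is exactly the ambiguity described above.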

dkohlert commented 12 years ago

Merlin, Excellent write up. I do get the RUC principle. I haven't thought about it too deeply but it does make sense so I will not disagree with the principle.

I would perhaps argue that when a relationship has only one person in it, and is thus, as you describe it, a relationship between a person and "nowhere", that "nowhere" does not fall under the RUC principle, because "nowhere" does not represent a single person. "Nowhere" could represent all persons or no person at all, depending on how you look at it. Eventually, two relationships to "nowhere" may be shown to represent the same person. When that happens, the two relationships would be merged into one, as you mentioned, thus satisfying the RUC. If, however, the two "nowhere"s end up being different persons, then the RUC principle is again followed.

Let me explain why I think creating such relationships is somewhat cleaner in terms of a consistent model, especially on the conclusion side. If you put the marriage fact onto the one known person and then later on discover the spouse, then you have to move the marriage fact from the person to the newly created relationship. This implies that each time you are about to add a fact to a relationship, you first have to make sure that a similar fact is not already on one of the persons participating in that relationship.

If the relationship were created from the start, then all that would be necessary would be to add the new person to the record and add them to the relationship. No other changes to the model are needed, though you would have to check and see if the relationship already existed.
This scenario is much less likely to happen in the record world, but it could happen in a record that is initially only partially indexed for some reason or another, such as poor image quality.

I think I am fine with either approach, but I think, in terms of implementation, that putting the fact on a single-person relationship would be easier to present in the conclusion tree, since it gives a logical place to fill in the other person when that person becomes available. It would also minimize structural changes to records.


carpentermp commented 12 years ago

Unfortunately, this takes us into even deeper waters where there is still a fundamental difference of opinion about the definition of Relationship in the model. (Incidentally, I created issue #7 to have a place to discuss this and it is the oldest open issue on the board.) There are two main camps:

  1. Relationships are independent entities.
  2. Relationships are defined by (and identified by) the two people involved and the relationship type.

Some people have staked out positions somewhere between these two extremes, and others take the opinion that most of the differences between the two approaches can be regarded as implementation-specific. Personally, I believe that the differences are so fundamental that we will never achieve much interoperability between systems unless a very clear position is taken.

Anyway, from the point of view of position 1, your approach of adding a person to the other side of the relationship is perfectly reasonable. However, from position 2, it is not possible, because to change the people involved in a relationship is to create a new relationship. This constraint is more than just slavish adherence to the principle, but is fundamental to how things work in that model.

I have decided that I probably need to articulate the differences between these approaches so that the community is at least aware of the issues. I plan to do this as a new post to issue #7. Perhaps that will get the issue out of the doldrums to where it can be resolved.

ranbo commented 12 years ago

Another way of stating what @carpentermp said earlier is that a Relationship object contains everything we know about the relationship between two people (e.g., they were married in 1820, divorced in 1840, married again in 1863). Changing a person at one end or the other of the relationship makes it so that it is now talking about a different relationship between two other people. Sometimes we call this "hijacking" the relationship (similar to changing the name, gender and events on a Person so that it now represents a different real person).

(It is a bit like saying "I have George Washington's original hatchet that he used to chop down the cherry tree. Sure, it has had 2 new heads and 8 new handles, but it's the same hatchet!" Or, like Steven Wright's opening comment: "The other day I...no, wait, that wasn't me.")

So if we model a marriage to an unknown spouse with a relationship with a null spouse, and then later we discover the spouse, then (at least using "camp #2" above) we are really creating a new relationship between the person and the discovered spouse, and adding the marriage event to it, and perhaps deleting the old relationship with the null spouse. In that case, it's certainly no more work to move a marriage fact from the person to the newly-created relationship than it is to make the other changes.
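
For comparison, here is a minimal sketch of the "move the fact" step when the marriage started out on the person. The classes and the `attach_discovered_spouse` helper are hypothetical, and facts are plain strings to keep the example short.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    id: str
    facts: List[str] = field(default_factory=list)


@dataclass
class Relationship:
    type: str
    person1: Person
    person2: Person
    facts: List[str] = field(default_factory=list)


def attach_discovered_spouse(person: Person, spouse: Person, fact: str) -> Relationship:
    """Create the couple relationship and move `fact` from the person onto it."""
    person.facts.remove(fact)  # raises ValueError if the fact was never on the person
    return Relationship("Couple", person, spouse, [fact])


fred = Person("fred", facts=["Marriage 1820"])
spouse = Person("newly-found-spouse")
couple = attach_discovered_spouse(fred, spouse, "Marriage 1820")
```

After the call, the fact appears only on the new relationship, so nothing is left behind on the person to be counted twice.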

None of the solutions are perfect, but I've come around to thinking that putting the marriage right on the person would be the cleanest way to go.

tcreighton commented 12 years ago

I don't know if it is too late to weigh in on this, but here goes: I tend to see things as @dkohlert has been arguing. If you think about it from a relational database perspective, having a null value for an attribute does not allow you to match on that across rows. For example, if person A has two relationships defined, B and C, each having a null value for the other person, this does not mean that they are the same relationship, and RUC is not breached.

We certainly need to decide how to handle the case when a person changes on one side of the relationship. One approach is to disallow such a thing. This requires the user to define a new relationship and add facts on that relationship. Another is to allow the change but make sure the user considers each and every fact already on the relationship in terms of the new persons. I tend to lean toward the former approach. Either way, it is not the same as defining a person to replace null. When we have a null value, it is simply not there. This does not seem to break the concept of RUC at all so long as you allow that the primary key is a pseudokey.

I would be comfortable with a system that allowed a single null value on a relationship. Once that has been changed to a real value, I would be careful in allowing changes to it. However, even allowing changes to one of the sides doesn't bother me too much. What is important, as I see it, is that we have only one relationship instance for a given two values of person. If one of the person values is null, then that rule can be maintained for however many relationship objects are created with the same value for the other person.
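
The relational point can be demonstrated with SQLite, which treats NULLs as distinct in a UNIQUE constraint. The table and column names below are assumptions made up for this sketch, and other database engines may treat NULLs in unique constraints differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE relationship (
        id      INTEGER PRIMARY KEY,  -- surrogate ("pseudo") key
        type    TEXT NOT NULL,
        person1 TEXT NOT NULL,
        person2 TEXT,                 -- NULL means the other person is unknown
        UNIQUE (type, person1, person2)
    )
    """
)

insert = "INSERT INTO relationship (type, person1, person2) VALUES (?, ?, ?)"

# Two one-sided relationships for the same person: both inserts succeed,
# because NULLs are treated as distinct by the UNIQUE constraint.
conn.execute(insert, ("Couple", "A", None))
conn.execute(insert, ("Couple", "A", None))

# A second relationship between the same two known people is rejected.
conn.execute(insert, ("Couple", "A", "B"))
try:
    conn.execute(insert, ("Couple", "A", "B"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

one_sided = conn.execute(
    "SELECT COUNT(*) FROM relationship WHERE person2 IS NULL"
).fetchone()[0]
```

The sketch does not normalize the order of the two people, so ("A", "B") and ("B", "A") would count as different rows; a real schema would need a rule for that.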

stoicflame commented 12 years ago

I think we've allowed adequate time now for those who want to contribute to the discussion to do so. We've heard some good points for both sides, but I think the majority would prefer the recommendation to add a marriage fact to a person. And even those who prefer the "one-sided relationship" don't seem to have serious objections to the majority opinion.

Is that a fair analysis of the situation? Assuming yes, I'll add the official recommendation to the documentation and close up the issue.