FamilySearch / gedcomx

An open data model and an open serialization format for exchanging genealogical data.
http://www.gedcomx.org
Apache License 2.0

Events vs Facts #208

Closed nilsbrummond closed 11 years ago

nilsbrummond commented 12 years ago

Is there supposed to be a connection between events and facts? I don't see any right now.

As an example of what I'm talking about:

It seems that in these types of examples the Event and related Facts/Relationships/etc should be composite. That one can not exist without the other.

mikkelee commented 12 years ago

I was thinking about the relationship between Facts and Events the other day as well.

It seems to me that Events all (or most, anyway) change the state of some Fact (Birth/Death change "Living", Marriage/Divorce change "Married", MoveFrom/MoveTo change "Location" or "Residence", etc). So I don't see a need to model both, at least in the cases where they correspond as above.

Imagine recording events for some person:

In essence only requiring one "Event"-type, making it the job of the user/software to deduce the Facts implicitly specified by the Events.

I don't know how practical it would be, I've only just started thinking about it, but thought I'd float the idea.

jralls commented 12 years ago

I see it a bit differently, that something is either an Event (e.g., marriage or death) or a Fact (e.g. married or dead). Knowing the event's details certainly implies the fact, but knowing the fact (from a census record, perhaps) doesn't necessarily provide enough details to record an event.
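
A minimal Python sketch of that asymmetry may help. The `Fact` and `Event` classes, their fields, and the `IMPLIED_STATE` table are hypothetical names chosen for illustration; they are not taken from the GEDCOM X spec.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Fact:
    """A state that holds for a person, e.g. 'Married' or 'Dead' (hypothetical type)."""
    person: str
    type: str
    since: Optional[str] = None  # date the state began, if known


@dataclass
class Event:
    """An occurrence with its own details, e.g. a marriage or a death (hypothetical type)."""
    type: str
    date: Optional[str] = None
    place: Optional[str] = None
    participants: list[str] = field(default_factory=list)


# Hypothetical mapping from an event type to the state it brings about.
IMPLIED_STATE = {"Marriage": "Married", "Death": "Dead"}


def implied_facts(event: Event) -> list[Fact]:
    """Knowing an event's details implies a fact for each participant."""
    state = IMPLIED_STATE.get(event.type)
    if state is None:
        return []
    return [Fact(person=p, type=state, since=event.date) for p in event.participants]


wedding = Event("Marriage", date="1899-06-01", place="Anywhere",
                participants=["John Doe", "Jane"])
facts = implied_facts(wedding)
```

There is deliberately no inverse function: a bare `Fact("John Doe", "Married")` carries no date or place from which an `Event` could be filled in.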

That said, FS Management has apparently decreed that the way it is in the spec is the way it's gonna be.

mikkelee commented 12 years ago

Say a census done in 1900 says that a John Doe is 20 years old and married; you would record the events as:

  • Born about 1880
  • Married before 1900

Using the census as the source.

Can you provide an example on when knowing a fact wouldn't let you extrapolate an event (even with vague terms as above)?

Though I guess it is a bit futile to discuss if it's set in stone...

jralls commented 12 years ago

Say a census done in 1900 says that a John Doe is 20 years old and married; you would record the events as:

  • Born about 1880
  • Married before 1900

Using the census as the source.

You could certainly infer those events, though I'm not confident enough in census data that I would. The 1900 US Census actually records a month and year of birth and the number of years married, but most censuses don't, and in any case the source of that information isn't known. It's more accurate to record as "extracted Facts" that John was married and age 20 in 1900. That can be done directly without an intervening proof argument, and will be one of the sources for the proof argument in which you document the inferred event.
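
As a rough illustration of keeping the two layers apart, here is a small Python sketch. The `Extracted` and `Inferred` classes and their fields are hypothetical, invented for this example rather than drawn from the spec.

```python
from dataclasses import dataclass


@dataclass
class Extracted:
    """What the record actually says (hypothetical type)."""
    subject: str
    statement: str
    source: str


@dataclass
class Inferred:
    """A conclusion drawn from extracted statements, with its argument (hypothetical type)."""
    subject: str
    statement: str
    based_on: list[Extracted]
    argument: str


census = "1900 census, Anywhere, Some County"
census_year, stated_age = 1900, 20

# Recorded directly from the source; no proof argument needed.
age = Extracted("John Doe", f"age {stated_age} in {census_year}", census)
married = Extracted("John Doe", f"married in {census_year}", census)

# The inferred event is kept separate and points back at its evidence.
birth = Inferred(
    subject="John Doe",
    statement=f"born about {census_year - stated_age}",
    based_on=[age],
    argument="census year minus stated age; informant unknown",
)
```

The extracted statements stay valid even if the inferred birth year is later revised.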

Can you provide an example on when knowing a fact wouldn't let you extrapolate an event (even with vague terms as above)?

For what value of "knowing"? A single census entry is a pretty low bar, even if after a reasonably exhaustive search it's the only record you can find. Sure, you can infer from that, but should you? Why not just write "John Doe was enumerated in the 1900 Census in Anywhere, Some County, aged 20 and married to Jane ____" if that's all you've got?

But the real issue is that lots of people don't do the reasonably exhaustive search. They find the one record, infer everything they can out of it, and move on. They don't even find the other John Doe two pages later in the same census, with a wife and kids with the same names but a different age.

mikkelee commented 12 years ago

Say a census done in 1900 says that a John Doe is 20 years old and married; you would record the events as:

  • Born about 1880
  • Married before 1900

Using the census as the source.

You could certainly infer those events, though I'm not confident enough in census data that I would. The 1900 US Census actually records a month and year of birth and the number of years married, but most censuses don't, and in any case the source of that information isn't known. It's more accurate to record as "extracted Facts" that John was married and age 20 in 1900. That can be done directly without an intervening proof argument, and will be one of the sources for the proof argument in which you document the inferred event.

Can you provide an example on when knowing a fact wouldn't let you extrapolate an event (even with vague terms as above)?

For what value of "knowing"? A single census entry is a pretty low bar, even if after a reasonably exhaustive search it's the only record you can find. Sure, you can infer from that, but should you? Why not just write "John Doe was enumerated in the 1900 Census in Anywhere, Some County, aged 20 and married to Jane ____" if that's all you've got?

What I mean is that you record those events with the census as a source (you "know" that the census says these things), and attribute them with whatever confidence you have. Other sources you find may or may not contradict the events you've recorded with greater or lesser confidence.

That would be the same, though, whether you were recording facts or events - you'd still be pointing each to some source with some level of confidence.

Conflicting events/facts would then need to be aligned. I see that the model doesn't have conclusions based on other conclusions currently; I thought there was talk of doing that? If there were, and I had a person for which I had recorded marriage data from a number of sources, some of which contradicted, I would create a "top-level marriage" event, using the conflicting conclusions as "building blocks" to reach an internal consensus based on the available evidence.

This would allow me to redefine the "top-level marriage" as my confidence levels change; for example a new source introduces corroboration for a previously less confident source.
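
A very simplified Python sketch of the idea, assuming a numeric confidence per conclusion and a "highest confidence wins" rule. Both the `Candidate` class and that rule are assumptions made for illustration; real consensus-building would involve more than picking a maximum.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One source-level marriage conclusion (hypothetical type)."""
    claim: str
    source: str
    confidence: float  # 0.0 to 1.0, assigned by the researcher


def top_level(candidates: list[Candidate]) -> Candidate:
    """Pick the building block currently trusted most."""
    return max(candidates, key=lambda c: c.confidence)


candidates = [
    Candidate("married 1898", "family bible", 0.4),
    Candidate("married 1899", "census (years married)", 0.6),
]
before = top_level(candidates).claim

# A new source corroborates the family bible, so its confidence is raised
# and the top-level conclusion is simply recomputed.
candidates[0].confidence = 0.8
after = top_level(candidates).claim
```

The source-level conclusions never change; only the derived top-level one does.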

But the real issue is that lots of people don't do the reasonably exhaustive search. They find the one record, infer everything they can out of it, and move on. They don't even find the other John Doe two pages later in the same census, with a wife and kids with the same names but a different age.

No model is going to alleviate that, though requiring attributed sources will at least help others verify the events/facts.

jralls commented 12 years ago

Conflicting events/facts would then need to be aligned. I see that the model doesn't have conclusions based on other conclusions currently; I thought there was talk of doing that?

It's there, but a bit roundabout: You write a SourceDescription for each intermediate conclusion and then you can use SourceReferences to link them to new conclusions.
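
Roughly, the chain looks like the following Python sketch. It uses plain dicts with hypothetical ids; the `about`, `sources`, and `description` keys are my assumption about how the links are named and should be checked against the spec.

```python
# Intermediate conclusions, keyed by hypothetical local ids.
conclusions = {
    "F1": {"type": "Married", "note": "1900 census: married", "sources": []},
    "F2": {"type": "Married", "note": "child's birth record: parents married", "sources": []},
}

# One SourceDescription per intermediate conclusion; "about" points at it.
source_descriptions = {
    "SD1": {"about": "#F1"},
    "SD2": {"about": "#F2"},
}

# The new conclusion cites those descriptions via SourceReference-like entries.
conclusions["E1"] = {
    "type": "Marriage",
    "note": "inferred: married before 1900",
    "sources": [{"description": "#SD1"}, {"description": "#SD2"}],
}


def supporting_conclusions(conclusion_id: str) -> list[str]:
    """Follow SourceReference -> SourceDescription -> about for one conclusion."""
    found = []
    for ref in conclusions[conclusion_id]["sources"]:
        description = source_descriptions[ref["description"].lstrip("#")]
        found.append(description["about"].lstrip("#"))
    return found


print(supporting_conclusions("E1"))  # ['F1', 'F2']
```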

What I mean is that you record those events with the census as a source (you "know" that the census says these things), and attribute them with whatever confidence you have.

What I mean is that you record the facts that are what the census actually says and then document the inferences that you make from that as what they are.

mikkelee commented 12 years ago

It's there, but a bit roundabout: You write a SourceDescription for each intermediate conclusion and then you can use SourceReferences to link them to new conclusions.

Ah, thanks. I assume it's done by using the "about" field of the SourceDescription, pointing that to my intermediate Conclusion.

What I mean is that you record the facts that are what the census actually says and then document the inferences that you make from that as what they are.

I guess I'm also with Nils in being a little confused about the exact relationship between Facts and Events, then. They seem to me to be operating with redundant data with no direct connection between them, which is a little scary to me.

stoicflame commented 11 years ago

Given the discussion that's happening at threads like #235 and #236 I'd like to see if we can get the concerns raised here addressed.

I think the answer @jralls gave was pretty accurate:

something is either an Event (e.g., marriage or death) or a Fact (e.g. married or dead). Knowing the event's details certainly implies the fact, but knowing the fact (from a census record, perhaps) doesn't necessarily provide enough details to record an event.

But then he seemed to imply that the spec doesn't reflect that very well. So how does the spec need to be enhanced to provide the needed clarity? What if we added the above verbiage in the definition of the Event data type?

Then the other concern that was raised is that there was no way to directly associate a Fact with an Event. I guess I don't share the concern, but it's probably because I'm ignorant. @nilsbrummond and/or @mikkelee can you elaborate why this is a concern?

mikkelee commented 11 years ago

Well, I have only just recently realized that Facts are basically just properties on a person, and Events are stand-alone entities. If this is not in fact true, you may disregard the below.

I can imagine a use for both, but I'm not sure I see the need for both. Regardless, my workflow in some hypothetical ideal genealogy application would be something like:

-- do this multiple times for multiple sources --

In this workflow I don't see a need to differentiate facts and events as currently defined. If facts/events have identifiers they can be linked to some arbitrary entity online regardless of their offline state anyway.

Are there ever events so far abstracted from their actual source that they would no longer be tied directly to the 1st-order persona from that very same source? The source and the personas and the events/facts are all part of the same "bundle". I can't conceive of an event independent of its participants. I think that is how I can best explain what I mean.

jralls commented 11 years ago

In this workflow I don't see a need to differentiate facts and events as currently defined. If facts/events have identifiers they can be linked to some arbitrary entity online regardless of their offline state anyway.

And yet you did in your example:

create persona(s) as source describes. A has a name and an occupation. B has a name, C has a name.

"Occupation" being an example of Fact.

create source-persona-event: C was born on date of source.

Are there ever events so far abstracted from their actual source that they would no longer be tied directly to the 1st-order persona from that very same source? The source and the personas and the events/facts are all part of the same "bundle". I can't conceive of an event independent of its participants. I think that is how I can best explain what I mean.

It has nothing to do with abstraction from the source; it has to do with what is in various sources and how you combine the information in those sources to arrive at a conclusion. Consider that you find on an ancestor's grave marker "killed at battle of Jena". That's a Fact that implies an Event. You go to a history of the Napoleonic wars and find that the battle of Jena occurred 14 October 1806. The history book doesn't mention your ancestor. That's an Event with no personas. Combining that Event and the earlier Fact allows you to infer a death Event for your ancestor with a date and approximate location.
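
The Jena example can be sketched in Python as follows. The `Fact` and `Event` classes, the `"KilledAt"` type string, and the `infer_death` helper are all hypothetical, and the ancestor's name is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Fact:
    person: str
    type: str
    value: str
    source: str


@dataclass
class Event:
    type: str
    name: str
    date: Optional[str] = None
    place: Optional[str] = None
    participants: list[str] = field(default_factory=list)
    sources: list[str] = field(default_factory=list)


def infer_death(fact: Fact, battle: Event) -> Optional[Event]:
    """Combine a 'killed at X' fact with event X to get an inferred death event."""
    if fact.type != "KilledAt" or fact.value != battle.name:
        return None
    return Event(
        type="Death",
        name=f"Death of {fact.person}",
        date=battle.date,
        place=f"near {battle.place}",  # only approximate
        participants=[fact.person],
        sources=[fact.source] + battle.sources,  # trail of evidence
    )


# From the grave marker: a Fact about the ancestor.
marker = Fact("Ancestor", "KilledAt", "Battle of Jena", "grave marker")
# From the history book: an Event with no personas attached.
jena = Event("Battle", "Battle of Jena", date="1806-10-14", place="Jena",
             sources=["history of the Napoleonic wars"])

death = infer_death(marker, jena)
```

Neither source alone supports the death Event; the inferred Event records both in `sources`.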

mikkelee commented 11 years ago

Alright, I think I get it.

So a birth record could show a birth event, and possibly a married fact about the parents. Suppose you gather multiple "married" facts about the parent couple, but are unable to locate their marriage. Currently, I've handled that in Gedcom by having the FAM.MARR.DATE be BEF yyyy with an attached NOTE -- obviously not ideal. In GedcomX it seems I would create a marriage event, base it on all the marriage facts, with an analysis doc between?

Thanks for the patience :)

jralls commented 11 years ago

In GedcomX it seems I would create a marriage event, base it on all the marriage facts, with an analysis doc between?

Yes, that's the idea.

nilsbrummond commented 11 years ago

@mikkelee

So a birth record could show a birth event, and possibly a married fact about the parents. Suppose you gather multiple "married" facts about the parent couple, but are unable to locate their marriage. Currently, I've handled that in Gedcom by having the FAM.MARR.DATE be BEF yyyy with an attached NOTE -- obviously not ideal. In GedcomX it seems I would create a marriage event, base it on all the marriage facts, with an analysis doc between?

I would look at it this way: your multiple "married" facts would be extracted evidence from sources. Your conclusion would still be a single "married" fact in the form of Fact(type: married, place: null, date: from before to after). There would be no marriage event without knowing directly about the wedding event itself. If you found the wedding record, then you would add a "marriage" event and change the married fact to start on the appropriate date. Here I am assuming the multiple "married" facts are things like "were currently married at the given date of some other fact/event", so giving us information about the ongoing state of the marriage.
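
In Python terms, a sketch of that approach might look like this. The `MarriedFact` class, its field names, and the year-only dates are simplifying assumptions for illustration, not the GEDCOM X Fact type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MarriedFact:
    """Concluded 'married' state for a couple (hypothetical shape)."""
    began_by: int                        # married no later than this year
    still_married_in: int                # latest year the state was observed
    began_exactly: Optional[int] = None  # set only once a wedding record is found


def conclude_married(observed_years: list[int]) -> MarriedFact:
    """Collapse several 'married as of year Y' observations into one fact."""
    return MarriedFact(began_by=min(observed_years),
                       still_married_in=max(observed_years))


def apply_wedding_record(fact: MarriedFact, wedding_year: int) -> MarriedFact:
    """Tighten the fact once the wedding itself is documented."""
    if wedding_year > fact.began_by:
        raise ValueError("wedding record contradicts an earlier 'married' observation")
    return MarriedFact(began_by=wedding_year,
                       still_married_in=fact.still_married_in,
                       began_exactly=wedding_year)


# Three sources each note the couple as married in a given year.
fact = conclude_married([1905, 1900, 1910])
# Later, the wedding record turns up.
updated = apply_wedding_record(fact, 1898)
```

Until the wedding record is found there is only the fact with its range; the marriage event would be added alongside `updated`, not in place of it.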

jralls commented 11 years ago

I think that it's a usage preference whether or not to create an inferred event in the database. What's important from the POV of GedcomX is that it's possible, and if done the result can be clearly identified as inferred with a trail of evidence supporting the inference.

mikkelee commented 11 years ago

Yeah, agreed. I was just going to say that whether the inferred conclusion is an event or a fact is probably ultimately up to taste. As long as it's clearly visible on what grounds the conclusion was made, that's what matters to me. Thanks to both of you for clarifying!

stoicflame commented 11 years ago

I wanted to keep this open until we added a specific section in the conceptual model that clarifies the difference between the two concepts. I've done that at ef0fe00. Thank you for your participation.