FamilySearch / gedcomx

An open data model and an open serialization format for exchanging genealogical data.
http://www.gedcomx.org
Apache License 2.0

Adding an identifiers list to Relationship ... and possibly Event and SourceDescription? #231

Closed thomast73 closed 11 years ago

thomast73 commented 11 years ago

In developing the GEDCOM X model, we have needed to associate instances of an object with other data -- either because that other data is another representation of this data (e.g., other aliases, or a representation in an external system), or because they can be grouped with other items by a common abstraction (e.g., several PlaceDescription instances that describe the same place). In the three cases where such issues have been directly addressed in the GEDCOM X model -- in Person, PlaceDescription, and Agent -- we have added an identifiers list to meet these needs.

We have recently identified some similar needs for instances of Relationship as it will be used in our FamilySearch Platform API (e.g., persistent id, deprecated ids resulting from a merge, etc.). We are, therefore, proposing that we add an identifiers list to the Relationship definition.
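To make the proposal concrete, here is a minimal sketch of a Relationship carrying an identifiers list, including the merge case mentioned above. The class names, the `merge` helper, the `example.org` URIs, and the two type URIs are assumptions made for illustration; they are not taken from the GEDCOM X specification or from the FamilySearch Platform API.

```python
from dataclasses import dataclass, field

# Illustrative type labels only; treat these URIs as assumptions,
# not as the normative list of identifier types.
PERSISTENT = "http://gedcomx.org/Persistent"
DEPRECATED = "http://gedcomx.org/Deprecated"


@dataclass(frozen=True)
class Identifier:
    value: str  # the identifier itself, typically a URI
    type: str   # how this identifier relates to the object


@dataclass
class Relationship:
    id: str       # local id, unique only within one document
    person1: str  # reference to the first person
    person2: str  # reference to the second person
    identifiers: list[Identifier] = field(default_factory=list)


def merge(survivor: Relationship, duplicate: Relationship) -> Relationship:
    """Fold `duplicate` into `survivor`, keeping its old ids as deprecated ones."""
    for ident in duplicate.identifiers:
        retired = Identifier(ident.value, DEPRECATED)
        if retired not in survivor.identifiers:
            survivor.identifiers.append(retired)
    return survivor


r1 = Relationship("r1", "#p1", "#p2",
                  [Identifier("https://example.org/rel/AAA", PERSISTENT)])
r2 = Relationship("r2", "#p1", "#p2",
                  [Identifier("https://example.org/rel/BBB", PERSISTENT)])
merged = merge(r1, r2)
print([(i.value, i.type) for i in merged.identifiers])
```

After the merge, a client that still holds the old identifier for the merged-away relationship could, under these assumptions, find the surviving one by scanning for a matching deprecated entry.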

We can also see the possibility of similar needs for the Event and SourceDescription objects -- needs that could be addressed by adding an identifiers list to these object definitions.

We would like to solicit any thoughts you have on adding identifiers to Relationship.

We would also like to know if you have strong feelings about whether identifiers might be needed in Event and/or SourceDescription.

EssyGreen commented 11 years ago

Personally I believe that an identifier is useful since it indicates that one object is different from another even tho' it might appear not to be by looking at its data. ... However, I find having to manage lists of identifiers nullifies this benefit and instead adds complexity because whenever we are looking at a particular object we need to check for any other object of the same type which might also have the same identifier to get the full picture.

So I'd like to ask why we have to have a multiplicity of identifiers for any particular object?

mikkelee commented 11 years ago

> So I'd like to ask why we have to have a multiplicity of identifiers for any particular object?

I believe the original motivation for the identifier lists is to be able to associate a given entity with multiple authorities & repositories. So a Person in your database can have a FamilySearch identifier, a Geni identifier, a RootsWeb identifier, etc. A Place can have a DigDag identifier, an OpenStreetMap identifier, etc.

The motivation behind the use of identifiers as normalization points was discussed in #79 & #220 for PlaceDescriptions, where, as far as I know, the identifiers are so far intended to indicate that multiple PlaceDescriptions refer to the same canonical place.
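The normalization idea can be sketched as a simple grouping by shared identifier. The record layout, the place names, and the `example.org` URIs below are hypothetical and only stand in for whatever an authority would actually publish.

```python
from collections import defaultdict

# Hypothetical records: (local id, name as recorded, identifiers).
place_descriptions = [
    ("pd1", "Kjøbenhavn", ["https://example.org/places/copenhagen"]),
    ("pd2", "Copenhagen", ["https://example.org/places/copenhagen"]),
    ("pd3", "Aarhus", ["https://example.org/places/aarhus"]),
]


def group_by_identifier(descriptions):
    """Map each identifier to the local ids of the descriptions that carry it."""
    groups = defaultdict(list)
    for local_id, _name, identifiers in descriptions:
        for uri in identifiers:
            groups[uri].append(local_id)
    return dict(groups)


groups = group_by_identifier(place_descriptions)
for uri, members in groups.items():
    print(uri, members)
```

Descriptions that land under the same key are treated as describing the same canonical place, even though their recorded names differ.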

Please correct me if I'm wrong!

This then also makes it possible to have two instances of the same historical person in your dataset, which are linked by their identifiers. Is this an intended consequence? That almost makes it look like n-tier conclusions (which I like the idea of a lot).

stoicflame commented 11 years ago

@mikkelee is correct. Thanks for your response. And, yes, we intend to use identifiers to support n-tiered implementations, but we still have some work to codify that.

@EssyGreen, there is a difference between the id property, which is a single-value property defined as you describe (sometimes referred to as a "fragment identifier"), and the identifiers list, the purpose of which is as @mikkelee describes.
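A small sketch of that distinction, using a hypothetical in-memory document: the dictionary layout, the helper names, and the URIs are assumptions for illustration and do not reproduce the actual GEDCOM X serialization.

```python
# Hypothetical document: each person has one local id and a list of identifiers.
document = {
    "persons": [
        {"id": "p1", "identifiers": ["https://example.org/tree/XYZ",
                                     "https://example.net/people/42"]},
        {"id": "p2", "identifiers": ["https://example.org/tree/XYZ"]},
    ]
}


def resolve_fragment(doc, ref):
    """Resolve a local reference such as '#p1'; at most one object can match."""
    target = ref.lstrip("#")
    for person in doc["persons"]:
        if person["id"] == target:
            return person
    return None


def find_by_identifier(doc, uri):
    """Return the local ids of every object whose identifiers list contains uri."""
    return [p["id"] for p in doc["persons"] if uri in p["identifiers"]]


print(resolve_fragment(document, "#p1")["id"])
print(find_by_identifier(document, "https://example.org/tree/XYZ"))
```

The id lookup answers "which object in this document is meant?", while the identifiers lookup answers "which objects are associated with this external or shared identity?", and may return several.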

mikkelee commented 11 years ago

Thanks!

Thinking on it some more, I worry a bit that using identifiers for n-tier hierarchies of entities is somewhat inflexible without an associated confidence. Say I have person instances A, B, C, & D. I am certain that A & B are the same historical person, and that C & D are the same. But I'm not fully certain that AB and CD are the same (perhaps there is a gap of several years between the sources for AB and CD). There is no way to mark this, as far as I can tell.

Not sure whether this is the right issue for this, but I'm putting it here because it seems to be mainly about the use of identifiers.
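The A, B, C, D scenario can be illustrated with a sketch in which each link carries a confidence value and grouping depends on a threshold. This is purely hypothetical: GEDCOM X identifiers carry no confidence, and the `cluster` function, the numeric scores, and the thresholds are invented here to show what a shared identifier alone cannot express.

```python
def cluster(ids, links, threshold):
    """Group ids whose link confidence is at least `threshold` (union-find)."""
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, confidence in links:
        if confidence >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Certain: A = B and C = D. Uncertain: AB = CD.
links = [("A", "B", 1.0), ("C", "D", 1.0), ("B", "C", 0.6)]

strict = cluster(list("ABCD"), links, threshold=0.9)
loose = cluster(list("ABCD"), links, threshold=0.5)
print(strict)  # [['A', 'B'], ['C', 'D']]
print(loose)   # [['A', 'B', 'C', 'D']]
```

With plain shared identifiers, only the all-or-nothing outcome is available: either AB and CD share an identifier and collapse into one group, or they do not and the tentative link is lost.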

stoicflame commented 11 years ago

You might be onto something there... I need to think about it some more.

But, as you say, not on this thread.

EssyGreen commented 11 years ago

Thanks for correcting my understanding, @stoicflame. If identifiers is just an optional list of ad hoc properties that is already used for other entities, then I guess it doesn't make things any worse if it is added to any other entity. However, be aware that many systems will not want to cater for such a granular level of objects, so I believe it is not likely to be well supported.

thomast73 commented 11 years ago

We have added identifiers to Relationship and the accompanying changes were checked in as 292c5cfc0f2aa8a354668977886c09ccb914fbea.