FamilySearch / gedcomx

An open data model and an open serialization format for exchanging genealogical data.
http://www.gedcomx.org
Apache License 2.0

support for modeling "negative" statements #127

Closed stoicflame closed 11 years ago

stoicflame commented 12 years ago

There exists today a way to use the EE certainty vocabulary to say things like:

But there isn't a vocabulary element defined for saying things like:

EssyGreen commented 12 years ago

And more importantly this Person is not this Persona :)

EssyGreen commented 12 years ago

The simplest way to achieve this seems to be to change your ConfidenceLevel to allow for negatives.

As I indicated in Issue #120 I think a (say 5-point) numeric scale is more flexible and accurate at the data level than a pre-defined enum.
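A minimal sketch of the signed numeric scale suggested here, assuming a 5-point range from -2 to +2 with a null value for "not yet evaluated". The range, the labels, and the function name are all hypothetical; none of them come from the GEDCOM X specification.

```python
from typing import Optional

# Hypothetical 5-point signed scale: -2 (certainly not) up to +2 (certainly).
# These labels are illustrative only and are not part of GEDCOM X.
LABELS = {
    -2: "certainly not",
    -1: "probably not",
    0: "undetermined",
    1: "probably",
    2: "certainly",
}


def describe_confidence(score: Optional[int]) -> str:
    """Map a signed confidence score to a display label.

    None is treated as "unevaluated" rather than as an error.
    """
    if score is None:
        return "unevaluated"
    if score not in LABELS:
        raise ValueError(f"confidence must be between -2 and 2, got {score}")
    return LABELS[score]
```

An application could store only the integer in the file and overlay whatever wording (or language) it prefers at display time.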

EssyGreen commented 12 years ago

Also please steer away from using a particular source ... "Evidence Explained" is only one person's view. There are others.

jralls commented 12 years ago

"Evidence Explained" is only one person's view. There are others.

There may be other views (there always are), but Evidence Explained is not "one person's view". It reflects the accepted practice among professional genealogists in the United States and the standards of the Board for the Certification of Genealogists (BCG). Until professional genealogists in other countries form standards bodies like the BCG and promulgate standards for use in their respective countries, we have no other guidance to follow.

EssyGreen commented 12 years ago

I correct myself: I should have said "one view", not "one person's view".

That said, you have just confirmed yourself that Evidence Explained is US-based. In the UK we have other bodies of genealogical standards (e.g. FFHS or IHGS - incidentally the latter was founded 3 years earlier than BCG) and many official repositories (e.g. National Archives) offer free advice on how to cite from their sources - which given that the primary aim of a citation is to enable the reader to find the source for themselves seems eminently sensible.

Even if a particular book were to be accepted on a global scale ... what happens when another comes along? (And rest assured it will, sooner or later)

jralls commented 12 years ago

Yes, Mrs. Mills is an American. So am I, so is Ryan, and so are the executives at FamilySearch, Ryan's employer. So what?

I've looked through the websites of the two organizations and found nothing about standards, evidence evaluation, or citations. The SoG have a cursory list of "standards" which is too general to be of any use to anyone. Googling "uk genealogy standards" didn't turn up anything useful, and neither of the English genealogy books in my library (Ancestral Trails and The Oxford Guide to Family History) offer more than perfunctory guidance on the subject.

So, if not EE, then what would you recommend? What do you use?

EssyGreen commented 12 years ago

I'm not complaining about you being American! I just thought GEDCOMX was intended for a wider audience. If not, just say so and I'll be off (since, as you've probably guessed by now, I'm in the UK).

What do I use? Mainly common sense :P When I find a source, I document everything I might need to find it again (and take a digital copy if at all possible or a text or handwritten one if not). When I cite it I quote the same stuff back. If I find a derivative I try to get the original, if not (or it's too expensive) I document that it isn't the original and how accurate a reproduction it's likely to be (e.g. whether photographic, transcription, interpretation etc). I tend to conform to Chicago style in how I format the citation text but I don't stick rigidly to it if I have a source that doesn't fit the templates.

I have tried the templates implemented from EE in FTM but invariably found that they just didn't fit with the way sources are arranged over here - and it was like trying to shoe-horn things into a place they didn't fit. I don't have anything against ESM or EE but it doesn't work for me and I don't think I'd be the only person to say that and hence, I wouldn't personally build her templates into my applications (whatever GEDCOMX might decide).

I believe that it's the principles which are important when citing and not the specific syntax (tho' I appreciate that templates can be useful guides). The key principles for me are to document (a) how to find the original source (b) how strong/weak I believe my evidence to be that the person in the source is the person in my tree. The quality of the source itself I am much less worried about since I believe it is up to the reader to judge whether or not they will agree with my own judgement/interpretations. As a genealogist I would never just believe what someone else has written (as a secondary source). They could have put "Certain" for a citation but I will always go look up the source myself and make my own judgement. Having said that I will specify "primary" or "secondary" so that others know that I have a decent copy they can view/borrow/ask about.

I'm sure mine isn't the only way and I'm not trying to sell it but please let's keep a more open standard than that from a single book.

jralls commented 12 years ago

I don't set policy here -- that's up to Ryan's bosses. But Robert Raymond, a FamilySearch exec (assistant of some sort to David Rencher, FS's Chief Genealogist) said in a lecture at RootsTech last week that "Evidence Explained is the standard".

since, as you've probably guessed by now, I'm in the UK

Your use of "whilst" gave that away in your first post. ;-)

Sounds like you're following Mrs. Mills's advice rather closely, actually. It also seems that your problem isn't with EE per se but with the way the templates are implemented in FTM -- also an American company, and one with a rather more parochial viewpoint than FamilySearch. The templates in the book are examples. There's plenty of text explaining the principles behind them (available in an earlier and less expensive form in her preceding book Evidence!). Pretty much every book or article I've seen on the subject of genealogical evidence for the last 10 years refers to one or the other. Admittedly, they've all been American -- but I just searched Blackwells several ways and didn't find any English ones.

Going back to your issues with FTM, I see that more as an implementation problem of FTM. I haven't used that program since the Brøderbünd days, but if it's inflexible as you say then the problem is likely that it uses fixed-field tables rather than Key-Value ones that allow the user to customize easily the fields that are used for a citation. Have you ever tried The Master Genealogist?

Open in what way? For citations in GedcomX, the problem will be standardizing a vocabulary of element names for the many parts of a citation, and designating which should be mandatory. It's up to applications to parse those element names into their own data models and to format those fields into footnotes or endnotes in reports. EE makes a start at cataloging the element names and formats, but can't cover every archive in the world and freely admits that. Mrs. Mills has done an enormous amount of work making that start and it would be dumb to ignore it.

EssyGreen commented 12 years ago

If I am following Mrs Mills's advice it's coincidental, since I've been doing this since before she (or at least the Evidence series) was published. Like I said, I follow my brain :)

I don't use FTM or Master Genealogist or the other big selling applications out there but I have tried them all and keep tabs on them to see how they are evolving. I begrudgingly use Family Historian but the frustrations of using it (and not finding any viable substitutes) have driven me to create my own application (still in the making I'm afraid since this isn't my day job).

"Open in what way?" Like I said at the start my preference is for a number (which can be negative as well as positive). If a particular application wants to overlay this with a particular set of words then fine but why take up more space and allow for more errors in the file by using a set of (English) words?

[quote]EE makes a start at cataloging the element names and formats, but can't cover every archive in the world and freely admits that. Mrs. Mills has done an enormous amount of work making that start and it would be dumb to ignore it.[/quote] Neither ESM nor anyone else will ever be able to catalogue all the necessary formats. It's a tail chasing exercise ... what genealogists need to be able to do is use the principles to enable people to follow the trail.

PS: How the heck do you do block quotes in this thing?

jralls commented 12 years ago

I begrudgingly use Family Historian but the frustrations of using it (and not finding any viable substitutes) have driven me to create my own application (still in the making I'm afraid since this isn't my day job).

Have you looked at Gramps? It's incredibly flexible and open-source. If it doesn't work the way you want you can "fix" what you need to, which saves you the work of writing all of the other stuff you'd need when starting from scratch. Source handling in particular needs some work.

"Open in what way?" Like I said at the start my preference is for a number (which can be negative as well as positive). If a particular application wants to overlay this with a particular set of words then fine but why take up more space and allow for more errors in the file by using a set of (English) words?

Um, did you lose track of which issue you're on? This one is negative assertions, but we got distracted to citations. I don't understand how a number would represent either -- but it makes some sense for #130, Dates Shouldn't Have Parts.

EE makes a start at cataloging the element names and formats, but can't cover every archive in the world and freely admits that. Mrs. Mills has done an enormous amount of work making that start and it would be dumb to ignore it

Neither ESM nor anyone else will ever be able to catalogue all the necessary formats. It's a tail chasing exercise ... what genealogists need to be able to do is use the principles to enable people to follow the trail.

Exactly.

PS: How the heck do you do block quotes in this thing?

Put a ">" at the beginning. It breaks on an empty line, so for the two-level quote above I used ">>" for the first one, an empty line, and ">" for the second. For the rest of the markup, look at the top right corner of the edit box. See the link "Github Flavored Markdown"? Click on that (javascript required) and it will pop up a window with a cheat-sheet.

EssyGreen commented 12 years ago

Yes I've looked at Gramps. I did think of tweaking it but decided it was easier to start from scratch - especially since (horror of horrors I know) I prefer .Net :)

did you lose track of which issue you're on?

Er yes we did deviate somewhat but as I keep banging on, a number does negatives quite easily :) I realise there are other parts of the model which also need amending to make negative evidence effective but I hope I've covered those elsewhere - see Issue #120

The heading here is for negative "statements" - I assumed this was closer to citations. I can't see how a negative assertion would work - Isn't that like saying "This person didn't contribute to this record"? In which case it would create a heck of a lot of negative statements for each GR!

PS: Thanks for the blockquote tip! Been bugging me since I joined! I did try the help but was somewhat put off by it being entitled a "Cheat Sheet" hehe

jralls commented 12 years ago

We need Ryan to clarify what he's talking about, because you think he's talking about records and I think he's talking about conclusions.

EssyGreen commented 12 years ago

Absolutely! (Tho' I think/hope he's talking about the link between records and conclusions) I'm guessing he's gone on hols after RootsTech - either that or he's buried under a mound of code trying to sort out the next batch.

stoicflame commented 12 years ago

Hi guys. Sorry for the response delay. The delay is usually due to being buried. We're working on ramping up our resources so I'm not the only one responding here.

Anyway, my original intent with this issue was to track the work needed to be able to make statements about the sources of both records and conclusions. There exists today a way to use the EE certainty vocabulary to say things like:

But there isn't a vocabulary element defined for saying things like:

I think the confusion was due to the first bullet point that (used to) say "This person was not born at this date/time". That's my bad. The issue of negative conclusions is distinct from this one.

stoicflame commented 12 years ago

Hey, @EssyGreen, the comments in these issues are formatted according to Github Flavored Markdown.

I'm often referring here.

EssyGreen commented 12 years ago

Thanks for the link @stoicflame and glad to hear you are getting more resources :)

Are we any further forward in the debate?

lkessler commented 12 years ago

I see this is still open.

I see CONFIDENCE_LEVEL currently is: [ Certainly | Probably | Possibly | Likely | Apparently | Perhaps | OTHER ].

I think you should simply add: [ Certainly Not | Probably Not | Possibly Not | Likely Not | Apparently Not | Perhaps Not ]

Louis
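The proposal above can be sketched as an enumeration in which every positive level gets a "Not" counterpart. This is only an illustration: the `ConfidenceLevel` name, the member naming scheme, and the `negate` helper are assumptions made for the example, and OTHER is left out because it has no obvious negated form.

```python
from enum import Enum

# The six positive levels quoted in the thread (OTHER deliberately omitted).
POSITIVE = ["Certainly", "Probably", "Possibly", "Likely", "Apparently", "Perhaps"]

# Build (name, value) pairs such as ("CERTAINLY", "Certainly") and
# ("CERTAINLY_NOT", "Certainly Not") for each positive level.
_members = []
for word in POSITIVE:
    _members.append((word.upper(), word))
    _members.append((word.upper() + "_NOT", word + " Not"))

# Hypothetical enum; not the GEDCOM X vocabulary itself.
ConfidenceLevel = Enum("ConfidenceLevel", _members)


def negate(level):
    """Return the opposite-polarity counterpart of a confidence level."""
    name = level.name
    flipped = name[:-4] if name.endswith("_NOT") else name + "_NOT"
    return ConfidenceLevel[flipped]
```

For instance, `negate(ConfidenceLevel.PROBABLY)` yields the `PROBABLY_NOT` member, and negating that returns the original level.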

EssyGreen commented 12 years ago

I still think it's open too and I wouldn't disagree with your proposal.

This is one instance tho' where I think "Other" is unhelpful.

I think there is also a need for "Unevaluated" (or allowance for a null value which would equate to the same thing)

lkessler commented 12 years ago

Essy,

OTHER is used in most of GEDCOM X's type selections, e.g. Relationship types, Age Part types, Gender types, etc. So it appears to be their standard way of allowing extensibility for lists of items.

If you don't like that, you should probably open a new issue.

Louis

EssyGreen commented 12 years ago

I realise that and in most situations I believe it is appropriate (if not essential) but in this particular case I think it is superfluous since there is already a free-form text field for the Proof Statement. The coding here is just a short-hand form. If the researcher/application didn't use one of the standard short-hand codes then there must be scope for a null equivalent. No application will be able to do anything with "Other" so why have it?

lkessler commented 12 years ago

Essy,

Personally, I agree. I don't think they should have OTHER in any of the constructs.

Louis

stoicflame commented 12 years ago

Waking this thread up: now that we've got a better-defined source and source reference model, I think we can address this issue.

Over at #202 @nilsbrummond suggested that a source reference carry an evidence indicator, like this:

<person>
  ...
  <source evidence="direct | indirect | negative" description="S1"/>
  ...
</person>

These "evidence types" are taken from EE, section 1.14. Thoughts? Is this the right way to address this?
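To make the suggestion concrete, here is a rough sketch of how a consumer might read such an indicator, assuming each `<source>` element carries exactly one of the three values in its `evidence` attribute. The sample document, the second description "S2", and the `sources_by_evidence` helper are hypothetical, and the element layout follows the snippet above rather than any published GEDCOM X schema.

```python
import xml.etree.ElementTree as ET

# The three evidence types named in the suggestion.
ALLOWED_EVIDENCE = {"direct", "indirect", "negative"}

# Hypothetical sample; "S1" and "S2" are made-up source descriptions.
SAMPLE = """
<person>
  <source evidence="negative" description="S1"/>
  <source evidence="direct" description="S2"/>
</person>
"""


def sources_by_evidence(xml_text):
    """Group source descriptions by their evidence indicator."""
    root = ET.fromstring(xml_text)
    grouped = {kind: [] for kind in sorted(ALLOWED_EVIDENCE)}
    for source in root.iter("source"):
        kind = source.get("evidence")
        if kind not in ALLOWED_EVIDENCE:
            # Reject missing or unknown indicators instead of guessing.
            raise ValueError(f"unexpected evidence value: {kind!r}")
        grouped[kind].append(source.get("description"))
    return grouped
```

Calling `sources_by_evidence(SAMPLE)` groups "S1" under negative evidence and "S2" under direct evidence, which is the kind of lookup an application would need in order to surface negative evidence separately.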

stoicflame commented 11 years ago

Okay, in an effort to identify and wrap up the last set of issues that might imply new features or potentially backwards-incompatible changes, I'm going to close this issue with the following explanation:

With the discussion and work that was tracked at #242, #244, and #246 we believe that we have adequate support for modelling a researcher's efforts to gather evidence and resolve conflicts using the extracted conclusion, evidence reference and associated analysis documents. At #250 and #251, we recognize that there may be other "analysis" properties that could be added when the industry matures more and aligns on the terms and concepts that are needed and when those terms and concepts get some more traction in the software products that are being used.

Regarding the notion of "negative conclusions" (e.g. the person's name was not "John", etc.), we are making an explicit decision to not support them. There are two reasons that we could think of to use negative conclusions:

  1. To resolve conflicts in information found in separate sources. As stated above, we believe that we have adequate support for conflict resolution using the mechanisms mentioned, and we recognize that there may be other properties that could be added to enhance support for this at a future time as needed.
  2. In collaborative systems, to support users that want to tell the system to prevent other users from making the conclusion. We believe that such a mechanism is application-specific and outside the scope of this specification set.

Thank you for your participation.

lkessler commented 11 years ago

Ryan:

<person>
  ...
  <source evidence="direct | indirect | negative" description="S1"/>
  ...
</person>

I do like this, especially the direct specification of "negative" evidence.

Louis