Open EssyGreen opened 12 years ago
I think GEDCOMX should also support all aspects of the Genealogical Workflow as presented by Ron Tanner at RootsTech 2012: http://s3.amazonaws.com/rootstech/original/Ron%20Tanner_Report%20Card.pdf?1326143168 http://s3.amazonaws.com/rootstech/original/GeneaWorkFlow_public.pdf?1328546566
There's probably a lot in common between the Genealogy Research Process and this, but I'm sure the two nicely augment each other.
Louis
@lkessler - both links are based on the GRP so yes I agree should be supported but ...
@stoicflame - could we have some clarity on whether GEDCOMX is intending to provide a minimal or best practice model? This was touched on in #138 - the two goals are quite different I think - either a minimum which must be adhered to (tho' without a regulatory body I'm not sure how the 'must' is ever enforced so I guess it has to be 'should') or the "best in class" which applications should strive to achieve.
Also, you said in #138 that the goal was:
to define a model and serialization format for exchanging the components of the proof standard as specified by the genealogical research process [...] in a standard way.
The proof standard is not quite the same thing as the process model (tho' the two are obviously complementary). The GPS consists of five elements:
Sorry if I'm sounding picky here but the Proof Standard in its simplest form could just be represented by tacking a ProofStatement and a Bibliography onto the Record Model, whilst the Process Model is more specific and covers the whole range of research activities not just the proof at the end.
To put it another way, should we focus on exchanging/standardising the publication of genealogical data (conclusions at the end of the process) or should we focus on exchanging/standardising the sharing/transfer of genealogical data (all data during and throughout the process).
I'm just trying to understand the scope - sorry if it's tedious.
Some use cases may help illustrate:
Which of the above can/should GEDCOMX Conclusion Model be trying to address?
An excellent set of use cases. Good work!
Many thanks :) I wondered if I was just getting too tedious!
OK here's my take on answering my own questions:
I hasten to add that this is my head speaking ... my heart longs for a way to miraculously pump my cherished data through a magic machine and get it into whatever software I fancy with all the data, links and context intact (and preferably transformed in the "better" way supported by the new software). Sadly this pipe dream just leads to disappointment when I wake up in the real world.
To summarise, I would conclude that the Conclusion Model should focus on a clear and simple data structure which can be interpreted either end of the transfer as unambiguously as possible.
(In concluding this I should go back and retract many of my posts since I have tended to focus on the 'best practice' rather than minimalistic! Yes, I'm shooting myself in the foot here!)
PS: Just to shoot myself in the other foot ... I suspect the "minimalist" model = the Record Model and hence reverses my vote for #138
@EssyGreen you've done a great job putting together these thoughts and use cases. I don't think you're being tedious at all.
The issues you bring up are really tough to answer, but in the end I think I arrive at the same place that you articulated:
I would conclude that the Conclusion Model should focus on a clear and simple data structure which can be interpreted either end of the transfer as unambiguously as possible.
Which seems to imply a "minimalist" approach for this first version. But it still needs to be flexible enough to provide for future standards that will fill in more aspects of that "magic machine" with "all the data, links and context intact".
In addition to addressing extensibility concerns, we know that the "minimal" standard needs to address more than what legacy GEDCOM does today. Our task is to identify and address what else is minimally needed and provide for it "as unambiguously as possible".
I suspect the "minimalist" model = the Record Model and hence reverses my vote for #138
Actually, I sincerely think the conclusion model is a better fit for this. The record model as it's defined today attempts to deal with some very narrowly-focused subtleties of field-based record extraction and hence has a bunch of stuff that doesn't really fit this "minimalist" model. Date parts (see issue #130) are a great example of that.
@stoicflame - many thanks for the positive feedback :) I have a couple of points related to your reply:
we know that the "minimal" standard needs to address more than what legacy GEDCOM does today
I'm not sure I agree with you there (tho' some examples may make me change my mind!) ... I think in some ways old GEDCOM attempted to achieve too much and hence ended up with aspects that applications wanted to treat differently but felt they couldn't because of the GEDCOM structure. A clear example of this I think is the PLACe structure ... by making it an embedded structure and including sub-elements it was difficult to convert this to/from a high level Place object without added complexity on import and data loss on export. We've solved this one in GEDCOMX (I think) by making it a record-level element but could fall into the same trap elsewhere. A similar problem happened with the little-used ROMN and FONE sub-elements which were quickly outdated by more advanced phonetic techniques and yet hung around in the sub-structures making the GEDCOM PLACe and NAME structures unnecessarily unwieldy. Conversely I would argue that over-use of the NOTE record links (e.g. alongside CALlNumbers) created an unnecessarily "stringy" structure.
In summary, I think that the flatter the structure (within reason) the more flexible it is ... long trails of sub-elements are more likely to be problematic, especially in relational data scenarios.
I sincerely think the conclusion model is a better fit
You may be right ... to be honest my .Net version of the model is a bit of a mess so it's really hard to see what's in what. I've been hoping for a pull request to get a clearer/new model? Have I missed one or is it still in limbo (or should I go back to using eclipse/java)?
EssyGreen said:
To summarise, I would conclude that the Conclusion Model should focus on a clear and simple data structure which can be interpreted either end of the transfer as unambiguously as possible.
Sounds like GEDCOM with a few tweaks. :-)
stoicflame said:
In addition to addressing extensibility concerns, we know that the "minimal" standard needs to address more than what legacy GEDCOM does today. Our task is to identify and address what else is minimally needed and provide for it "as unambiguously as possible".
That works for me too.
Louis
Sounds like GEDCOM with a few tweaks
Maybe ... @stoicflame - do you have a list of the good and the bad things about old GEDCOM so we can retain the good and get rid of the bad? If not, is it worth brainstorming?
Sounds like GEDCOM with a few tweaks.
It does kind of sound like that, huh? I guess it kind of depends on what you think legacy GEDCOM primarily was. If you think it was a definition of a model for evidence information and a way to encode it, then I agree that this project sounds a lot like GEDCOM with a few tweaks. But if you consider the syntax of a GEDCOM file as being a major part of the spec, then this project doesn't sound like "GEDCOM with a few tweaks".
In other words, I think one of the primary goals of this project is to overhaul the foundational technologies of GEnealogical Data COMmunications. This will enable the genealogical IT community to collaboratively, iteratively, and cleanly integrate the latest trends in application development.
So even though the conceptual scope of GEDCOM X 1.0 won't be a huge revolution, the remodel of the infrastructure will be a big step forward for the community.
In response to the original purpose of this thread, I think the initial scope of this project needs to be limited to the "Cite" and "Analyze" sections of the genealogy research map that @EssyGreen referenced. These are the sections that we're most familiar with sharing and exchanging via legacy GEDCOM, so the focus there has the biggest chance of success. As much as possible, the standard needs to supply well-defined integration points for the other sections of the process model that will be addressed by future efforts.
Right now, we're working on refactoring the project so that these concepts are clearly articulated at the project site. This effort includes the proposal outlined at #138. We hope this will be a big improvement to the project and we're anxious to get these changes applied for everybody to see.
do you have a list of the good and the bad things about old GEDCOM so we can retain the good and get rid of the bad? If not, is it worth brainstorming?
I don't have a definitive list, no. We should probably pull together that list from a lot of different sources, including this issue forum, the BetterGEDCOM wiki, etc. We should also proactively request community help to pull together that list. I think a brainstorm is a good idea, but I'm struggling with the best way to facilitate that. I worry that creating a new thread would get too noisy with everybody commenting on everybody else's comments. And that would stifle those who have something to say but don't want to be subject to community scrutiny.
What if I created a web form that people could fill out and submit? I'd broadcast its availability, gather all the comments, and post them somewhere so everybody could see the results without knowing who submitted them. There are some people that I consider legacy GEDCOM experts that I'd be especially anxious to see contribute....
Thoughts?
What if I created a web form that people could fill out and submit? I'd broadcast its availability, gather all the comments, and post them somewhere so everybody could see the results without knowing who submitted them
Sounds like an excellent plan!
I think the initial scope of this project needs to be limited to the "Cite" and "Analyze" sections of the genealogy research map
Initial scope maybe but I think the whole process needs to be covered albeit in a simple form. For example, a simplistic inclusion of "Goals" could be a "ToDo" (=Research Goal) object (top level entity) with:
Plus an (optional) "ToDo" list of links included in each Person (representing the subject of the goal)
A listing of all ToDos in CreationDate order represents the ResearchLog.
This seems pretty simple to me but maybe I'm falling back into the "best practice" rather than the "simplistic" approach again.
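A minimal sketch of the "ToDo"/ResearchLog idea above, with hypothetical names (nothing here is taken from the GEDCOM X spec):

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical sketch of the "ToDo" (= Research Goal) proposal above;
# none of these names come from the GEDCOM X spec.
@dataclass
class ToDo:
    description: str
    creation_date: date
    person_ids: list = field(default_factory=list)  # the subjects of the goal
    completed: bool = False

def research_log(todos):
    """A listing of all ToDos in CreationDate order represents the ResearchLog."""
    return sorted(todos, key=lambda t: t.creation_date)
```

The point being that the "ResearchLog" is not a separate entity at all, just a view over the ToDos.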
Re the other end of the process (Proof/Resolve/Conclude) ... In my experience there has been a growing awareness of the need for evidence-based genealogy rather than just "citing" sources, and I think some form of inclusion would add credibility to the model and give GEDCOMX a greater chance of acceptance. But it's a complex area so it needs to be boiled down to a simple form.
Sounds like GEDCOM with a few tweaks.
It does kind of sound like that, huh? I guess it kind of depends on what you think legacy GEDCOM primarily was. If you think it was a definition of a model for evidence information and a way to encode it, then I agree that this project sounds a lot like GEDCOM with a few tweaks.
Current GEDCOM is a way to store and transfer genealogical conclusions. It also includes sources and source detail data, but only when used as evidence from the point of view of the conclusions.
But if you consider the syntax of a GEDCOM file as being a major part of the spec, then this project doesn't sound like "GEDCOM with a few tweaks".
No, I don't see the syntax being a major part of the spec. We could take the existing GEDCOM and transfer it mechanically into XML, JSON, or whatever. We could also take the GEDCOM X spec and translate it into the GEDCOM syntax.
The content is all important. The syntax is not. Using a standard syntax potentially gives programmers and users more tools to use. Simple translators would be easy to write to convert GEDCOM X in one syntax to another.
But simple translators to convert to and from GEDCOM 5.5.1 will be essential. If the conclusion data model of GEDCOM X is only "tweaked" from GEDCOM 5.5.1, then the transfer of the data that GEDCOM 5.5.1 can accept will be possible. However, if the conclusion data model of GEDCOM X is rebuilt, then the transfer will not be possible and the genealogical community will have a problem.
Louis
However, if the conclusion data model of GEDCOM X is rebuilt, then the transfer will not be possible and the genealogical community will have a problem.
That's rather overstating the case. If the conclusion data model is substantially different from GEDCOM's, the translation may be more complicated and lossy, particularly going from GedcomX to GEDCOM. It won't be impossible.
The genealogy software community (not the genealogy community, most of which doesn't actually care about the details but is utterly frustrated with the present lack of interoperability between mainstream programs) already has this problem: Few mainstream programs have internal data models that map well to GEDCOM, and their inadequate translation efforts are one of the main sources of that user frustration. The greater problem for GedcomX isn't what should or shouldn't be in its data model, it's that none of the mainstream program vendors are participating.
@jralls - excellent points!
The greater problem for GedcomX isn't what should or shouldn't be in its data model, it's that none of the mainstream program vendors are participating.
I have to say this is something that's bothered me ... hands up anyone here from Family Tree Maker, RootsMagic, Master Genealogist, ReUnion, FamilyHistorian etc etc? Are you lurking or absent?
@lkessler
But if you consider the syntax of a GEDCOM file as being a major part of the spec, then this project doesn't sound like "GEDCOM with a few tweaks".
No, I don't see the syntax being a major part of the spec. [...] The content is all important. The syntax is not.
I totally agree with you on this point ...
simple translators to convert to and from GEDCOM 5.5.1 will be essential. If the conclusion data model of GEDCOM X is only "tweaked" from GEDCOM 5.5.1, then the transfer of the data that GEDCOM 5.5.1 can accept will be possible. However, if the conclusion data model of GEDCOM X is rebuilt, then the transfer will not be possible and the genealogical community will have a problem.
... however, here I disagree ... to limit the scope of GEDCOMX to GEDCOM 5 with a few tweaks would be worthless. The problem with GEDCOM has never been the syntax (it's about as simple as you can get), it's the content (as you say above). Yes, we will need to provide a migration path from 5 to X but this should not be the goal of GEDCOMX. The goal should be to improve the data content and structure to be more in-line with the needs of the user community (which in turn should be more in-line with the needs of the software industry). Ergo map the process model but do it in a simple way that can be implemented in different ways by different software vendors.
@EssyGreen "I have to say this is something that's bothered me ... hands up anyone here from Family Tree Maker, RootsMagic, Master Genealogist, ReUnion, FamilyHistorian etc etc? Are you lurking or absent?"
Some are lurking, some are absent, some just flat don't care.
@stoicflame - I noticed you put up that web-link for peeps to comment on GEDCOM strong/weak points .... any feedback yet?
any feedback yet?
Yes, thanks for reminding me. I need to get that posted.
@EssyGreen I finally got around to compiling the responses we got from the little poll we took:
Brilliant! So now we have something to judge GEDCOM X against ... has it resolved these problems/addressed the deficiencies? Which areas do we need to tweak/adjust?
has it resolved these problems/addressed the deficiencies?
A lot of them, yes.
Which areas do we need to tweak/adjust?
Maybe that's the next step here? How do you think we should publish that information? Maybe add to that page a table with notes on how (or whether) GEDCOM X intends to address those issues?
@stoicflame 'GEDCOM strong/weak points'
So where is the strong point list??
@alex-anders - good point!
@stoicflame
Maybe that's the next step here? How do you think we should publish that information? Maybe add to that page a table with notes on how (or whether) GEDCOM X intends to address those issues?
Yes - definitely the next step and agree with your suggestion
So where is the strong point list??
Good point. We didn't gather those. My apologies. What should we do to remedy that?
Add 'em on to the same "Deficiencies" page?
Add 'em on to the same "Deficiencies" page?
That implies we've got 'em.
I'll have to set up another request for feedback....
Ah! oh! Ooops!
Here's my take on how GEDCOM X rates against the GEDCOM 5 deficiencies:
The GEDCOM Deficiencies are a little unfair to GEDCOM. Here are my comments:
• Can't separate conclusions from evidence - I believe GEDCOM X has done this with the separation of the Conclusion model from the Record model. And I like that!
• No support for independent place entities - Yes, a deficiency of GEDCOM. But I don't see how that is done in GEDCOM X which does not have place as a top-level record.
• No support for multi-role event entities. - Do we really want to complicate our lives with this?
• Lack of support for many of the most common data items. - Hard to define what they are, but then it's just as easy to include them in GEDCOM as it is to include them in GEDCOM X.
• No support for negative evidence. - Very simple to fix in GEDCOM or in GEDCOM X. In GEDCOM X, the CONFIDENCE_LEVEL currently is: [ Certainly | Probably | Possibly | Likely | Apparently | Perhaps | OTHER ]. Just add: [ Certainly Not | Probably Not | Possibly Not | Likely Not | Apparently Not | Perhaps Not ]
• Lack of support for multimedia; for formally-structured citations. There's nothing in GEDCOM preventing these from being added.
• No formal policy for managing, processing, and specifying vendor extensions. - Allowing extensions is VERY dangerous. True, there's no formal policy for managing the extensions. But is that supposed to be part of the standard? If so, that should be done ASAP for GEDCOM and we should set an authority to manage it. Maybe that authority could also get vendors to implement their GEDCOM correctly. I'm curious if this is going to be a part of GEDCOM X and if so, will FamilySearch staff be the police?
• Requires sequential processing; the file must be processed entry-by-entry, one at a time. - I don't know why Essy called this DONE. XML is no different than GEDCOM. It is a flat file. It is only if you add indexing of records that you get random access. This requires a prior pass of the data and can be done just as easily with XML as with GEDCOM. Is it even valid to call this a deficiency of GEDCOM? GEDCOM was actually designed with INDI and FAM records so that developers could (in the old days with 32 KB memory block limits) randomly access the data.
• Requires inefficient processing; the entire file must be processed altogether and you can't process the file in pieces. - Again, like the last point, I don't see this as a deficiency of GEDCOM. GEDCOM can be processed in pieces. Being simple text, it can be scanned quickly to index the records, and then only the records needed need be processed. That's no different than an XML file that is in pieces but zipped together as GEDCOM X is.
• Lack of reference examples and recipe books - I agree the examples in GEDCOM are poor, and some don't even follow the standard. :-(
• Lack of shared processing code. - Not a deficiency of GEDCOM. There are many GEDCOM libraries. How about Dallan Quass' library, which is being used by the GEDCOM X conversion tool?
• Lack of validation and conformance tools. - Not a deficiency of GEDCOM. There are many GEDCOM validators out there.
• No support for formal data types and templates for processing text (e.g. names) - GEDCOM sort of "defined" the data types back then. 15 years later, there's a whole new set of "authorities" out there. This is a trade-off between simplicity and standardization. Using formal data types for everything is a drastic change and will cause existing genealogy software vendors much pain.
• Too narrow modeling constraints (e.g. same-gender couple relationships). - There are a hundred of these sorts of things that can be listed as problems with GEDCOM. But almost all are almost trivial to fix and certainly don't require a complete rewrite to do so. For the same-gender example, simply take off the requirement in the GEDCOM standard that they be the same sex and allow a SPOU tag, rather than the HUSB and WIFE tag. Use the SEX on the individual to determine if it is a husband or wife or same sex couple. However, I like the idea of getting rid of FAMily in GEDCOM and replacing it with Relationships in GEDCOM X.
• Not enough emphasis on non-family relationships and associations - Need to add a GROUP record. Then the Relationship can be between an individual and a group. GEDCOM X doesn't have a GROUP yet, but needs one.
• Fragmented vendor adoption leading to poor interoperability. - What?? Vendor adoption was nearly 100%. That isn't the problem. The poor interoperability is because vendors didn't implement the standard correctly.
• No standard way to indicate there were no children in a marriage. - Not a deficiency of GEDCOM. GEDCOM has the NCHI tag.
• Over-specification, overuse of rarely-used fields. - Actually, I think GEDCOM was specified very well. GEDCOM X is NOT specified well yet and is very difficult to grasp. If anything, underuse of rarely-used fields is the GEDCOM problem, not overuse.
• Lack of support for referential integrity (inter-entity links are disjointed and ambiguous). - That's not been a problem in GEDCOM. Pretty well all programs correctly maintain and export the links. It's only manually edited GEDCOMs that seem to have problems.
• Poor balance between inline vs. referenced data. - If repositories are to use the GEDCOM X Record model, then we will need some way to reference that.
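Louis's negative-evidence suggestion in the list above could be sketched as an extended enumeration. These Python names are purely illustrative; GEDCOM X does not define the negative values:

```python
from enum import Enum

# Sketch of the proposal: mirror each positive confidence level with a
# "Not" counterpart. Illustrative only - not part of the GEDCOM X spec.
class ConfidenceLevel(Enum):
    CERTAINLY = "Certainly"
    PROBABLY = "Probably"
    POSSIBLY = "Possibly"
    LIKELY = "Likely"
    APPARENTLY = "Apparently"
    PERHAPS = "Perhaps"
    OTHER = "OTHER"
    # proposed additions for negative evidence:
    CERTAINLY_NOT = "Certainly Not"
    PROBABLY_NOT = "Probably Not"
    POSSIBLY_NOT = "Possibly Not"
    LIKELY_NOT = "Likely Not"
    APPARENTLY_NOT = "Apparently Not"
    PERHAPS_NOT = "Perhaps Not"
```

An application that only understands the original seven values could still round-trip the new ones as opaque strings, which is what makes this a cheap fix.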
If you take the list of GEDCOM deficiencies, and strike off the few I think are wrong, and discount any that would be easy to fix, there's not much left.
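Louis's random-access point (that a simple text file can be indexed in one quick pass and then read record-by-record) can be demonstrated with a made-up GEDCOM-like sample; the function names here are invented for illustration:

```python
import io

# Every GEDCOM record starts with a level-0 line, so one quick pass can
# note each record's offset, after which individual records are read with
# seek() - no full sequential parse needed. Sample data is made up.
SAMPLE = (
    "0 HEAD\n1 CHAR UTF-8\n"
    "0 @I1@ INDI\n1 NAME Fred /Bloggs/\n"
    "0 @I2@ INDI\n1 NAME Freda /Bloggs/\n"
    "0 TRLR\n"
)

def index_records(text):
    """Map each record's first token (xref or tag) to its character offset."""
    index, offset = {}, 0
    for line in text.splitlines(keepends=True):
        if line.startswith("0 "):
            index[line.split()[1]] = offset
        offset += len(line)
    return index

def read_record(buf, index, key):
    """Seek straight to one record and read until the next level-0 line."""
    buf.seek(index[key])
    lines = [buf.readline()]
    while True:
        line = buf.readline()
        if not line or line.startswith("0 "):
            break
        lines.append(line)
    return "".join(lines)
```

The indexing pass touches every byte once, but after that any record can be fetched directly, which is the distinction Louis draws between the file format and a database.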
Louis
The GEDCOM Deficiencies are a little unfair to GEDCOM
Maybe but it's real feedback from real people whose opinion Ryan values (or at least that's what it was intended for). Our intention here was to ensure GEDCOM X was going to correct these rather than to re-evaluate GEDCOM 5
No support for independent place entities - Yes, a deficiency of GEDCOM. But I don't see how that is done in GEDCOM X which does not have place as a top-level record.
Ack you are right! I thought we'd resolved that one ages ago
Not enough emphasis on non-family relationships and associations - Need to add a GROUP record. Then the Relationship can be between an individual and a group. GEDCOM X doesn't have a GROUP yet, but needs one.
I disagree with this approach ... I don't see how this helps. All we need is for the Relationship entity to allow any/"Other" types of relationship between two people.
The poor interoperability is because vendors didn't implement the standard correctly
OK, so why do you think that was?
If anything, underuse of rarely-used fields is the GEDCOM problem, not overuse.
I disagree. If something is rare then it doesn't need to be in the base standard but can be tweaked later as demand arises (as you pointed out above) or (in my opinion) omitted and left for vendors to specify
Lack of support for referential integrity (inter-entity links are disjointed and ambiguous). - That's not been a problem in GEDCOM. Pretty well all programs correctly maintain and export the links. It's only manually edited GEDCOMs that seem to have problems.
FAMS, ASSOs and ALIAs pointers were all problematic in GEDCOM 5 ... how do you match up Person A's ASSO to Person B with Person B's ASSO to Person A? A relationship is two-way and I think GEDCOM X has solved that with the Relationship entity
If you take the list of GEDCOM deficiencies, and strike off the few I think are wrong, and discount any that would be easy to fix, there's not much left.
Yes but the devil is in the detail :) It is helpful to know which ones we still need to focus on and which ones need tweaking etc
Essy,
For why I think a Group Record is important, see: http://www.beholdgenealogy.com/blog/?p=1097
Vendors didn't implement GEDCOM correctly for many reasons. Anything and everything, from interpreting the standard incorrectly, to being lazy, to not caring, to not knowing, to simply making mistakes. I don't know. Ask them. It's nearly impossible to implement anything perfectly. Simpler standards have better chances of getting implemented correctly. GEDCOM is not a simple standard. GEDCOM X is even more complex. It will be even more difficult than GEDCOM to do - even if codebases are provided, since translation of the GEDCOM X data to the program's internal data structure must still happen.
By underuse of rarely-used fields, I meant some of the GEDCOM structures and tags that really would have been useful if developers had known about them, such as the ALIAs tag (if used the correct way) and the ASSOciation tag. The GEDCOM Source_Record is quite powerful with EVEN, DATE, PLAC, AGNC, AUTH, TITL, PUBL, TEXT, REFN and RIN tags, but very few programs use them to their potential and instead made their own custom citation tags.
A two-way relationship needs to be defined two ways. Parent-Child implies direction, and the type of link and the order of Person1 and Person2 provides that in GEDCOM X. But what do you do in GEDCOM X for other types of relationships, e.g. Barney attended the birth of Pebbles? That is the relationship one way. The other way it is: Pebbles' birth event had Barney attending. How do you write the event so that it is unambiguous in a relationship with Person1 and Person2? It is clearer to write the two one-way relationships in this case.
Well what is it really that GEDCOM X is trying to do that GEDCOM doesn't do? Does everything have to change as radically as GEDCOM X is changing it? It certainly doesn't seem so from that relatively small list of deficiencies, of which you noted that GEDCOM X still had work to do on most of them.
Louis
@lkessler I believe that your "GROUP" requirement is already satisfied by the allowance of multiple roles in an event.
Simpler standards have better chances of getting implemented correctly.
I agree. I also agree that GEDCOM X is too complex atm.
some of the GEDCOM structures and tags that really would have been useful if developers would have known about them
Not sure why developers wouldn't have known about them - they were in the spec. In my opinion they just didn't provide a useful structure (for a variety of reasons, e.g. ambiguity, lack of referential integrity, not wanted/used by the user base, etc.)
The GEDCOM Source_Record is quite powerful [...], but very few programs use them to their potential
A good illustration ... I have frequently used the details you described but there is a major flaw in that there was no way to describe the Person or Relationships in that context so the usefulness was extremely limited. Sadly GEDCOM X seems to be disposing of this element rather than improving it.
A two-way relationship needs to be defined two ways
Indeed but this must be done in a way that couples them together ... If Person A has multiple ASSOs with Person B (say Uncle/Nephew and also Step-Father/Step-Son) then it is pretty impossible to link them together in the current GEDCOM - the app can't tell whether Uncle goes with Nephew or Step-Son and vice versa. GEDCOM X has fixed that with the Relationship which binds them together in a particular context.
what do you do in GEDCOM X for other types of relationships, e.g. Barney attended the birth of Pebbles
- Event: Birth
- Role 1: Child - Pebbles
- Role 2: Witness - Barney
How do you write the event so that it is unambiguous in a relationship with Person1 and Person2
I agree this is too specific atm - only parent/child and couple seem to be supported. I think it should be similar to the ASSO but in one entity:
- Relationship:
- Person 1: Fred
- Role: Uncle
- Person 2: Joey
- Role: Nephew
- Sources, Notes etc etc
Well what is it really that GEDCOM X is trying to do that GEDCOM doesn't do? Does everything have to change as radically as GEDCOM X is changing it?
Valid questions which only Ryan can answer :)
agree this is too specific atm - only parent/child and couple seem to be supported. I think it should be similar to the ASSO but in one entity:
- Relationship:
- Person 1:Fred
- Role: Uncle
- Person 2: Joey
- Role: Nephew
+1, but I'd abstract the Person-Role pair into a class as is done with EventRole. Perhaps it should be three parts: Person, Role, and Detail, with enumerated Roles Conjugal Partner, Bio Parent, Bio Child, Adopt Parent, Adopt Child, and Other. That captures the relationships a program needs to construct family, ancestry, and descendancy. Detail is a free string so that any other relationships the researcher wants to capture can be specified. Sources, Notes etc etc
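A hypothetical sketch of that abstraction; all of these names are invented for illustration and none come from the GEDCOM X spec:

```python
from dataclasses import dataclass
from enum import Enum

# Sketch of the suggested Person-Role-Detail triple, analogous to EventRole.
# Names are illustrative only, not part of GEDCOM X.
class Role(Enum):
    CONJUGAL_PARTNER = "ConjugalPartner"
    BIO_PARENT = "BioParent"
    BIO_CHILD = "BioChild"
    ADOPT_PARENT = "AdoptParent"
    ADOPT_CHILD = "AdoptChild"
    OTHER = "Other"

@dataclass
class RelationshipRole:
    person_id: str
    role: Role
    detail: str = ""  # free string, e.g. "Uncle" or "Step-Father"

@dataclass
class Relationship:
    roles: tuple  # a pair of RelationshipRole objects
    # sources, notes etc. would hang off here
```

The enumerated roles cover what a program needs to build family, ancestry, and descendancy views, while the free-text detail carries everything else without bloating the enumeration.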
Well what is it really that GEDCOM X is trying to do that GEDCOM doesn't do? Does everything have to change as radically as GEDCOM X is changing it?
Valid questions which only Ryan can answer :)
Which he did in #156.
Requires sequential processing; the file must be processed entry-by-entry, one at a time. - I don't know why Essy called this DONE. XML is no different than GEDCOM. It is a flat file.
It is not a single XML document, it is a ZIP file containing a bunch of XML documents and other files. See https://github.com/FamilySearch/gedcomx/blob/master/specifications/file-format-specification.md
I'd abstract the Person-Role pair into a class as is done with EventRole
I wouldn't disagree I was just using simple syntax to illustrate the concept
Essy said: "@lkessler I believe that your "GROUP" requirement is already satisfied by the allowance of multiple roles in an event."
But I want groups that can have events of their own, just as I want places that can have events of their own. Because of that, I feel groups and places need to be a top level record.
p.s. I like GEDCOM X merging "events" into "facts".
Essy said: "Not sure why developers wouldn't have known about them - they were in the spec".
I'm glad you think we developers have perfect interpretation and total recall. :-) I'm sure you and I both know about every little detail that is in GEDCOM X already ... NOT!
John: Thanks for pointing out #156 - Clarify What GedcomX Is. Of course, stoicflame refers to the issue we are in, #141, to be the one that will articulate How GEDCOM X will do this. And this one now refers back to that one. And that one refers to this one ...
John: Yes, I know they've physically packaged it into a ZIP containing thousands of files. So change my statement to: "ZIP is no different than GEDCOM. It is still a single file that must be read and the contents extracted for processing." The point was that it is not a database with indexed retrieval. I don't believe you can read a single file from a zip without unzipping it first.
Louis
I want groups that can have events of their own
What are the benefits of doing it this way? You lose flexibility because there would be a very small number of events with exactly the same people in them. For example, if you define a group, say Fred, Freda and Joey Bloggs, for Joey's birth, the same "Group" might be appropriate for a baptism, but I can't think of many other events you could put under the same group. Similarly, if you define a "Group" for Fred and Freda Bloggs as a married couple then you will have a marriage and possibly a Residence or two, but chances are one will die before the other and so the last Residence events would have to be split to cater for the date differences. I can't see that you will ever have more than a couple of events per group, so it seems rather redundant.
Not sure why developers wouldn't have known about them - they were in the spec
I'm glad you think we developers have perfect interpretation and total recall
I don't understand your sarcasm ... the GEDCOM 5 spec is publicly and widely available. Developers don't need total recall - they just need to be able to read! I think the reason that some things didn't get adopted was more to do with the fact that they didn't provide a clear benefit (e.g. the SOUR details would have been really useful if there were some way to specify the people and not just the types of event - without this the data is just an additional data entry/management burden).
Louis,
And that one refers to this one ...
You're a programmer. You're supposed to like recursion! ;-)
I don't believe you can read a single file from a zip without unzipping it first.
You believe incorrectly. From the Wikipedia zip (file format) article:
Compressing files separately, as is done in zip files, allows for random access: individual files can be retrieved without reading through other data.
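The random access John describes is easy to demonstrate. A minimal sketch in Python using the standard-library `zipfile` module (the entry names here are invented for illustration, not taken from any GedcomX spec):

```python
import io
import zipfile

# Build a small archive in memory with several "top level object" files.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("persons/p1.xml", "<person id='p1'/>")
    zf.writestr("persons/p2.xml", "<person id='p2'/>")
    zf.writestr("sources/s1.xml", "<source id='s1'/>")

# Read ONE member back without decompressing the others. ZipFile reads
# the central directory at the end of the archive, then seeks directly
# to the requested entry.
with zipfile.ZipFile(buf) as zf:
    data = zf.read("persons/p2.xml")

print(data.decode())  # <person id='p2'/>
```

Because each member is compressed independently, retrieving one entry costs only the directory lookup plus that entry's decompression, regardless of archive size.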
Just picking up on the zip thread for a moment.
I have no objection to zip as a way to package GEDCOMX data. My objection came from keeping each top level object as a separate file. But not because each is a separate file per se, but because the GEDCOMX standard now uses XML with many namespaces and long URIs, so each separate file must therefore contain a truly incredible amount of redundant information that is included anew in every file.
Doesn't it seem to be the pinnacle of irony to use compression on a set of files with so much redundant information? Doesn't it seem to be about the most anti-common sense thing you've heard of in the past couple weeks?
As John pointed out, the zip file contains a directory of its contents, so each internal file can be read separately without unzipping the whole archive. So assuming that each top-level object will be a separate file, there are some conclusions to be drawn.
Note that when reading individual files out of a zip file you would have to have a reason to be doing that, which means that an id or key must first be supplied to identify the file. Where would that id or key come from? This points out how important it will be to add an index file to the zip, to be extracted first and then used as an index for everything else in the zip: for example, the unique ids, the persons' names, and the ids of all other internal files that each internal file refers to and why. Just imagine the problem of finding someone with a given name in the zip file, and then extracting their pedigree from the zip, without first reading the entire zip file into an auxiliary database. The only practical way to deal with these zip files is to think of them as mini-databases and to supply an index file to that database in the zip. Which is exactly what Java does with the meta file that it adds to zip files to turn them into jar files.
So if GEDCOMX sticks with the idea of zip files with individual files for each top level object, then the standard must also include a definition of the aforementioned index file.
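To make the idea concrete, here is a sketch of what such an index might look like. The `index.json` format below is entirely hypothetical (it is not part of any GedcomX spec, and a jar's `META-INF/MANIFEST.MF` is richer than this); it just shows a reader finding a person by name and jumping straight to the right entry without scanning the archive:

```python
import io
import json
import zipfile

# Hypothetical index format (NOT defined by any spec): map each person
# id to a name and the zip entry that holds that person's record.
index = {
    "persons": {
        "p1": {"name": "Fred Bloggs",  "entry": "persons/p1.xml"},
        "p2": {"name": "Freda Bloggs", "entry": "persons/p2.xml"},
    }
}

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("index.json", json.dumps(index))
    zf.writestr("persons/p1.xml", "<person id='p1'/>")
    zf.writestr("persons/p2.xml", "<person id='p2'/>")

# A reader extracts the index first, then reads only the record it needs.
with zipfile.ZipFile(buf) as zf:
    idx = json.loads(zf.read("index.json"))
    hit = next(p for p in idx["persons"].values()
               if p["name"] == "Freda Bloggs")
    record = zf.read(hit["entry"])

print(record.decode())  # <person id='p2'/>
```

Whatever the actual format, the point stands: without some agreed index entry, every consumer must read the whole archive just to answer "who is in this file?"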
Which is exactly what Java does with the meta file that it adds to zip files to turn them into jar files.
And exactly why the GedcomX spec does too.
now uses XML with many namespaces and long URIs, so each separate file must therefore contain a truly incredible amount of redundant information that is included anew in every file.
Nope. The ZIP can contain a DTD which provides all of the namespaces and their URIs.
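One way a DTD (or internal DTD subset) can factor out the long URIs is to declare each one once as an entity and reference it wherever needed. A minimal sketch; the URI and entity name are illustrative, and nothing in the current GedcomX spec mandates this:

```python
from xml.dom import minidom

# The internal DTD subset declares the long URI once as an entity;
# the document body then references it as &gx; wherever it is needed.
doc = """<?xml version="1.0"?>
<!DOCTYPE person [
  <!ENTITY gx "http://gedcomx.org/conclusion/v1/">
]>
<person type="&gx;Person"/>
"""

dom = minidom.parseString(doc)
# The parser expands the entity, so consumers still see the full URI.
print(dom.documentElement.getAttribute("type"))
# http://gedcomx.org/conclusion/v1/Person
```

The wire format stays small and readable while parsers hand applications the fully expanded URIs, so this is orthogonal to whether the URIs themselves are a good idea.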
And exactly why the GedcomX spec does too.
I'll believe you, but I can't get this from the specifications. I hope you realize I was not talking about a simple directory listing file, but a rich index file. The GEDCOMX specifications are singularly obtuse and almost unintelligible. Maybe the answer is in the header set concept, but if so, the authors have not explained it.
now uses XML with many namespaces and long URIs, so each separate file must therefore contain a truly incredible amount of redundant information that is included anew in every file.
Nope. The ZIP can contain a DTD which provides all of the namespaces and their URIs.
I guess I'll believe you again, but that is not what Tamura reported. He expanded GEDCOMX files and inspected the contents. There were no DTDs, and every "file" contained redundant definitions of everything. On converting GEDCOM files to GEDCOMX zip files, using the GEDCOMX-provided tool, he saw a more than 35-fold increase in file size. From that I can't come to any other conclusion than that the current GEDCOMX file format is a disaster. Maybe a DTD will make it workable. And I have already expressed my strong opinion that the archival format should be all simple tags with no namespaces and no long URIs.
@ttwetmore - I share your concerns here and would also vote for simple tags.
The GEDCOMX specifications are singularly obtuse and almost unintelligible.
I don't find them to be either. Woefully incomplete, but neither obtuse nor unintelligible.
That said, if you can't understand the specs, then you're arguing about some straw man.
I guess I'll believe you again, but that is not what Tamura reported. He expanded GEDCOMX files and inspected the contents.
In any case, I said can contain. That's not required in the GedcomX spec (though I think it would be a good idea), but it's not prohibited, either, and the XML recommendations allow it. Tamura Jones seems to enjoy throwing rocks without actually doing anything useful. From where did he get these GedcomX files, considering that the only code is Ryan's backwards JAXB mess that he uses to produce the documentation and which doesn't actually do anything?
Moreover, who cares these days about a couple of K of URIs? Text is tiny. Are you writing DeadEnds for the Arduino?
And I have already expressed my strong opinion that the archival format should be all simple tags with no namespaces and no long URIs.
Yup. The RDF discussion is in #165. No need to bring it up here. Anyway, unless FamilySearch can be persuaded to separate the static data-exchange solution from the web services solution, RDF is necessary; so until we can get that split to happen, we should work on making the RDF aspect as painless as possible.
The GEDCOMX specifications are singularly obtuse and almost unintelligible.
I don't find them to be either. Woefully incomplete, but neither obtuse nor unintelligible.
You're clearly a lot smarter than me; I can graciously accept that.
Moreover, who cares these days about a couple of K of URIs? Text is tiny. Are you writing DeadEnds for the Arduino?
I do. A lot. And it isn't a couple K. When you are using between one and two orders of magnitude too much resource to encode something that is very simple, you are being profligate to the point of stupidity no matter how cheap the resource. Our world still has such things as appropriateness and elegance and rightness in it.
Tamura Jones seems to enjoy throwing rocks without actually doing anything useful. From where did he get these GedcomX files
He got the GEDCOMX files by building them himself from the tool that was announced by GEDCOMX, a tool that converts GEDCOM files to GEDCOMX files, a tool that is available on the GEDCOMX github somewhere. He used it on a number of his test GEDCOM files and published the results. You can check his blog for details.
Anyway, unless FamilySearch can be persuaded to separate the static data-exchange solution from the web services solution, RDF is necessary, so unless we can get that split to happen we should work on making the RDF aspect as painless as possible.
I have suggested an excellent solution to the RDF conundrum that allows the archive format to contain no namespaces and no RDF URI's, but with an easy capability to generate the full form for those who feel they need it. But like you said this ain't the thread for it.
As a sanity check it is worth verifying that GEDCOMX fulfills the needs of the research process. In an attempt to prevent too much debate on the definition of the research process, I am citing the model certified by BCG & ESM (see this link http://www.olliatauburn.org/download/genealogy_research_map.pdf).
How well does GEDCOMX support the data described?