History-Research-Environment / HRE--History-Research-Environment

Main repo for HRE code
https://historyresearchenvironment.org/
GNU Affero General Public License v3.0

Article for discussion "Why All Genealogy Apps Should Support GEDCOM 5.5.1" #3

Closed daleathan closed 6 years ago

daleathan commented 7 years ago

Import GEDCOM v5.5 standard format files

Export data from HRE that conforms to the GEDCOM v5.5 standard

Wouldn't GEDCOM v5.5.1 be better?

http://genealogytools.com/why-all-genealogy-apps-should-support-gedcom-5-5-1/

daleathan commented 6 years ago

Well that's really sad guys!

You ignore a valid point and just close it without comment.

If this is any indication on how open this project is going to be then I don't hold much hope for it !

daleathan commented 6 years ago

@HREferg @warrenvail @MichaelNMaggs @MichaelErichsen @RobinLamacraft Not even an acknowledgement!

HREferg commented 6 years ago

Dale, the requirement to be able to import GEDCOM is a documented component of the HRE design. HRE’s ability to export GEDCOM is (as with TMG) problematical, given the inability of GEDCOM to handle the formats of data that HRE is capable of holding.

There seemed little need to keep this Issue open in GitHub, hence I closed it.

Apologies if you felt your point needed any lengthy discussion.

Don Ferguson

RobinLamacraft commented 6 years ago

Dale,

The GEDCOM data model is so constrained that many everyday, meaningful data occurrences cannot be represented in a consistent fashion. Hence the number of genealogical applications that have tried to bend the GEDCOM model to satisfy their users by introducing proprietary customizations.

Please note that HRE design is not explicitly focused on Persons. It covers many other history topics. We have said that we would likely export to GEDCOM via a plugin. BUT there will be so much HRE data that can't be exported because GEDCOM does not have meaningful methods or has internal constraints which preclude the storing of that data. This is likely to create dissatisfaction with the inadequate GEDCOM data model definitions.

To us in the HRE team, our most important focus is to store data in HRE in a way where it is not restricted by old-style USA-centric rules that were set up for an entirely different purpose and then adopted by genealogists without recognizing the constraints implied by that decision.

Robin

MichaelErichsen commented 6 years ago

Dale,

As said by others, HRE is going to be more than genealogy, so the internal structure is much more generic.

Some technical thoughts about export and import, however.

The first implementation will be genealogy, and the first import must be from one or more versions of TMG. This will be done by reading the TMG Foxpro database directly.

The next might be GEDCOM, but I assume we would have to know how many non-TMG users would want to migrate to HRE, and how many users would like to import data from other researchers in this format. The challenge is not technical; it is how to map GEDCOM entities into HRE, and how to prioritize resources.

For export I am not sure about the requirements. Personally I use Second Site for my web site and might change to the GEDCOM version, unless we find a way to bridge directly from HRE to Second Site. We also need to export data for use by other researchers using other programs.

In cases like this you always have a difficult choice between the highest level (with most features) and the lowest level (which can be used by more other tools). Or getting time and resources to do both.

Br Michael Erichsen

richard-damon commented 6 years ago

My personal opinion is that if HRE doesn't support, to at least some moderate level, GEDCOM (5.5.1) import and export, most people, including much of your target TMG audience, will avoid it.

Yes, GEDCOM in its base standard definition is very constrained, but that says import of it should be fairly simple and easy to define. There are a number of 'standard' extensions adopted by the community which would be good to implement too.

As for export, if you don't support it to some reasonable degree, people will be very reluctant to adopt HRE for fear of being locked into an incompatible technology. Also, there will be a major choice: either support GEDCOM output such that John's GEDSITE program can create a website, or have HRE create a reasonable web site generator very early. People will not adopt HRE if there is no path available for publishing their work. John has indicated that he is unlikely to make a direct HRE import until it shows it is well used, and it won't get to be well used if it can't generate the web site.

Also, if you are willing to use GEDCOM extensions, as defined by the GEDCOM standard and as other programs have done, it should be possible to export 100% of your data. Admittedly, unless other programs understand your extensions, the data won't get there (for now), but if you don't output it somehow it CAN'T get into other programs unless they take on the bigger job of direct import. I think one of TMG's biggest mistakes was the attitude that it wouldn't even attempt to output data that wasn't 100% defined by the GEDCOM standard. Yes, you may want a pure standard mode which only outputs a 'proper' GEDCOM (which omits a lot of data), but you also NEED a mode that outputs at least most of the genealogically significant data using extensions (the common industry-standard ones when possible), and ideally a mode which outputs virtually all the data, using your own private extensions when needed.
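
The three export modes described in this comment can be sketched in Python. This is an illustration only, not HRE code: `ExportMode`, `COMMON_EXTENSIONS` and `filter_lines` are hypothetical names, and the set of "commonly understood" extension tags is an assumption made for the example. One rule is taken from the GEDCOM 5.5.1 specification: user-defined tags begin with an underscore.

```python
from enum import Enum


class ExportMode(Enum):
    STRICT = "strict"   # only tags defined by the standard
    COMMON = "common"   # standard tags plus widely used extensions
    FULL = "full"       # everything, including private extensions


# Assumption for illustration: extension tags treated as commonly understood.
COMMON_EXTENSIONS = {"_WITN", "_ROLE"}


def tag_of(line):
    """Return the tag of a GEDCOM line, skipping an optional @XREF@ id."""
    parts = line.split(" ", 2)
    if len(parts) > 2 and parts[1].startswith("@"):
        # Record line such as "0 @I1@ INDI": the tag is the third token.
        return parts[2].split(" ", 1)[0]
    return parts[1]


def allowed(tag, mode):
    """Decide whether a tag may be written in the given export mode."""
    if not tag.startswith("_"):
        return True  # not a user-defined tag
    if mode is ExportMode.FULL:
        return True
    if mode is ExportMode.COMMON:
        return tag in COMMON_EXTENSIONS
    return False


def filter_lines(lines, mode):
    """Drop disallowed extension tags together with their substructures."""
    kept = []
    skip_below = None  # level of the structure currently being dropped
    for line in lines:
        level = int(line.split(" ", 1)[0])
        if skip_below is not None:
            if level > skip_below:
                continue  # still inside a dropped structure
            skip_below = None
        if allowed(tag_of(line), mode):
            kept.append(line)
        else:
            skip_below = level
    return kept
```

Filtering the same in-memory export three ways keeps a single code path for writing data, with the mode deciding only what is left out.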

RobinLamacraft commented 6 years ago

Hi Richard,

I agree that we will need to export to GEDCOM (5.5.1) to a reasonable standard.

But there are so many places where standard GEDCOM fails to allow data to be stored in a more detailed way, or constrains which types of data can or cannot be linked to some structures. The over-riding definition of family creates a barrier to exporting some data that is relevant to providing context and evidence.

When we get to the point of examining a GEDCOM (5.5.1) export feature we will decide (1) what can be exported easily without loss of meaning, (2) what can be exported by the creation of custom tags and (3) what data can't be exported because that data goes beyond the concepts of GEDCOM.

Robin
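
The three-way decision described in this comment could look roughly like the following Python sketch. It is not an HRE design: the record kinds, the `_MEMB` tag and the function names are hypothetical, and the table of standard tags is deliberately incomplete. `BIRT`, `DEAT`, `MARR` and `BURI` are event tags defined by GEDCOM 5.5.1.

```python
from collections import defaultdict

# A few event tags defined by GEDCOM 5.5.1 (deliberately incomplete).
STANDARD_EVENT_TAGS = {
    "birth": "BIRT",
    "death": "DEAT",
    "marriage": "MARR",
    "burial": "BURI",
}

# Hypothetical custom tags an exporter might define for itself.
CUSTOM_TAGS = {"group_membership": "_MEMB"}


def classify(kind):
    """Place a record kind into one of the three export categories."""
    if kind in STANDARD_EVENT_TAGS:
        return "standard"        # (1) exportable without loss of meaning
    if kind in CUSTOM_TAGS:
        return "custom"          # (2) exportable through a custom tag
    return "not_exportable"      # (3) beyond the concepts of GEDCOM


def export_plan(kinds):
    """Group record kinds by export category."""
    plan = defaultdict(list)
    for kind in kinds:
        plan[classify(kind)].append(kind)
    return dict(plan)
```

A report built this way would let a user see before exporting which parts of their data will be left behind.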

daleathan commented 6 years ago

Import GEDCOM v5.5 standard format files

Export data from HRE that conforms to the GEDCOM v5.5 standard

Wouldn't GEDCOM v5.5.1 be better?

http://genealogytools.com/why-all-genealogy-apps-should-support-gedcom-5-5-1/

@RobinLamacraft When we get to the point of examining a GEDCOM (5.5.1) export feature we will decide

@RobinLamacraft Thanks for being the closest to actually answering my question. I forget which of your PDF spec documents I copied the text above from, but essentially I was attempting to point out that HRE should output ged 5.5.1 when it does 👍

richard-damon commented 6 years ago

I will ask you: what data really cannot be exported as GEDCOM if you allow yourself to define a basic set of extensions? Maybe the metadata about your configuration would have problems, but not the actual data.

I am not saying that you can export it so an arbitrary program, not knowing your extensions, would make sense, but that a program knowing your extensions could reconstruct the data.

Basic strategy: make every Focus Item a "person" in the GEDCOM file, and if they aren't really a person give them an attribute tag, like _TYPE, with a value that says what sort of thing they actually are (perhaps some Focus Items that can have all their attributes described in GEDCOM don't need to be made persons, like a source that only has citations to it).

Any Event-like thing connected to these 'People' can become the corresponding GEDCOM event, with a fallback of EVNT for those that don't map to a standard GEDCOM activity/property. For extra participants, use the system like RootsWeb etc., which I think is a _WITN property referencing each other participant to the event, with a _ROLE modifier to define the type of connection.

For non-biological interrelationships between people, add a _LINK property connecting to the other 'person', with a sub-property to name the sort of relationship.

For non-standard information attached to an item (like a Sticky), add a _PROP subfield with a _TYPE for the type of information and whatever else is needed to define it.

Basically, all the 'data' in the file can be broken down into Focus Items (People), Properties, Relationships between Focus Items, and Multi-ways which become EVNT with _WITN links.

This gives you the basic tools to describe any data, and in fact you could probably encode the meta-data with a similar system and a few pre-defined meta types.

I will admit that this totally abuses the concept of a GEDCOM 'standard', but it provides the needed path, and provides the hooks needed for some other program to get the data without needing to dig into the database file themselves.
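
The strategy described in this comment can be sketched in Python as follows. Everything here is an assumption for illustration: the `FocusItem`, `Event`, `Participant` and `Link` classes are hypothetical stand-ins for HRE data, the extension tags `_TYPE`, `_WITN`, `_ROLE` and `_LINK` are the ones proposed in the comment rather than tags from any published specification, and using `_TYPE` under `_LINK` to name the relationship is a guess at the unnamed sub-property. The sketch writes the generic event as `EVEN`, which is the tag GEDCOM 5.5.1 defines for it (the comment calls it EVNT).

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    item_id: str
    role: str


@dataclass
class Event:
    tag: str                 # standard tag such as "BIRT", or "EVEN" as fallback
    description: str = ""
    participants: list = field(default_factory=list)


@dataclass
class Link:
    target_id: str
    relation: str


@dataclass
class FocusItem:
    item_id: str
    name: str
    kind: str = "Person"
    events: list = field(default_factory=list)
    links: list = field(default_factory=list)


def to_gedcom(item):
    """Render one focus item as an INDI record with extension tags."""
    lines = [f"0 @{item.item_id}@ INDI", f"1 NAME {item.name}"]
    if item.kind != "Person":
        # Mark pseudo-persons so a program that knows the extension can tell.
        lines.append(f"1 _TYPE {item.kind}")
    for event in item.events:
        lines.append(f"1 {event.tag}")
        if event.description:
            lines.append(f"2 TYPE {event.description}")
        for participant in event.participants:
            lines.append(f"2 _WITN @{participant.item_id}@")
            lines.append(f"3 _ROLE {participant.role}")
    for link in item.links:
        lines.append(f"1 _LINK @{link.target_id}@")
        lines.append(f"2 _TYPE {link.relation}")
    return lines
```

For example, a ship with a launch event and a link to its captain:

```python
ship = FocusItem(
    "I2",
    "HMS Example",
    kind="Ship",
    events=[Event("EVEN", "Launch", [Participant("I1", "Captain")])],
    links=[Link("I1", "Commanded by")],
)
print("\n".join(to_gedcom(ship)))
```

A program that does not know these extensions would see only an individual named "HMS Example" with a generic event, which is the loss the comment accepts in exchange for having an export path at all.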

RobinLamacraft commented 6 years ago

Hi Richard,

Thanks for some hints about how to work around the GEDCOM "standard".

When we get to that point, I will seek advice from you.

It is where the focus of the research is a data object with properties and behaviors, such as land sub-division and amalgamation or membership of a group (an army unit or the crew of a ship), that on the surface the GEDCOM model of Person and Family would have to be distorted to allow such exports.

Then comes the question whether distorting GEDCOM is a worthwhile approach.

It may be that only exporting the content that is easy to do would be sufficient for most genealogists.

In any case we always intend to export HRE data in a more general format like XML and/or JSON where the file provides the structural definition.

Robin
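
As an illustration of an export "where the file provides the structural definition", here is a minimal Python sketch of a JSON file that carries its own schema. The layout (`format`, `schema`, `records`), the record shape and both function names are assumptions made for the example; the thread does not describe HRE's actual export format.

```python
import json


def export_json(records, schema):
    """Serialize records together with the schema that describes them."""
    document = {
        "format": "hre-export-sketch",  # hypothetical format identifier
        "schema": schema,               # record kind -> list of field names
        "records": records,
    }
    return json.dumps(document, indent=2, sort_keys=True)


def undeclared_fields(document):
    """List (record id, field name) pairs the embedded schema does not declare."""
    problems = []
    for record in document["records"]:
        declared = set(document["schema"].get(record["kind"], []))
        for name in record["fields"]:
            if name not in declared:
                problems.append((record["id"], name))
    return problems
```

Because the schema travels with the data, a receiving program can check the records against it without any prior knowledge of HRE.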

richard-damon commented 6 years ago

Yes, it does violence to the concept of GEDCOM to call a piece of land a 'Person' and to consider some relationship among them to be Parent-Child. I would say that TMG users started this with the concept of Pseudo-Persons, and it would actually be EXPECTED to happen in HRE, especially if that was done in the data that was imported into HRE. A lot of the misc. info about these should actually fit well with the EVNT tag, and once you get past the idea that a 'Person' might actually be all sorts of things, this is fairly natural.

Some things that provide additional interrelationships between entities in the file (that we don't want to make Parent-Child) could just be done with Events, using the Witness extension, though it might make sense to add a custom linking property to keep the relationship a bit more direct.

The area where things get trickier is when HRE stores relationships between various attributes (like event A happened after event B, maybe even giving a rough time period). One solution would be to just convert these to Notes, which would be the traditional GEDCOM method (the Standard can't handle it, so punt it to the user to process). An alternative would be to have the events create INDI records for themselves with a pointer to them, and let those INDI records have an event to define the relationship; or, for simpler interrelationships, just add an xref_ID to the events and use a custom sub-tag to express the relationship. Not getting this sort of information in a GEDCOM export might not be that bad, as other programs that are reliant on GEDCOM are likely not going to be processing it.

I would be glad to help brainstorm ideas for GEDCOM export. In some ways I see GEDCOM export as more needed in the early bootstrap phase, to get people more at ease about transferring in, and maybe for linking to things like GedSite for output until HRE develops those sorts of tools internally.
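
Two of the options in this comment for recording that one event happened after another can be sketched in Python. The function name and the `_AFTER` tag are hypothetical. Giving an event its own cross-reference id is itself an extension, since standard GEDCOM events are substructures rather than records.

```python
def ordering_lines(other_ref, other_label, use_extension):
    """Return level-2 lines, to sit under an event, saying it followed another event.

    other_ref     -- assumed cross-reference id given to the earlier event
    other_label   -- human-readable description of the earlier event
    use_extension -- True for a private pointer tag, False for a plain NOTE
    """
    if use_extension:
        # _AFTER is a hypothetical private tag pointing at the earlier event.
        return [f"2 _AFTER @{other_ref}@"]
    # Fallback: keep the information as text that any GEDCOM reader can show.
    return [f"2 NOTE Happened after: {other_label}"]
```

The NOTE form survives in any GEDCOM reader but only as text; the pointer form keeps the relationship machine-readable for programs that know the extension.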