frizbog / gedcom4j

Java library for reading/writing genealogy files in GEDCOM format
http://gedcom4j.org

Support for well-known custom tag sets #140

Closed frizbog closed 7 years ago

frizbog commented 8 years ago

Several popular genealogy packages use well-defined sets of custom tags to represent their data. Decorator/adapter classes could provide first-order API support for reading, validating, and writing this data in the custom tag structures, without changing the underlying model classes or any of the writers.
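
A minimal sketch of the adapter idea, to make the shape concrete. Everything here is an assumption for illustration: `TagNode`, `Person`, and `WeightAdapter` are hypothetical stand-ins, not gedcom4j classes, and `_WEIG` is used only as an example custom tag. The point is that the model keeps one generic list of custom tags while the adapter offers typed access on top of it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a generic custom-tag node (tag, value, children). */
class TagNode {
    final String tag;
    String value;
    final List<TagNode> children = new ArrayList<>();

    TagNode(String tag, String value) {
        this.tag = tag;
        this.value = value;
    }
}

/** Hypothetical model class: custom tags stay in one generic list. */
class Person {
    final List<TagNode> customTags = new ArrayList<>();
}

/** Adapter exposing one custom tag as a typed property; the model is unchanged. */
class WeightAdapter {
    private static final String TAG = "_WEIG";

    /** Returns the value of the first _WEIG tag, or null if none is present. */
    String getWeight(Person p) {
        for (TagNode n : p.customTags) {
            if (TAG.equals(n.tag)) {
                return n.value;
            }
        }
        return null;
    }

    /** Updates the existing _WEIG tag, or appends a new one if it is missing. */
    void setWeight(Person p, String weight) {
        for (TagNode n : p.customTags) {
            if (TAG.equals(n.tag)) {
                n.value = weight;
                return;
            }
        }
        p.customTags.add(new TagNode(TAG, weight));
    }
}
```

Because the adapter only reads and writes the generic list, a writer that already emits custom tags would need no changes under this approach.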

The architecture for this should be extensible, so future authors could add support for other/future software packages' custom tags. gedcom4j should ship with support for the 2 or 3 most popular packages.

Family Tree Maker will be one of the packages selected. Other packages to consider: Legacy Family Tree, Family Historian, Reunion, and RootsMagic.

frizbog commented 7 years ago

This looks like a good source of information: http://genealogytools.com/family-tree-maker-to-gedcom-to-other-apps-crosswalk/

as is this: http://wiki-de.genealogy.net/GEDCOM/_Nutzerdef-Tag#.C3.9Cbersicht_bekannter_Nutzerdefinierter_Kennzeichen

and this: http://www.gencom.org.nz/GEDCOM_tags.html

frizbog commented 7 years ago

Information on Family Historian (UK): http://www.fhug.org.uk/wiki/doku.php?id=glossary:gedcom_extension_list

frizbog commented 7 years ago

Information on Legacy Family Tree: http://support.legacyfamilytree.com/article/AA-00520/14/Tips-and-How-Tos/GEDCOM-Files-Custom-tags-in-Legacy.html

frizbog commented 7 years ago

Ran into a major snag and I may have overpromised. Custom tags that have standard child structures are an issue. As a specific example, consider a custom tag like _WEIG (for weight) that has a standard source-citation sub-structure, which is quite a complex structure. Writing an adapter layer over that sub-structure that looks just like a regular source citation, but uses the underlying customTags List of StringTree objects as its backing store, may not be possible, and is certainly not achievable by the end of September.

This is going to require some more thought. If I cannot get a solution here, I may have to remove this issue from the planned release and scale back plans a bit.

Rolling back the work done so far on this.

Psyches commented 7 years ago

Right, I've wondered if we shouldn't treat all tags as custom tags, or conversely treat custom tags exactly as we do all non-custom tags, with full-fledged support for parsing, emitting, modeling, etc. Earlier I suggested registering a 4-tuple, but I guess a 5-tuple would make the most sense to me: (TAG, model-class, validator-class, parser-class, emitter-class). A default registration would then support non-specific custom tags via the current generic wrapper approach. It'd be a really big change, but it could support all current and future tags and would allow code reuse or overrides when necessary.
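
A rough sketch of what such a registration could look like, purely as an illustration. `TagSupport` and `TagRegistry` are hypothetical names, function objects stand in for the validator, parser, and emitter classes, and the parser here only sees a line's value rather than a whole subtree. Unregistered tags fall back to a generic pass-through entry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical 5-tuple: tag, model class, validator, parser, emitter. */
final class TagSupport<M> {
    final String tag;
    final Class<M> modelClass;
    final Predicate<M> validator;
    final Function<String, M> parser;   // line value -> model object
    final Function<M, String> emitter;  // model object -> line value

    TagSupport(String tag, Class<M> modelClass, Predicate<M> validator,
               Function<String, M> parser, Function<M, String> emitter) {
        this.tag = tag;
        this.modelClass = modelClass;
        this.validator = validator;
        this.parser = parser;
        this.emitter = emitter;
    }
}

/** Hypothetical registry with a generic fallback for unregistered tags. */
final class TagRegistry {
    private final Map<String, TagSupport<?>> byTag = new HashMap<>();

    // Fallback: keep the raw value as a String and accept anything.
    private final TagSupport<String> fallback =
            new TagSupport<String>("*", String.class, v -> true, v -> v, v -> v);

    void register(TagSupport<?> support) {
        byTag.put(support.tag, support);
    }

    /** Returns the registered support for a tag, or the generic fallback. */
    TagSupport<?> lookup(String tag) {
        TagSupport<?> support = byTag.get(tag);
        return support != null ? support : fallback;
    }
}
```

Registering an entry might look like `registry.register(new TagSupport<Integer>("_WEIG", Integer.class, w -> w > 0, s -> Integer.valueOf(s), w -> String.valueOf(w)))`, where the integer model for `_WEIG` is again just an assumption for the example.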

frizbog commented 7 years ago

@Psyches, you're right, that would be a really big change. Taking a 4-classes-per-tag approach would be almost a rewrite in a lot of ways, and would be a big deviation from the convenience and ease-of-use of the fairly simple POJO-based model classes, replacing them with a recursive structure consisting of (primarily) the same base class. I'm not sure I have much of an appetite for that, honestly, and it's certainly not something that could be completed any time soon. I'm not ruling it out: it's a very clean solution, and may eventually be what's needed. But I fear that something that drastic would need to be done in steps and would seriously change the flavor and character of the library. Needs more thought.

As a less-impactful solution that might provide the vast majority of what's needed, and may even be a baby step towards the generic approach you're describing, I'm thinking of creating a general-purpose CustomFact class. It would be just like any other class in the model, with a field for the tag to use when emitting it and built-in first-order support for dates, places, notes, and source citations. When the parser encounters a custom tag, instead of just copying the StringTree into the object and calling it a day, it could store the custom tag itself, look for standard dates, places, notes, and source citations under that tag, and then recursively hold more CustomFact objects for anything else it doesn't have built-in support for. It would look much like an IndividualEvent, which has a type field; the tag field would play a similar role, except that it would be an open-ended string value rather than an enumerated value. Custom tags would then be represented as a list of CustomFact objects rather than a List of StringTree objects.
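
To illustrate the idea, here is a small sketch under stated assumptions. Only the name CustomFact comes from the description above; the field names, the `RawLine` helper, and the use of plain strings for dates, places, notes, and citations are all simplifications made up for this example (a real source citation is a far richer structure). DATE, PLAC, NOTE, and SOUR are the standard GEDCOM tags the sketch treats as built in.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical generic node: one parsed GEDCOM line plus its children. */
class RawLine {
    final String tag;
    final String value;
    final List<RawLine> children = new ArrayList<>();

    RawLine(String tag, String value) {
        this.tag = tag;
        this.value = value;
    }

    /** Adds a child and returns this node, so trees can be built fluently. */
    RawLine add(RawLine child) {
        children.add(child);
        return this;
    }
}

/** Sketch of a CustomFact: a custom tag with typed slots for common children. */
class CustomFact {
    final String tag;            // open-ended tag string, reused when emitting
    String description;          // value on the custom tag's own line
    String date;                 // simplified: raw DATE value
    String place;                // simplified: raw PLAC value
    final List<String> notes = new ArrayList<>();
    final List<String> citations = new ArrayList<>();      // simplified SOUR values
    final List<CustomFact> customFacts = new ArrayList<>(); // everything else

    CustomFact(String tag) {
        this.tag = tag;
    }

    /** Converts a raw subtree, recursing for children without built-in support. */
    static CustomFact from(RawLine line) {
        CustomFact fact = new CustomFact(line.tag);
        fact.description = line.value;
        for (RawLine child : line.children) {
            switch (child.tag) {
                case "DATE": fact.date = child.value; break;
                case "PLAC": fact.place = child.value; break;
                case "NOTE": fact.notes.add(child.value); break;
                case "SOUR": fact.citations.add(child.value); break;
                default: fact.customFacts.add(from(child));
            }
        }
        return fact;
    }
}
```

With this shape, a `_WEIG` line carrying DATE and SOUR children ends up with its date and citation in typed fields, while any unrecognized child becomes a nested CustomFact.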

Going to experiment with this idea for a while.

Psyches commented 7 years ago

Your thinking all makes sense to me...

frizbog commented 7 years ago

Thanks @Psyches - please keep the comments coming. Your feedback and ideas and suggestions are making me and the product better.

frizbog commented 7 years ago

Things look back on track, although having to rework the CustomTags from StringTrees to first-order CustomFact objects in the model is a pretty substantial change to the API. In most cases, some automatic reworking of imports and some global search-and-replace will take care of things, but it could be a lot of work for anyone who uses custom tags heavily.

Family Tree Maker 3 and Family Historian are basically complete. I think I will do Legacy Family Tree next, since they've documented their custom tag usage, and offer free trials of their software so I can make a GEDCOM full of their specific custom tags.