frizbog / gedcom4j

Java library for reading/writing genealogy files in GEDCOM format
http://gedcom4j.org

Custom Tags are not well-handled #35

Closed: frizbog closed this issue 11 years ago

frizbog commented 11 years ago

If you load a file with custom tags (i.e., tags beginning with underscores), the GedcomParser class adds a warning to its warnings collection, then ignores the custom tag.

This means that if you load a file with custom tags and then write it back out with the GedcomWriter, all of the custom tag data is effectively stripped out and never written to the output.

There needs to be a better way to handle custom tag data, one that is at least lossless. Since there is no way to know how someone has defined a custom tag, the data model won't be able to do anything more than store the raw custom tag data for the user to parse or handle manually, but that would at least prevent the loss of data.
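A minimal sketch of what lossless raw storage could look like, assuming custom tag data is kept purely as text. Each line is split into a level, a tag, and an optional value, nested under the nearest preceding line with a lower level, and later written back out in its original order. `RawTagNode` and the sample `_MILT`/`_RANK`/`_FLAG` tags are hypothetical illustrations, not part of gedcom4j, and for brevity the sketch ignores cross-reference IDs such as `@I1@` and assumes single-space delimiters.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class RawTagRoundTripDemo {
    public static void main(String[] args) {
        // Hypothetical custom-tag data as it might appear under an individual record.
        List<String> input = List.of(
                "1 _MILT Union Army",
                "2 DATE 1861",
                "2 _RANK Private",
                "1 _FLAG Y");
        List<RawTagNode> roots = RawTagNode.parse(input);
        List<String> output = new ArrayList<>();
        for (RawTagNode root : roots) {
            root.write(output);
        }
        System.out.println(output.equals(input)); // true: nothing was lost
    }
}

/** Hypothetical raw-text tree node: keeps a tag, its value, and its subtags verbatim. */
final class RawTagNode {
    final int level;
    final String tag;
    final String value; // null when the line has no value
    final List<RawTagNode> children = new ArrayList<>();

    RawTagNode(int level, String tag, String value) {
        this.level = level;
        this.tag = tag;
        this.value = value;
    }

    /** Nests each line under the nearest preceding line with a lower level number. */
    static List<RawTagNode> parse(List<String> lines) {
        List<RawTagNode> roots = new ArrayList<>();
        Deque<RawTagNode> open = new ArrayDeque<>(); // current chain of ancestors
        for (String line : lines) {
            String[] parts = line.split(" ", 3); // level, tag, optional value
            RawTagNode node = new RawTagNode(
                    Integer.parseInt(parts[0]), parts[1], parts.length > 2 ? parts[2] : null);
            while (!open.isEmpty() && open.peek().level >= node.level) {
                open.pop(); // close siblings and deeper nodes
            }
            if (open.isEmpty()) {
                roots.add(node);
            } else {
                open.peek().children.add(node);
            }
            open.push(node);
        }
        return roots;
    }

    /** Writes this node and its subtags back out, depth-first, in their original order. */
    void write(List<String> out) {
        out.add(value == null ? level + " " + tag : level + " " + tag + " " + value);
        for (RawTagNode child : children) {
            child.write(out);
        }
    }
}
```

Because nothing is interpreted, the writer can reproduce the input exactly; the trade-off, as noted above, is that the caller has to make sense of the raw tree.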

(Migrated from http://code.google.com/p/gedcom4j/issues/detail?id=35)



frizbog1 said, at 2012-11-30T19:42:56.000Z:

I think what I will need to do is define a new base type that nearly every type in the data model will extend. That abstract type would have the ability to hold custom tag information so that each item in the data model can have custom tags. Unfortunately, there is no way to parse custom tag data except as raw text, so I would probably have all custom tags stored as com.mattharrah.gedcom4j.parser.StringTree structures.
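The base-type idea might look roughly like the following. This is only a sketch: `CustomTagHolder`, `RawTag`, `Person`, and `Family` are hypothetical names rather than gedcom4j's actual classes, and `RawTag` is a minimal stand-in for a raw-text tree such as the StringTree mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

class CustomTagBaseDemo {
    public static void main(String[] args) {
        Person person = new Person("John /Smith/");
        person.customTags.add(new RawTag("_MILT", "Union Army"));
        Family family = new Family();
        family.customTags.add(new RawTag("_STAT", "Never married"));

        // Code that only cares about custom tags can treat every element uniformly.
        for (CustomTagHolder element : List.<CustomTagHolder>of(person, family)) {
            System.out.println(element.getClass().getSimpleName() + ": "
                    + element.customTags.size() + " custom tag(s)");
        }
        // Person: 1 custom tag(s)
        // Family: 1 custom tag(s)
    }
}

/** Hypothetical raw custom tag: stored only as text, since the library can't know its meaning. */
final class RawTag {
    final String tag;
    final String value;
    final List<RawTag> children = new ArrayList<>();

    RawTag(String tag, String value) {
        this.tag = tag;
        this.value = value;
    }
}

/** Hypothetical common base type: every model element inherits a place for custom tags. */
abstract class CustomTagHolder {
    final List<RawTag> customTags = new ArrayList<>();
}

final class Person extends CustomTagHolder {
    final String name;

    Person(String name) {
        this.name = name;
    }
}

final class Family extends CustomTagHolder {
}
```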

frizbog commented 11 years ago

Started working on this and introduced the class I mentioned, but there's a problem: many of the data elements that can have user-defined subtags are stored in the object graph as simple String objects. For custom tags to be stored on these elements, they will need to change from Strings to some other object that holds both a String value and a collection of custom tags.
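One way to address the String problem, sketched with hypothetical names (`TaggedString`, `Individual`, and its `occupation` field are illustrations, not the library's API): wrap the value in a small class that also holds that element's custom subtags, stored here as raw line text for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

class TaggedStringDemo {
    public static void main(String[] args) {
        // Before the change, a field like "String occupation" had nowhere to keep subtags.
        Individual person = new Individual();
        person.occupation = new TaggedString("Farmer");
        person.occupation.customTags.add("2 _SRC 1850 census"); // raw custom subtag line

        System.out.println(person.occupation);                   // Farmer
        System.out.println(person.occupation.customTags.size()); // 1
    }
}

/** Hypothetical replacement for a String field: the value plus any custom subtags. */
final class TaggedString {
    final String value;
    final List<String> customTags = new ArrayList<>(); // raw text, uninterpreted

    TaggedString(String value) {
        this.value = value;
    }

    /** Lets code that printed or concatenated the old String field keep working. */
    @Override
    public String toString() {
        return value;
    }
}

final class Individual {
    TaggedString occupation; // previously: String occupation
}
```

Overriding `toString()` softens the migration, but it is not a drop-in replacement: code such as `"Farmer".equals(person.occupation)` would now return false and would still need updating.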

frizbog commented 11 years ago

This is coded and in the sources, but it has not yet been released as a jar.

frizbog commented 11 years ago

Resolved in v2.1.0

BertKoor commented 10 years ago

Speaking of data loss, maybe it's a good idea to store all the "Cannot handle tag" instances in customTags. Almost all of the example files I have would lose data if these tags were discarded, and there is currently no mechanism to, e.g., convert them into different structures that do conform to the GEDCOM standard.

frizbog commented 10 years ago

Yeah, I debated what to do there. Adherence to the GEDCOM standard was important to me, so how to deal with data that doesn't conform is an open-ended question. Obviously, the best approach is to put the user of the library in control of how lenient to be, perhaps with an option on the parser that controls whether non-standard tags are treated as custom tags. I'm going to open a new issue for that as an enhancement.
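A possible shape for such a parser option, offered as a hedged sketch: `TagSorter`, its `treatUnknownTagsAsCustom` flag, and the tiny `KNOWN` tag set are hypothetical stand-ins, not gedcom4j's real parser or the eventual resolution of issue #61. Underscore tags are always kept as custom tags; unrecognized tags without an underscore are either kept as custom tags (lenient) or reported with a "Cannot handle tag" warning and dropped (strict).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class LenientParserDemo {
    public static void main(String[] args) {
        // "MILT" is a hypothetical non-standard tag written without the leading underscore.
        List<String> tags = List.of("NAME", "_MILT", "MILT");
        for (boolean lenient : new boolean[] {false, true}) {
            TagSorter sorter = new TagSorter(lenient);
            for (String tag : tags) {
                sorter.accept(tag);
            }
            System.out.println("lenient=" + lenient
                    + " custom=" + sorter.customTags + " warnings=" + sorter.warnings);
        }
        // lenient=false custom=[_MILT] warnings=[Cannot handle tag MILT]
        // lenient=true custom=[_MILT, MILT] warnings=[]
    }
}

/** Hypothetical sketch of a parser option that controls how non-standard tags are handled. */
final class TagSorter {
    // Tiny stand-in for the tags a real parser recognizes in the current context.
    private static final Set<String> KNOWN = Set.of("NAME", "SEX", "BIRT", "DEAT");

    final boolean treatUnknownTagsAsCustom;
    final List<String> standardTags = new ArrayList<>();
    final List<String> customTags = new ArrayList<>();
    final List<String> warnings = new ArrayList<>();

    TagSorter(boolean treatUnknownTagsAsCustom) {
        this.treatUnknownTagsAsCustom = treatUnknownTagsAsCustom;
    }

    void accept(String tag) {
        if (KNOWN.contains(tag)) {
            standardTags.add(tag);
        } else if (tag.startsWith("_") || treatUnknownTagsAsCustom) {
            customTags.add(tag); // kept losslessly for the caller to interpret
        } else {
            warnings.add("Cannot handle tag " + tag); // strict mode: warn and drop
        }
    }
}
```

Defaulting the flag to strict would preserve the standards-first behavior, while lenient mode addresses the data-loss concern raised in the previous comment.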

frizbog commented 10 years ago

Please see issue #61 which deals with the last thing here.