frizbog / gedcom4j

Java library for reading/writing genealogy files in GEDCOM format
http://gedcom4j.org

Handle files with empty lines #64

Closed. bfontaine closed this issue 9 years ago

bfontaine commented 9 years ago

Hello,

I need to parse some GEDCOM files with empty lines, which are disallowed in the official specification. Do I have to pre-process each file to remove empty lines or is there a way to do that with gedcom4j?

frizbog commented 9 years ago

You’d need to remove the empty lines yourself before having gedcom4j parse the data.

Best of luck!
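
For anyone taking the pre-processing route, here is a minimal sketch in plain Java. It does not use any gedcom4j API. The class name `GedcomBlankLineFilter` and the helper `stripBlankLines` are hypothetical, and the sketch assumes the file uses a single-byte encoding or UTF-8, so that reading and writing it as ISO-8859-1 passes the bytes through unchanged. UTF-16 files would need different handling.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of gedcom4j.
final class GedcomBlankLineFilter {

    // Returns a copy of the input without lines that are empty or whitespace-only.
    static List<String> stripBlankLines(List<String> lines) {
        return lines.stream()
                .filter(line -> !line.trim().isEmpty())
                .collect(Collectors.toList());
    }

    // Usage: java GedcomBlankLineFilter input.ged output.ged
    public static void main(String[] args) throws IOException {
        Path input = Paths.get(args[0]);
        Path output = Paths.get(args[1]);
        // Assumption: ISO-8859-1 maps every byte to one char, so bytes of
        // single-byte or UTF-8 encoded files survive the round trip.
        List<String> cleaned =
                stripBlankLines(Files.readAllLines(input, StandardCharsets.ISO_8859_1));
        // Files.write terminates each line with the platform line separator.
        Files.write(output, cleaned, StandardCharsets.ISO_8859_1);
    }
}
```

The cleaned file can then be handed to the parser as usual. Note that the original line terminators are normalized to the platform default on output.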


bfontaine commented 9 years ago

Thanks!

bfontaine commented 9 years ago

I’m pre-processing files to remove empty lines, but I just re-read the specification (draft release 5.5.1), and on page 11, it says (emphasis mine):

Leading white space (tabs, spaces, and extra line terminators) preceding a GEDCOM line should be ignored by the reading system. Systems generating GEDCOM should not place any white space in front of the GEDCOM line.

“extra line terminators” could include empty lines (an empty line is just two \ns, which are “leading extra line terminators” if there’s a record after them).

So, if I understand the quoted text correctly, the specification does in fact require readers to tolerate empty lines.
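
One possible reading of that rule, sketched in Java for illustration: drop leading tabs and spaces from each line and skip any line that is empty afterwards. This is not how gedcom4j implements it. The class `LenientGedcomLineReader` and its methods are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the quoted rule, not gedcom4j code.
final class LenientGedcomLineReader {

    // Removes leading whitespace (tabs, spaces) but keeps trailing characters,
    // since trailing spaces may belong to the line value.
    static String stripLeadingWhitespace(String line) {
        int start = 0;
        while (start < line.length() && Character.isWhitespace(line.charAt(start))) {
            start++;
        }
        return line.substring(start);
    }

    // Reads all GEDCOM lines, ignoring extra line terminators (empty lines)
    // and whitespace in front of each line.
    static List<String> readLines(Reader source) {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String stripped = stripLeadingWhitespace(line);
                if (!stripped.isEmpty()) {
                    result.add(stripped);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "0 HEAD\n\n\t 1 SOUR EXAMPLE\n\n0 TRLR\n";
        // Prints the three GEDCOM lines without the blank lines or the indent.
        for (String line : readLines(new StringReader(sample))) {
            System.out.println(line);
        }
    }
}
```

Because `BufferedReader.readLine` accepts `\n`, `\r`, and `\r\n` as terminators, the sketch treats all three the same way.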

frizbog commented 9 years ago

Good catch! I will address this ASAP.


frizbog commented 9 years ago

Released in v2.1.9.

bfontaine commented 9 years ago

Great!