frizbog / gedcom4j

Java library for reading/writing genealogy files in GEDCOM format
http://gedcom4j.org

ANSEL is not ANSI in anselAsciiOrUtf8() #50

Closed DanCas closed 11 years ago

DanCas commented 11 years ago

line 98

"ANSI", as marked in many GEDCOM files, does not mean an ANSI standard but a Windows code page (commonly Windows-1252). Every character in that code page also exists in Unicode, so it can be written as UTF-8. You can read about it here:

http://en.wikipedia.org/wiki/Windows_code_page
http://nl.wikipedia.org/wiki/ISO_8859-1
http://en.wikipedia.org/wiki/Windows-1252

Note this detail: "Historically, the phrase “ANSI Code Page” (ACP) is used in Windows to refer to various code pages considered as native"...

So it is more accurate to treat the ANSI character set as a subset of what UTF-8 can encode.

In the GEDCOM Standard Release 5.5.1, on p. 77, you can also read: "Systems using code pages to support diacritical characters, such as the windows ANSI 1252 code page, must convert all characters above character code 0x7F to its ANSEL representation for that code page." And on p. 79: "UNICODE character set or the 8-bit UTF-8 form should be used for multi-language support as soon as operating systems begin providing adequate storage and display support." Since nearly all computers now work with UTF-8, it is better to write GEDCOM files in UTF-8.
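To illustrate the point about code pages and UTF-8, here is a minimal sketch and not gedcom4j's actual code. It assumes that "ANSI" means Windows-1252 and that the JDK in use ships the `windows-1252` charset (it is not one of the charsets every Java platform is required to provide). The class name `AnsiToUtf8Sketch` and the method `reencode` are hypothetical names made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class AnsiToUtf8Sketch {
    // Assumption: "ANSI" in a GEDCOM header means Windows-1252.
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    /** Decodes the bytes as Windows-1252 and re-encodes the text as UTF-8. */
    public static byte[] reencode(byte[] ansiBytes) {
        String text = new String(ansiBytes, WINDOWS_1252);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "José" in Windows-1252: the é is the single byte 0xE9.
        byte[] ansi = { 'J', 'o', 's', (byte) 0xE9 };
        byte[] utf8 = reencode(ansi);
        // In UTF-8 the é becomes the two bytes 0xC3 0xA9.
        System.out.println(ansi.length + " bytes in, " + utf8.length + " bytes out");
        // prints "4 bytes in, 5 bytes out"
    }
}
```

Bytes up to 0x7F pass through unchanged, while every byte above 0x7F turns into a multi-byte sequence. The characters are the same, but the bytes on disk are not, so a file has to be decoded with the right code page before it is written out as UTF-8.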

frizbog commented 11 years ago

Agreed. UTF-8 is a better choice. Making that change now.

frizbog commented 11 years ago

Checked in. Will be included in next release (whenever that is).