Serg-Norseman / GEDKeeper

GEDKeeper - program for work with personal genealogical database
https://gedkeeper.net/
GNU General Public License v3.0

Improve support for European and other non-UTF file encodings #151

Closed Serg-Norseman closed 5 years ago

Serg-Norseman commented 7 years ago

fire-eggs commented on 6 Sep

A possibly related note, originally observed with GK. I suggest letting the user choose between the Cyrillic and Latin-1 code pages when loading non-UTF GEDCOM files. GK assumes Cyrillic: when I loaded a French-origin GEDCOM, all the French accented characters were imported as Cyrillic. The default could be initialized at install time by checking the user's locale settings, or through a first-run setup question.
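The symptom described above, and the suggested locale-based default, can be sketched in Python roughly as follows. The function name `default_ansi_codepage` and the language-to-code-page tables are assumptions made for illustration; they are not taken from GEDKeeper, and the tables are deliberately incomplete.

```python
# Illustrative, incomplete tables (assumption); not GEDKeeper's actual mapping.
CYRILLIC_LANGS = {"ru", "uk", "be", "bg", "mk"}
CENTRAL_EUROPEAN_LANGS = {"pl", "cs", "sk", "hu", "hr", "sl"}


def default_ansi_codepage(locale_name):
    """Guess which Windows code page "ANSI" most likely means for a locale."""
    # Accept forms such as "ru_RU", "fr-FR" or "pl"; keep only the language part.
    lang = (locale_name or "").replace("-", "_").split("_")[0].lower()
    if lang in CYRILLIC_LANGS:
        return "cp1251"
    if lang in CENTRAL_EUROPEAN_LANGS:
        return "cp1250"
    return "cp1252"  # Western European fallback


# The reported symptom: Latin-1 bytes read with the Cyrillic code page.
raw = "Hélène".encode("cp1252")
wrong = raw.decode("cp1251")  # accented letters turn into Cyrillic letters
right = raw.decode(default_ansi_codepage("fr_FR"))  # accents survive
```

A locale-derived default is only a starting guess: a Russian-speaking user can still receive a French file, which is why a user-facing choice remains useful.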

Serg-Norseman commented on 6 Sep

Yes, of course :) The previous implementation had this problem, and I never got around to it. It will have to be solved. First, we need to restore at least the old code so that UTF / non-UTF loading works. Then we can address flexibility. Yesterday and today I looked at code samples for detecting the encoding automatically, because a GEDCOM file will most likely specify only CHAR ANSI and sometimes the language of the file. Some programs, such as Ahnenblatt, write "CHAR Windows-1252" (for example). I do not know if such a practice exists with other programs. I think we need both mechanisms: one that automatically tries to determine the file's encoding, and a second, optional one that asks the user.
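One way the two mechanisms could fit together is sketched below in Python. It is a minimal sketch, not GEDKeeper's implementation. The names `read_char_tag` and `resolve_encoding`, the 4096-byte header window, and the "valid UTF-8 means UTF-8" heuristic are all assumptions. ANSEL and UTF-16 files are left out of scope here.

```python
import codecs
import re

# Matches the header line "1 CHAR <value>"; GEDCOM tags are plain ASCII.
CHAR_LINE = re.compile(rb"^[ \t]*1[ \t]+CHAR[ \t]+(\S+)", re.MULTILINE)


def read_char_tag(raw):
    """Return the declared CHAR value from the start of the file, or None."""
    match = CHAR_LINE.search(raw[:4096])  # header window size is an assumption
    return match.group(1).decode("ascii", "replace") if match else None


def resolve_encoding(raw, ansi_default="cp1252", user_choice=None):
    """Pick a codec name: user choice, then explicit header, then a guess."""
    if user_choice:  # optional mechanism: the user decides
        return user_choice
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    declared = (read_char_tag(raw) or "").upper()
    if declared in ("UTF-8", "UTF8"):
        return "utf-8"
    explicit = re.fullmatch(r"(?:WINDOWS-|CP)?(125[0-8])", declared)
    if explicit:  # e.g. "CHAR Windows-1252"
        return "cp" + explicit.group(1)
    try:  # "ANSI" or nothing useful declared: test for valid UTF-8
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return ansi_default  # ambiguous single-byte file: use the default
```

The order of checks reflects one possible priority: an explicit user choice wins, an explicit code page in the header comes next, and the ambiguous "ANSI" case falls through to a configurable default.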

fire-eggs commented on 7 Sep

"I do not know if such a practice exists with other programs."

There are lots of variations. Which ones, and how many, are important to support is up to you. Lifelines, GenoPro, Rodokmen Pro and others are explicit about which code page to use. (I've never heard of or seen examples from many of these programs.) For the full obnoxious details you can read Tamura Jones. He writes from a Western perspective, so his "recommendation" is to handle "ANSI" as code page 1252 (Windows Latin 1). This is problematic if the user expects code page 1251 (Cyrillic) or 1250 (Central Europe).
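To show why "ANSI" cannot be mapped to a single code page, the Python sketch below decodes the same bytes under the three code pages mentioned above. The helper name `preview_candidates` is hypothetical; a preview like this could back a "which one looks right?" question to the user.

```python
ANSI_CANDIDATES = ("cp1250", "cp1251", "cp1252")


def preview_candidates(sample, candidates=ANSI_CANDIDATES):
    """Decode the same bytes under each code page for side-by-side display."""
    # errors="replace" keeps the preview usable when a byte is undefined
    # in one of the code pages.
    return {name: sample.decode(name, errors="replace") for name in candidates}


# Byte 0xF1 is a different letter in each of the three code pages.
previews = preview_candidates(b"Mu\xf1oz")
```

All three results are valid text, so the bytes alone cannot reveal which code page the file's author intended.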

Serg-Norseman commented on 7 Sep

The article was very interesting, thank you! I have a lot of work ahead of me...