dpradov / keynote-nf

Tabbed notebook with RichText editor, multi-level notes and strong encryption.
Mozilla Public License 2.0

KeynoteNF does not correctly import UTF8 files (currently it only imports UTF8-BOM files) #658

Closed: mpaulinelli closed this issue 8 months ago

mpaulinelli commented 8 months ago

When I import a UTF8 file in Brazilian Portuguese, the accents are all messed up. I am using the latest version of Windows 11.

dpradov commented 8 months ago

Please look for the reference to UTF8 in the Help. Check whether UTF8 is enabled as the system-level code page on your computer. Could you send me a (reduced) example of the text you are trying to import?


And please confirm which KeyNote NF version you are using.

mpaulinelli commented 8 months ago

I am using KeyNote NF 1.9.01. See the attached file for more information: keynote-nf-issues658.zip

mpaulinelli commented 8 months ago

Hi, I went to Control Panel > Region > Administrative > Language for non-Unicode programs, enabled the beta option (Beta: Use Unicode UTF-8 for worldwide language support) and restarted the computer. The import worked well, but all the accents in the existing note titles were replaced by the character �, and I had to fix them one by one. Wouldn't it be worth adapting KeyNote NF to fully support Unicode/UTF8? Thanks.

dpradov commented 8 months ago

I have confirmed that, surprisingly, all the import tests I have done with UTF8 files used files that include the BOM (Byte Order Mark), that is, the special signature placed at the beginning of a file to indicate its encoding. For UTF8, the BOM is the three-byte sequence EF BB BF. Some programs display this encoding as UTF8-BOM.
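To make the BOM concrete, here is a minimal Python sketch (not KeyNote NF code, which is written in Delphi) that checks whether raw file bytes start with the UTF8 signature EF BB BF and strips it. It uses the standard `codecs.BOM_UTF8` constant and the `utf-8-sig` codec, which writes the BOM when encoding; the helper names are hypothetical.

```python
import codecs

# codecs.BOM_UTF8 is the three-byte UTF8 signature b"\xef\xbb\xbf" (EF BB BF).
UTF8_BOM = codecs.BOM_UTF8

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the raw file bytes start with the UTF8 BOM."""
    return data.startswith(UTF8_BOM)

def strip_utf8_bom(data: bytes) -> bytes:
    """Remove the BOM, if present, leaving only the encoded text."""
    return data[len(UTF8_BOM):] if has_utf8_bom(data) else data

# "utf-8-sig" prepends the BOM; plain "utf-8" does not.
with_bom = "ação".encode("utf-8-sig")
without_bom = "ação".encode("utf-8")

print(has_utf8_bom(with_bom))                   # True
print(has_utf8_bom(without_bom))                # False
print(strip_utf8_bom(with_bom).decode("utf-8")) # ação
```

Both byte strings contain the same text; only the first carries the signature that an importer can use to recognize the encoding up front.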

While a BOM may be desirable to avoid misinterpreting the content, it is not required. And that is what surprises me: apparently I never tested with UTF8 files without a BOM. I have tried importing both UTF8-BOM and UTF8 files in several versions, from 1.8.0 to the current one, and the behavior is the same in all of them: UTF8-BOM files are imported correctly, while UTF8 files without a BOM are not.
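When there is no BOM, an importer has to guess. One common heuristic (shown here as a hypothetical Python sketch, not the fix actually adopted in KeyNote NF) is: honor a BOM if present, otherwise try a strict UTF8 decode, and fall back to the ANSI code page only if that fails. Accented ANSI text is almost never valid UTF8, so the strict decode rejects it. The `decode_import` name and the `cp1252` default are assumptions for illustration.

```python
from __future__ import annotations

import codecs

def decode_import(data: bytes, ansi_codepage: str = "cp1252") -> tuple[str, str]:
    """Hypothetical import decoder: BOM first, then strict UTF8, then ANSI.

    Returns the decoded text and a label for the encoding that was used.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8"), "utf-8-bom"
    try:
        # Strict decoding raises on byte sequences that are not valid UTF8,
        # which is typical of ANSI text containing accented characters.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Fall back to the ANSI code page; undefined bytes become U+FFFD.
        return data.decode(ansi_codepage, errors="replace"), ansi_codepage

print(decode_import("ação".encode("utf-8-sig")))  # ('ação', 'utf-8-bom')
print(decode_import("ação".encode("utf-8")))      # ('ação', 'utf-8')
print(decode_import("ação".encode("cp1252")))     # ('ação', 'cp1252')
```

Pure ASCII files decode as UTF8 under this rule, which is harmless because ASCII bytes mean the same thing in both encodings.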

What is happening with your file is that KeyNote NF is interpreting it as ANSI rather than UTF8. That is why it does not handle accents correctly (or any other character that cannot be represented in ANSI under your code page).
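The effect can be reproduced in a few lines of Python. This sketch assumes the ANSI code page is Windows-1252 (`cp1252`, the usual one for Portuguese on Windows): each accented character becomes two UTF8 bytes, and reading those bytes as ANSI turns each one into a separate, wrong character.

```python
text = "Configuração"

# UTF8 encodes "ç" as C3 A7 and "ã" as C3 A3.
utf8_bytes = text.encode("utf-8")

# Reading those bytes as ANSI (Windows-1252) maps each byte to one character.
misread = utf8_bytes.decode("cp1252")
print(misread)  # ConfiguraÃ§Ã£o

# The damage is reversible as long as the bytes are kept intact.
print(misread.encode("cp1252").decode("utf-8"))  # Configuração
```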

Of course I will fix it. You shouldn't need to use UTF8 as your system code page; in fact, I think it is preferable that you don't. Try pasting text in UTF8 or in other encodings from a browser to see how it appears. I think it would be best to restore the setting you had before and, until I fix this, make sure that any UTF8 content you want to import is saved as UTF8-BOM.

Regarding node titles: to remain compatible with earlier versions, which did not support Unicode, the application tries to save each title as ANSI whenever possible. If it is not possible, because the title includes characters that are not valid in ANSI, it saves it as UTF8 (without BOM). You probably won't have to touch the accented titles again, since they are now saved as UTF8 and should be recognized when you switch back to ANSI.
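The title-saving rule described above ("ANSI if possible, otherwise UTF8 without BOM") can be sketched in Python as follows. This is an illustration of the rule only, not KeyNote NF's actual code; the `encode_title` name and the `cp1252` code page are assumptions, and how the application later recognizes which encoding a stored title uses is not shown.

```python
from __future__ import annotations

def encode_title(title: str, ansi_codepage: str = "cp1252") -> tuple[bytes, str]:
    """Hypothetical sketch: store a title as ANSI when every character fits,
    otherwise fall back to UTF8 without BOM."""
    try:
        return title.encode(ansi_codepage), ansi_codepage
    except UnicodeEncodeError:
        # At least one character has no representation in the ANSI code page.
        return title.encode("utf-8"), "utf-8"

print(encode_title("Ação"))    # (b'A\xe7\xe3o', 'cp1252')
print(encode_title("日本語"))  # (b'\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e', 'utf-8')
```

With the system code page switched to UTF8, titles stored as ANSI bytes such as `b'A\xe7\xe3o'` are no longer valid UTF8, which matches the � characters reported above.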

All the best

Attachments: 1_Test_Unicode_original_UTF8.txt, 1_Test_Unicode_original_UTF8-BOM.txt