ProgerXP / Notepad2e

Word highlighting, simultaneous editing, split views, math evaluation, un/grep, comment reformatting, UAC elevation, complete regexps (PCRE), Lua lexers, DPI awareness and more (XP+)

Handling Encoding if the Current Code Page is UTF-8 #402

Open ajkWare opened 2 years ago

ajkWare commented 2 years ago

If the Windows current code page is set to UTF-8 (65001) (and the Notepad2e default encoding is set to UTF-8), Notepad2e doesn't handle non-UTF-8 files well.

Automatic UTF-8 checks identify them as 'ANSI' encoding, but then Notepad2e uses the current code page to display them. Since the current code page is UTF-8, non-ASCII characters are displayed incorrectly.

A workaround is to reload the file, explicitly specifying a code page (but avoiding 'ANSI', which Notepad2e interprets as the current code page).

It would be better to at least check that the current code page is not UTF-8 in this situation and avoid displaying non-UTF-8 files as UTF-8, which is never going to work.
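To make the suggested check concrete, here is a rough sketch of the decision in Python rather than Notepad2e's own code. The function names and the `fallback_code_page` parameter (with 1252 as its default) are hypothetical and only illustrate the idea; this is not how Notepad2e is actually structured.

```python
CP_UTF8 = 65001  # Windows code page identifier for UTF-8


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (pure ASCII also passes)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def choose_code_page(data: bytes, current_code_page: int,
                     fallback_code_page: int = 1252) -> int:
    """Pick a code page for displaying a file.

    fallback_code_page is an assumed setting (hypothetical), used only when
    the file is not UTF-8 but the current code page is UTF-8.
    """
    if is_valid_utf8(data):
        return CP_UTF8
    if current_code_page == CP_UTF8:
        # The file failed the UTF-8 check, so showing it as UTF-8 cannot work.
        return fallback_code_page
    # The usual 'ANSI' behaviour: use the current code page.
    return current_code_page
```

For example, `choose_code_page(b"caf\xe9", 65001)` returns the fallback instead of 65001, while the same bytes with a current code page of 1250 still return 1250.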

It would be possible to provide a user-specified default to use in this situation, though this might be overkill. I don't know if there is a function which makes a guess at the encoding to use based on the character mix - didn't browsers use to do this? Maybe it is possible to use locale settings to make a better guess than the current code page when the current code page is UTF-8.
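As an illustration of the locale idea, the sketch below maps a locale name to the legacy ANSI code page traditionally associated with it on Windows. The table is deliberately partial, and the names `LEGACY_ANSI_CODE_PAGES` and `guess_legacy_code_page` are made up for this example. On Windows itself, I believe `GetLocaleInfoEx` with `LOCALE_IDEFAULTANSICODEPAGE` reports the same kind of value, though I have not checked what it returns when the system code page is set to UTF-8.

```python
# Partial, illustrative table: locale or language tag -> legacy Windows ANSI code page.
LEGACY_ANSI_CODE_PAGES = {
    "zh-tw": 950, "zh-hk": 950, "zh": 936,
    "ja": 932, "ko": 949, "th": 874, "vi": 1258,
    "ru": 1251, "uk": 1251, "bg": 1251,
    "pl": 1250, "cs": 1250, "hu": 1250,
    "el": 1253, "tr": 1254, "he": 1255, "ar": 1256,
    "lt": 1257, "lv": 1257, "et": 1257,
}


def guess_legacy_code_page(locale_name: str, default: int = 1252) -> int:
    """Guess a legacy code page from a locale name such as 'ru-RU' or 'zh_TW'.

    Falls back to `default` (1252, Western European) for anything not listed.
    """
    name = locale_name.replace("_", "-").lower()
    if name in LEGACY_ANSI_CODE_PAGES:
        return LEGACY_ANSI_CODE_PAGES[name]
    language = name.split("-")[0]
    return LEGACY_ANSI_CODE_PAGES.get(language, default)
```

A guess like this would only be a starting point; a file written on another machine may well use a different code page than the local locale suggests, so an explicit override would still be needed.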

I think I saw somewhere that Windows 11 defaults to UTF-8 - MS have improved a lot in this area to make UTF-8 a first-class citizen - so this situation may become more common.