Japanese UTF-8 encoding detected as TIS-620 (Windows-874 (Thai))

Hello, for the development of Notepad3 we use the UCHARDET charset detector.

In issue #1831 we ran into a Japanese UTF-8 file that UCHARDET misdetects as TIS-620 (Windows-874 (Thai)) with a reliability level of 99%. 😕

The following text editors detect the file as UTF-8 and display it correctly:

Notepad++, EditPad Lite 7, EditPlus, Notepad2, Notepad2e, Notepad2-mod, Notepad2-zufuliu and VS Code.

Here is the file as displayed with the incorrect detection as "TIS-620":

{
  "manifest_version": 2,
  "name": "k view",
  "version": "0.5",
  "description": "ใในใใ€",
  "browser_action": {
    "default_icon": { "19": "round-done-button.png" }
  },
}

Here is the file as displayed with the correct detection as "UTF-8":

{
  "manifest_version": 2,
  "name": "k view",
  "version": "0.5",
  "description": "テスト。",
  "browser_action": {
    "default_icon": { "19": "round-done-button.png" }
  },
}
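
The difference between the two displays can be reproduced outside the editor. The following is a minimal sketch, not the UCHARDET logic itself: it encodes the Japanese description string as UTF-8, confirms that the bytes decode cleanly under strict UTF-8, and then decodes the same bytes with Python's `cp874` codec. Dropping the bytes that have no Windows-874 mapping (`errors="ignore"`) is an assumption about how the editor renders them.

```python
# The Japanese description string from the sample manifest.
text = "テスト。"

# Its UTF-8 byte sequence: three bytes per character, each starting with 0xE3.
data = text.encode("utf-8")
print(data.hex(" "))  # e3 83 86 e3 82 b9 e3 83 88 e3 80 82

# Strict UTF-8 decoding succeeds and round-trips to the original text.
assert data.decode("utf-8") == text

# Decode the same bytes as Windows-874 (Thai).
# Assumption: bytes without a cp874 mapping (0x82, 0x83, 0x86, 0x88 here)
# are simply dropped, which is what errors="ignore" does.
mojibake = data.decode("cp874", errors="ignore")
print(mojibake)
```

Every lead byte 0xE3 maps to the Thai vowel sign ใ in Windows-874, which may be why a run of Japanese UTF-8 text looks statistically Thai to a detector that does not first check UTF-8 validity.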

The original sample is attached: Error Detection encoding_utf-8 (issue #1831).zip

Thanks in advance for your attention. Have a nice day. hpwamr

Feel free to test the BETA version "Notepad3Portable_5.20.116.2708_BETA.paf.exe.7z" or higher. See "Notepad3 BETA-channel access #1129".

Note: "Notepad3Portable BETA" can be used in "2 flavors" (with or without the extension ".7z").

Your comments and suggestions are always welcome... 😃

Joungkyun / libchardet

Japanese UTF-8 encoding detected as TIS-620 (Windows-874 (Thai)) #16