Rendering an invalid UTF-8 string crashes fcitx5

hoofcushion commented 5 months ago

Summary

Rendering an invalid UTF-8 string crashes fcitx5. For example: 0xffff (U+FFFF), which has no Unicode character assigned to it.

Steps to Reproduce

1.  Use fcitx5-rime.
2.  Call yield(Candidate("",0,0,utf8.char(65535),"")) inside a lua_translator.
3.  The "Invalid utf8 string" error occurs and fcitx5 crashes.
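
For context, `utf8.char(65535)` in Lua produces the three bytes EF BF BF. The minimal C sketch below of the standard UTF-8 encoding rules shows why: U+FFFF is encoded like any other code point in the three-byte range, so the byte sequence is structurally well-formed and only the code point value itself is being rejected. The name `utf8_encode` is illustrative and is not taken from fcitx5 or librime.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 into out (room for 4 bytes).
 * Returns the number of bytes written, or 0 if cp is a surrogate
 * or lies above U+10FFFF. Noncharacters such as U+FFFF are encoded
 * like any other scalar value. */
size_t utf8_encode(uint32_t cp, unsigned char out[4]) {
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return 0; /* surrogates are never encoded in UTF-8 */
        }
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0; /* beyond the Unicode code space */
}
```

Calling `utf8_encode(0xFFFF, buf)` returns 3 and fills `buf` with EF BF BF.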

Expected Behavior

fcitx5 should not crash when the "Invalid utf8 string" error occurs. Perhaps invalid strings could simply be skipped when rendering.

Output of fcitx5-diagnose command

➜  fcitx5 git:(master) fcitx5
I2024-05-08 00:24:09.990140 instance.cpp:1378] Override Enabled Addons: {}
I2024-05-08 00:24:09.990528 instance.cpp:1379] Override Disabled Addons: {virtualkeyboard, table, quickphrase, spell, pinyinhelper, fullwidth, fcitx4frontend, emoji, punctuation, cloudpinyin, clipboard, pinyin, chttrans}
E2024-05-08 00:24:10.009353 waylandmodule.cpp:239] Failed to open wayland connection
I2024-05-08 00:24:10.009567 addonmanager.cpp:193] Loaded addon wayland
I2024-05-08 00:24:10.010164 addonmanager.cpp:193] Loaded addon unicode
I2024-05-08 00:24:10.057177 xcbconnection.cpp:189] Connecting to X11 display, display name::0.
I2024-05-08 00:24:10.062033 addonmanager.cpp:193] Loaded addon xcb
I2024-05-08 00:24:10.066985 addonmanager.cpp:193] Loaded addon imselector
I2024-05-08 00:24:10.088227 addonmanager.cpp:193] Loaded addon keyboard
I2024-05-08 00:24:10.088591 addonmanager.cpp:193] Loaded addon waylandim
I2024-05-08 00:24:10.095677 addonmanager.cpp:193] Loaded addon dbus
I2024-05-08 00:24:10.099236 addonmanager.cpp:193] Loaded addon ibusfrontend
I2024-05-08 00:24:10.106437 addonmanager.cpp:193] Loaded addon xim
I2024-05-08 00:24:10.113904 addonmanager.cpp:193] Loaded addon dbusfrontend
I2024-05-08 00:24:10.159336 inputmethodmanager.cpp:192] Found 734 input method(s) in addon keyboard
I2024-05-08 00:24:10.192752 addonmanager.cpp:193] Loaded addon kimpanel
I2024-05-08 00:24:10.269163 classicui.cpp:64] Created classicui for x11 display::0
I2024-05-08 00:24:10.269326 addonmanager.cpp:193] Loaded addon classicui
I2024-05-08 00:24:10.269864 addonmanager.cpp:193] Loaded addon notificationitem
I2024-05-08 00:24:10.270615 addonmanager.cpp:193] Loaded addon notifications
I2024-05-08 00:24:10.305838 dbusmodule.cpp:790] Service name change: org.fcitx.Fcitx5  :1.287
I2024-05-08 00:24:10.464313 kimpanel.cpp:116] Kimpanel new owner:
I2024-05-08 00:24:10.464421 portalsettingmonitor.cpp:91] A new portal show up, start a new query.
I2024-05-08 00:24:11.444761 addonmanager.cpp:193] Loaded addon rime
E20240508 00:24:13.529407 137063781927040 key_event.cc:77] parse error: unrecognized key 'Enter'
E20240508 00:24:16.163501 137063781927040 types.cc:1432] 15
E20240508 00:24:16.307758 137063781927040 types.cc:1432] 255ÿ
E20240508 00:24:16.482750 137063781927040 types.cc:1432] 4095࿿
E20240508 00:25:00.742167 137063781927040 key_event.cc:77] parse error: unrecognized key 'Enter'
E20240508 00:25:00.850496 137063781927040 key_event.cc:77] parse error: unrecognized key 'Enter'
F2024-05-08 00:25:18.627534 event_sdevent.cpp:244] Invalid utf8 string
=========================
Fcitx 5.1.10 -- Get Signal No.: 6
Date: try "date -d @1715099118" if you are using GNU date ***
ProcessID: 15718
fcitx5(+0xebcc)[0x59d162f63bcc]
/usr/lib/libc.so.6(+0x3ce20)[0x7ca8a5c58e20]
/usr/lib/libc.so.6(+0x90194)[0x7ca8a5cac194]
/usr/lib/libc.so.6(gsignal+0x20)[0x7ca8a5c58d70]
/usr/lib/libc.so.6(abort+0xdf)[0x7ca8a5c404c0]
/usr/lib/libFcitx5Utils.so.2(+0x1b0c7)[0x7ca8a61c40c7]
/usr/lib/libFcitx5Utils.so.2(+0x1800d)[0x7ca8a61c100d]
/usr/lib/libsystemd.so.0(+0x7d64a)[0x7ca8a5bac64a]
/usr/lib/libsystemd.so.0(sd_event_dispatch+0x11e)[0x7ca8a5bac96e]
/usr/lib/libsystemd.so.0(sd_event_run+0x119)[0x7ca8a5bae689]
/usr/lib/libsystemd.so.0(sd_event_loop+0x60)[0x7ca8a5bae860]
/usr/lib/libFcitx5Utils.so.2(_ZN5fcitx9EventLoop4execEv+0x16)[0x7ca8a61d6b46]
/usr/lib/libFcitx5Core.so.7(_ZN5fcitx8Instance4execEv+0x5c)[0x7ca8a62b095c]
fcitx5(+0xd090)[0x59d162f62090]
/usr/lib/libc.so.6(+0x25d4a)[0x7ca8a5c41d4a]
/usr/lib/libc.so.6(__libc_start_main+0x8c)[0x7ca8a5c41e0c]
fcitx5(+0xe455)[0x59d162f63455]
[1]    15718 IOT instruction (core dumped)  fcitx5

# 系统信息:
1.  `uname -a`:

        Linux HC 6.8.9-zen1-1-zen #1 ZEN SMP PREEMPT_DYNAMIC Thu, 02 May 2024 17:48:53 +0000 x86_64 GNU/Linux

2.  `lsb_release -a`:

        LSB Version:    n/a
        Distributor ID: Arch
        Description:    Arch Linux
        Release:    rolling
        Codename:   n/a

3.  `lsb_release -d`:

        Description:    Arch Linux

4.  `/etc/lsb-release`:

        DISTRIB_ID="Arch"
        DISTRIB_RELEASE="rolling"
        DISTRIB_DESCRIPTION="Arch Linux"

5.  `/etc/os-release`:

        NAME="Arch Linux"
        PRETTY_NAME="Arch Linux"
        ID=arch
        BUILD_ID=rolling
        ANSI_COLOR="38;2;23;147;209"
        HOME_URL="https://archlinux.org/"
        DOCUMENTATION_URL="https://wiki.archlinux.org/"
        SUPPORT_URL="https://bbs.archlinux.org/"
        BUG_REPORT_URL="https://gitlab.archlinux.org/groups/archlinux/-/issues"
        PRIVACY_POLICY_URL="https://terms.archlinux.org/docs/privacy-policy/"
        LOGO=archlinux-logo

6.  桌面环境：

    桌面环境为 `kde`。

7.  XDG 会话类型：

        XDG_SESSION_TYPE='x11'

8.  Bash 版本：

        BASH_VERSION='5.2.26(1)-release'

# 环境：
1.  DISPLAY:

        DISPLAY=':0'

        WAYLAND_DISPLAY=''

2.  键盘布局：

    1.  `setxkbmap`:

            xkb_keymap {
                xkb_keycodes  { include "evdev+aliases(qwerty)" };
                xkb_types     { include "complete"  };
                xkb_compat    { include "complete"  };
                xkb_symbols   { include "pc+us+inet(evdev)" };
                xkb_geometry  { include "pc(pc104)" };
            };

    2.  `xprop`:

            _XKB_RULES_NAMES(STRING) = "evdev", "pc104", "us", "", ""

3.  Locale：

    1.  全部可用 locale：

            C
            C.utf8
            en_GB.utf8
            en_US.utf8
            POSIX
            zh_CN.utf8

    2.  当前 locale：

            LANG=zh_CN.UTF-8
            LC_CTYPE="zh_CN.UTF-8"
            LC_NUMERIC="zh_CN.UTF-8"
            LC_TIME="zh_CN.UTF-8"
            LC_COLLATE="zh_CN.UTF-8"
            LC_MONETARY="zh_CN.UTF-8"
            LC_MESSAGES="zh_CN.UTF-8"
            LC_PAPER="zh_CN.UTF-8"
            LC_NAME="zh_CN.UTF-8"
            LC_ADDRESS="zh_CN.UTF-8"
            LC_TELEPHONE="zh_CN.UTF-8"
            LC_MEASUREMENT="zh_CN.UTF-8"
            LC_IDENTIFICATION="zh_CN.UTF-8"
            LC_ALL=

4.  目录：

    1.  主目录：

            /home/Hoofcushion

    2.  `${XDG_CONFIG_HOME}`:

        环境变量 `XDG_CONFIG_HOME` 没有设定。

        `XDG_CONFIG_HOME` 的当前值是 `~/.config` (`/home/Hoofcushion/.config`)。

    3.  Fcitx5 设置目录：

        当前 fcitx5 设置目录是 `~/.config/fcitx5` (`/home/Hoofcushion/.config/fcitx5`)。

5.  当前用户：

    脚本作为 Hoofcushion (1000) 运行。

# Fcitx 状态:
1.  可执行文件：

    在 `/usr/bin/fcitx5` 找到了 fcitx5。

2.  版本：

    Fcitx 版本: `5.1.10`

3.  进程：

    **Fcitx5 没有在运行。**
    **请访问 [入门指南](http://fcitx-im.org/wiki/Beginner%27s_Guide/zh-cn) 页面上对应您发行版的配置链接查看如何配置 fcitx5 的自动启动.**

# Fcitx 配置界面：
1.  配置工具封装：

    在 `/usr/bin/fcitx5-configtool` 找到了 fcitx5-configtool。

2.  Qt 的配置界面：

    在 `/usr/bin/fcitx5-config-qt` 找到了 `fcitx5-config-qt`。

3.  KDE 的配置界面：

    **`kcmshell5` 未找到.**

# 前端设置：
此脚本检查的环境变量仅能显示当前命令行的环境。仍有可能您的环境并没有应用于整个桌面。您可以通过使用命令对某个无法正常工作的进程使用命令 `xargs -0 -L1 /proc/$PID/environ` 检查此进程的实际的环境变量。

## Xim:
1.  `${XMODIFIERS}`:

    **环境变量 XMODIFIERS 的值被设为了“@im=fcitx5”而不是“@im=fcitx”。请检查您是否在某个初始化文件中错误的设置了它的值。**

    **请使用您发行版提供的工具将环境变量 XMODIFIERS 设为 "@im=fcitx" 或者将 `export XMODIFIERS=@im=fcitx` 添加到您的 `~/.xprofile` 中。参见 [输入法相关的环境变量：XMODIFIERS](http://fcitx-im.org/wiki/Input_method_related_environment_variables/zh-cn#XMODIFIERS)。**

    从环境变量中获取的 Xim 服务名称为 fcitx5.

2.  根窗口上的 XIM_SERVERS：

    Xim 服务的名称与环境变量中设置的相同。

## Qt:
1.  qt4 - `${QT4_IM_MODULE}`:

    **环境变量 QT_IM_MODULE 的值被设为了“fcitx5”而不是“fcitx”。请检查您是否在某个初始化文件中错误的设置了它的值。**
    **您可能会在 qt4 程序中使用 fcitx 时遇到问题.**

    **请使用您发行版提供的工具将环境变量 QT_IM_MODULE 设为 "fcitx" 或者将 `export QT_IM_MODULE=fcitx` 添加到您的 `~/.xprofile` 中。参见 [输入法相关的环境变量：QT_IM_MODULE](http://fcitx-im.org/wiki/Input_method_related_environment_variables/zh-cn#QT_IM_MODULE)。**

    **`fcitx5-qt4-immodule-probing` 未找到.**

2.  qt5 - `${QT_IM_MODULE}`:

    **环境变量 QT_IM_MODULE 的值被设为了“fcitx5”而不是“fcitx”。请检查您是否在某个初始化文件中错误的设置了它的值。**
    **您可能会在 qt5 程序中使用 fcitx 时遇到问题.**

    **请使用您发行版提供的工具将环境变量 QT_IM_MODULE 设为 "fcitx" 或者将 `export QT_IM_MODULE=fcitx` 添加到您的 `~/.xprofile` 中。参见 [输入法相关的环境变量：QT_IM_MODULE](http://fcitx-im.org/wiki/Input_method_related_environment_variables/zh-cn#QT_IM_MODULE)。**

    使用 fcitx5-qt5-immodule-probing 来检查在当前环境下将被实际使用的输入法模块：

        QT_QPA_PLATFORM=xcb
        QT_IM_MODULE=fcitx5
        IM_MODULE_CLASSNAME=fcitx::QFcitxPlatformInputContext

3.  qt6 - `${QT_IM_MODULE}`:

    **环境变量 QT_IM_MODULE 的值被设为了“fcitx5”而不是“fcitx”。请检查您是否在某个初始化文件中错误的设置了它的值。**
    **您可能会在 qt6 程序中使用 fcitx 时遇到问题.**

    **请使用您发行版提供的工具将环境变量 QT_IM_MODULE 设为 "fcitx" 或者将 `export QT_IM_MODULE=fcitx` 添加到您的 `~/.xprofile` 中。参见 [输入法相关的环境变量：QT_IM_MODULE](http://fcitx-im.org/wiki/Input_method_related_environment_variables/zh-cn#QT_IM_MODULE)。**

    使用 fcitx5-qt6-immodule-probing 来检查在当前环境下将被实际使用的输入法模块：

        QT_QPA_PLATFORM=xcb
        QT_IM_MODULE=fcitx5
        IM_MODULE_CLASSNAME=fcitx::QFcitxPlatformInputContext

4.  Qt 输入法模块文件：

    找到了 fcitx5 的 qt6 输入法模块：`/usr/lib/qt6/plugins/platforminputcontexts/libfcitx5platforminputcontextplugin.so`。
    找到了未知的 fcitx qt 模块：`/usr/lib/qt6/plugins/plasma/kcms/systemsettings/kcm_fcitx5.so`。
    找到了 fcitx5 的 qt 输入法模块：`/usr/lib/qt/plugins/platforminputcontexts/libfcitx5platforminputcontextplugin.so`。
    找到了 fcitx5 qt6 模块：`/usr/lib/fcitx5/qt6/libfcitx-quickphrase-editor5.so`。
    找到了 fcitx5 qt5 模块：`/usr/lib/fcitx5/qt5/libfcitx-quickphrase-editor5.so`。

    下列错误也许并不准确，因为对路径所对应的 Qt 版本的猜测取决于发行版如何打包 Qt。如果您不使用任何对应版本的 Qt 程序，或者在 Wayland 下使用 Qt 的 text-input 支持，下列错误也不是严重问题。
    **无法找到 Qt4 的 fcitx5 输入法模块。**

## Gtk:
1.  gtk - `${GTK_IM_MODULE}`:

    **环境变量 GTK_IM_MODULE 的值被设为了“fcitx5”而不是“fcitx”。请检查您是否在某个初始化文件中错误的设置了它的值。**
    **您可能会在 gtk 程序中使用 fcitx 时遇到问题.**

    **请使用您发行版提供的工具将环境变量 GTK_IM_MODULE 设为 "fcitx" 或者将 `export GTK_IM_MODULE=fcitx` 添加到您的 `~/.xprofile` 中。参见 [输入法相关的环境变量：GTK_IM_MODULE](http://fcitx-im.org/wiki/Input_method_related_environment_variables/zh-cn#GTK_IM_MODULE)。**

    使用 fcitx5-gtk2-immodule-probing 来检查在当前环境下将被实际使用的输入法模块：

        GTK_IM_MODULE=fcitx5

    使用 fcitx5-gtk3-immodule-probing 来检查在当前环境下将被实际使用的输入法模块：

        GTK_IM_MODULE=fcitx5

    使用 fcitx5-gtk4-immodule-probing 来检查在当前环境下将被实际使用的输入法模块：

        GTK_IM_MODULE=fcitx5

2.  `gtk-query-immodules`:

    1.  gtk 2:

        在 `/usr/bin/gtk-query-immodules-2.0` 找到了 gtk `2.24.33` 的 `gtk-query-immodules`。
        版本行：

            # Created by /usr/bin/gtk-query-immodules-2.0 from gtk+-2.24.33

        已找到 gtk `2.24.33` 的 fcitx5 输入法模块。

            "/usr/lib/gtk-2.0/2.10.0/immodules/im-fcitx5.so" 
            "fcitx" "Fcitx5 (Flexible Input Method Framework5)" "fcitx5" "/usr/locale" "ja:ko:zh:*" 
            "fcitx5" "Fcitx5 (Flexible Input Method Framework5)" "fcitx5" "/usr/locale" "ja:ko:zh:*" 

        在 `/usr/bin/gtk-query-immodules-2.0-32` 找到了 gtk `2.24.33` 的 `gtk-query-immodules`。
        版本行：

            # Created by /usr/bin/gtk-query-immodules-2.0-32 from gtk+-2.24.33

        **无法在 `/usr/bin/gtk-query-immodules-2.0-32` 的输出中找到 fcitx5。**

    2.  gtk 3:

        在 `/usr/bin/gtk-query-immodules-3.0` 找到了 gtk `3.24.41` 的 `gtk-query-immodules`。
        版本行：

            # Created by /usr/bin/gtk-query-immodules-3.0 from gtk+-3.24.41

        已找到 gtk `3.24.41` 的 fcitx5 输入法模块。

            "/usr/lib/gtk-3.0/3.0.0/immodules/im-fcitx5.so" 
            "fcitx" "Fcitx5 (Flexible Input Method Framework5)" "fcitx5" "/usr/locale" "ja:ko:zh:*" 
            "fcitx5" "Fcitx5 (Flexible Input Method Framework5)" "fcitx5" "/usr/locale" "ja:ko:zh:*" 

3.  Gtk 输入法模块缓存：

    1.  gtk 2:

        在 `/usr/lib/gtk-2.0/2.10.0/immodules.cache` 找到了 gtk `2.24.33` 的输入法模块缓存。
        版本行：

            # Created by /usr/bin/gtk-query-immodules-2.0 from gtk+-2.24.33

        已找到 gtk `2.24.33` 的 fcitx5 输入法模块。

            "/usr/lib/gtk-2.0/2.10.0/immodules/im-fcitx5.so" 
            "fcitx" "Fcitx5 (Flexible Input Method Framework5)" "fcitx5" "/usr/locale" "ja:ko:zh:*" 
            "fcitx5" "Fcitx5 (Flexible Input Method Framework5)" "fcitx5" "/usr/locale" "ja:ko:zh:*" 

        在 `/usr/lib32/gtk-2.0/2.10.0/immodules.cache` 找到了 gtk `2.24.33` 的输入法模块缓存。
        版本行：

            # Created by usr/bin/gtk-query-immodules-2.0-32 from gtk+-2.24.33

        **无法输入法模块缓存 `/usr/lib32/gtk-2.0/2.10.0/immodules.cache` 中找到 fcitx5**

    2.  gtk 3:

        在 `/usr/lib/gtk-3.0/3.0.0/immodules.cache` 找到了 gtk `3.24.41` 的输入法模块缓存。
        版本行：

            # Created by /usr/bin/gtk-query-immodules-3.0 from gtk+-3.24.41

        已找到 gtk `3.24.41` 的 fcitx5 输入法模块。

            "/usr/lib/gtk-3.0/3.0.0/immodules/im-fcitx5.so" 
            "fcitx" "Fcitx5 (Flexible Input Method Framework5)" "fcitx5" "/usr/locale" "ja:ko:zh:*" 
            "fcitx5" "Fcitx5 (Flexible Input Method Framework5)" "fcitx5" "/usr/locale" "ja:ko:zh:*" 

4.  Gtk 输入法模块文件：

    1.  gtk 2:

        找到的全部 Gtk 2 输入法模块文件均存在。

    2.  gtk 3:

        找到的全部 Gtk 3 输入法模块文件均存在。

    3.  gtk 4:

        找到的全部 Gtk 4 输入法模块文件均存在。

# 配置:
## Fcitx 插件：
1.  插件配置文件目录：

    找到了 fcitx5 的插件配置目录：`/usr/share/fcitx5/addon`。

2.  插件列表：

    1.  找到了 15 个已启用的插件：

            Classic User Interface 5.1.10
            DBus 5.1.10
            DBus Frontend 5.1.10
            IBus Frontend 5.1.10
            Input method selector 5.1.10
            Keyboard 5.1.10
            KDE Input Method Panel 5.1.10
            Status Notifier 5.1.10
            Notification 5.1.10
            Rime 5.1.5
            Unicode 5.1.10
            Wayland 5.1.10
            Wayland Input method frontend 5.1.10
            XCB 5.1.10
            X Input Method Frontend 5.1.10

    2.  找到了 13 个被禁用的插件：

            Simplified and Traditional Chinese Translation 5.1.4
            Clipboard 5.1.10
            Cloud Pinyin 5.1.4
            Emoji 5.1.10
            Fcitx4 Frontend 5.1.10
            Full width character 5.1.4
            Pinyin 5.1.4
            Extra Pinyin functionality 5.1.4
            Punctuation 5.1.4
            Quick Phrase 5.1.10
            Spell 5.1.10
            Table 5.1.4
            DBus Virtual Keyboard 5.1.10

3.  插件库: 

    所有插件所需的库都被找到。

4.  用户界面：

    找到了 2 个已启用的用户界面插件：

        Classic User Interface
        KDE Input Method Panel

## 输入法：
1.  `/home/Hoofcushion/.config/fcitx5/profile`:

        [Groups/0]
        # Group Name
        Name=Default
        # Layout
        Default Layout=us
        # Default Input Method
        DefaultIM=rime

        [Groups/0/Items/0]
        # Name
        Name=keyboard-us
        # Layout
        Layout=

        [Groups/0/Items/1]
        # Name
        Name=rime
        # Layout
        Layout=

        [GroupOrder]
        0=Default

# 日志：
1.  `date`:

        2024年 05月 08日 星期三 00:25:43 CST

2.  `/home/Hoofcushion/.config/fcitx5/crash.log`:

        =========================
        Fcitx 5.1.10 -- Get Signal No.: 6
        Date: try "date -d @1715099118" if you are using GNU date ***
        ProcessID: 15718
        fcitx5(+0xebcc)[0x59d162f63bcc]
        /usr/lib/libc.so.6(+0x3ce20)[0x7ca8a5c58e20]
        /usr/lib/libc.so.6(+0x90194)[0x7ca8a5cac194]
        /usr/lib/libc.so.6(gsignal+0x20)[0x7ca8a5c58d70]
        /usr/lib/libc.so.6(abort+0xdf)[0x7ca8a5c404c0]
        /usr/lib/libFcitx5Utils.so.2(+0x1b0c7)[0x7ca8a61c40c7]
        /usr/lib/libFcitx5Utils.so.2(+0x1800d)[0x7ca8a61c100d]
        /usr/lib/libsystemd.so.0(+0x7d64a)[0x7ca8a5bac64a]
        /usr/lib/libsystemd.so.0(sd_event_dispatch+0x11e)[0x7ca8a5bac96e]
        /usr/lib/libsystemd.so.0(sd_event_run+0x119)[0x7ca8a5bae689]
        /usr/lib/libsystemd.so.0(sd_event_loop+0x60)[0x7ca8a5bae860]
        /usr/lib/libFcitx5Utils.so.2(_ZN5fcitx9EventLoop4execEv+0x16)[0x7ca8a61d6b46]
        /usr/lib/libFcitx5Core.so.7(_ZN5fcitx8Instance4execEv+0x5c)[0x7ca8a62b095c]
        fcitx5(+0xd090)[0x59d162f62090]
        /usr/lib/libc.so.6(+0x25d4a)[0x7ca8a5c41d4a]
        /usr/lib/libc.so.6(__libc_start_main+0x8c)[0x7ca8a5c41e0c]
        fcitx5(+0xe455)[0x59d162f63455]

**警告：fcitx5-diagnose 的输出可能包含敏感信息，包括发行版名称，内核版本，正在运行的程序名称等。**

**尽管这些信息对于开发者诊断问题有帮助，请在公开发送到在线网站前检查并且根据需要移除的对应信息。**

hoofcushion commented 5 months ago

#define UNICODE_VALID(Char)                                                    \
    ((Char) < 0x110000 && (((Char) & 0xFFFFF800) != 0xD800) &&                 \
     ((Char) < 0xFDD0 || (Char) > 0xFDEF) && ((Char) & 0xFFFE) != 0xFFFE)

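I found this macro definition and translated it into the Lua form below. Checking each code point with it before calling yield avoids the crash. Still, crashing merely because a UTF-8 string is invalid seems unreasonable. Could this behavior be adjusted?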

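-- Port of UNICODE_VALID; needs Lua 5.3+ for the integer bitwise operators.
local function unicode_valid(char)
 return char<0x110000 and             -- not above U+10FFFF
  ((char&0xfffff800)~=0xd800) and     -- not a surrogate (U+D800..U+DFFF)
  (char<0xfdd0 or char>0xfdef) and    -- not a noncharacter in U+FDD0..U+FDEF
  (char&0xfffe)~=0xfffe               -- not a noncharacter U+nFFFE or U+nFFFF
end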
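
The macro quoted in this comment can be read clause by clause. The sketch below restates it as a C function that reports which clause rejects a given code point. `cp_class` and `classify` are illustrative names and are not part of fcitx5; only `UNICODE_VALID` is copied from the thread.

```c
#include <stdint.h>

/* Macro as quoted in the thread. */
#define UNICODE_VALID(Char)                                                    \
    ((Char) < 0x110000 && (((Char) & 0xFFFFF800) != 0xD800) &&                 \
     ((Char) < 0xFDD0 || (Char) > 0xFDEF) && ((Char) & 0xFFFE) != 0xFFFE)

/* Hypothetical classification, one value per clause of the macro. */
typedef enum {
    CP_OK,
    CP_OUT_OF_RANGE, /* above U+10FFFF */
    CP_SURROGATE,    /* U+D800..U+DFFF */
    CP_NONCHARACTER  /* U+FDD0..U+FDEF, U+nFFFE, U+nFFFF */
} cp_class;

cp_class classify(uint32_t c) {
    if (c >= 0x110000) {
        return CP_OUT_OF_RANGE;
    }
    if ((c & 0xFFFFF800u) == 0xD800u) {
        return CP_SURROGATE;
    }
    if ((c >= 0xFDD0 && c <= 0xFDEF) || (c & 0xFFFE) == 0xFFFE) {
        return CP_NONCHARACTER;
    }
    return CP_OK;
}
```

Under this reading, `classify(0xFFFF)` yields `CP_NONCHARACTER`: the value from the reproduction steps fails only the last clause of the macro, while U+FFFD passes all four.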

wengxt commented 5 months ago

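First, fcitx communicates over dbus, and dbus requires every string to be a valid UTF-8 string. If I sent the string as-is instead of crashing, some other library would crash on my behalf.

Second, any invalid string coming from an engine is considered a bug in the engine, so rather than validate it and substitute an empty string, I would prefer to crash outright.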

wengxt commented 5 months ago

Please do not send invalid strings.

hoofcushion commented 5 months ago

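Sorry, I still have a question. A noncharacter should not count as an invalid character. For noncharacters at least, fcitx could try to keep them or replace them with an empty string, which should be harmless in any text stream.

Corrigendum #9: Clarification About Noncharacters
Are noncharacters invalid in Unicode strings and UTFs?
Can failing to replace noncharacters with U+FFFD lead to problems?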

hoofcushion commented 5 months ago

Whether noncharacters are valid in dbus has also been clarified. I am not sure exactly how other libraries behave, but for noncharacters, treating them as invalid strings, or having dbus crash on them, is probably inappropriate.

Specification: explicitly allow the Unicode noncharacters
Bug 63072 - allow Unicode non-characters as per Corrigendum 9
If my application makes specific, internal use of a noncharacter, what should I do with input text?

CoelacanthusHex commented 5 months ago

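> Sorry, I still have a question. A noncharacter should not count as an invalid character. For noncharacters at least, fcitx could try to keep them or replace them with an empty string, which should be harmless in any text stream. Corrigendum #9: Clarification About Noncharacters; Are noncharacters invalid in Unicode strings and UTFs?; Can failing to replace noncharacters with U+FFFD lead to problems?

Quoting from Corrigendum #9:

> Noncharacters in the Unicode Standard are intended for internal use

An input method, however, is an architecture that spans applications, compositors, the input method framework, and input method engines. I do not think that fits the definition of internal use, so noncharacters should not be passed around within the input method architecture.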

hoofcushion commented 5 months ago

Corrigendum #9: Clarification About Noncharacters

The real intent of noncharacters is that they are permanently prohibited from being assigned standard, interchangeable meanings, rather than that they are prohibited from occurring in Unicode strings which happen to be interchanged.

Change D14 in Section 3.4, Characters and Encoding, as indicated: Noncharacter: A code point that is permanently reserved for internal use ~~and that should never be interchanged~~. Noncharacters consist of the values U+nFFFE and U+nFFFF (where n is from 0 to 10₁₆) and the values U+FDD0..U+FDEF.

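Unicode's explanation here is clear: noncharacters are not prohibited from appearing in interchanged text. Their main purpose is "internal use" and they are "permanently reserved", but that does not mean they cannot be interchanged. That is exactly why the mistaken "should never be interchanged" wording had to be clarified. Otherwise Corrigendum #9 would be pointless.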
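
To make the definition concrete: the noncharacters are U+FDD0..U+FDEF plus the last two code points of each of the 17 planes (U+nFFFE and U+nFFFF, with n from 0 to 0x10). The sketch below tests membership in that set and counts its members by brute force. The names `is_noncharacter` and `count_noncharacters` are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if cp is one of the Unicode noncharacters:
 * U+FDD0..U+FDEF, or U+nFFFE / U+nFFFF for planes 0 through 0x10. */
bool is_noncharacter(uint32_t cp) {
    if (cp > 0x10FFFF) {
        return false; /* outside the code space, so not a code point at all */
    }
    return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE;
}

/* Count the noncharacters by scanning the whole code space. */
unsigned count_noncharacters(void) {
    unsigned n = 0;
    for (uint32_t cp = 0; cp <= 0x10FFFF; cp++) {
        if (is_noncharacter(cp)) {
            n++;
        }
    }
    return n;
}
```

The count comes out to 66: 32 code points in U+FDD0..U+FDEF and 2 in each of the 17 planes.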

CoelacanthusHex commented 5 months ago

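> Corrigendum #9: Clarification About Noncharacters
>
> The real intent of noncharacters is that they are permanently prohibited from being assigned standard, interchangeable meanings, rather than that they are prohibited from occurring in Unicode strings which happen to be interchanged.
>
> Change D14 in Section 3.4, Characters and Encoding, as indicated: Noncharacter: A code point that is permanently reserved for internal use ~~and that should never be interchanged~~. Noncharacters consist of the values U+nFFFE and U+nFFFF (where n is from 0 to 10₁₆) and the values U+FDD0..U+FDEF.
>
> Unicode's explanation here is clear: noncharacters are not prohibited from appearing in interchanged text. Their main purpose is "internal use" and they are "permanently reserved", but that does not mean they cannot be interchanged. That is exactly why the mistaken "should never be interchanged" wording had to be clarified. Otherwise Corrigendum #9 would be pointless.

I am not saying it "cannot" be done, but that it is "meaningless". Using a noncharacter requires both sides of the exchange to agree on what it means; otherwise neither side is guaranteed to handle it correctly (for example, how to treat it when displaying or when outputting). The parties an input method exchanges text with include a large number of third-party applications outside the input method's control. Unless the input method protocol specifies how noncharacters are to be handled, an application that receives a noncharacter from the input method may not handle it correctly, which leads to all sorts of unexpected results. Moreover, strings carried by the input method protocol are of only two kinds: those for display and those for input. Noncharacters are meaningless for both purposes, because a noncharacter can neither be displayed nor is it input that a text-receiving program expects.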

hoofcushion commented 5 months ago

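The user may simply want to type this character, and Unicode does not forbid it. Noncharacters are also harmless in a text stream unless another program deliberately crashes on them. That behavior is no different from deliberately crashing on any other valid character, and it is a bug in those programs, not in the input method.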

wengxt commented 5 months ago

@hoofcushion Since they have changed it, we can change ours to match.
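
The thread does not show the resulting change. As a rough sketch only, a relaxed validator might keep the range, surrogate, and well-formedness checks while dropping the two noncharacter clauses. The version below assumes exactly that. `is_scalar_value` and `utf8_valid_relaxed` are hypothetical names, not the fcitx5 API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: accept every Unicode scalar value, noncharacters included. */
static bool is_scalar_value(uint32_t cp) {
    return cp < 0x110000 && (cp & 0xFFFFF800u) != 0xD800u;
}

/* Hypothetical: true if s[0..len) is well-formed UTF-8. Rejects truncated
 * sequences, stray continuation bytes, overlong forms, surrogates, and
 * values above U+10FFFF, but lets noncharacters through. */
bool utf8_valid_relaxed(const unsigned char *s, size_t len) {
    size_t i = 0;
    while (i < len) {
        unsigned char b = s[i];
        size_t n;     /* number of continuation bytes expected */
        uint32_t cp;  /* code point being decoded */
        uint32_t min; /* smallest value allowed for this sequence length */

        if (b < 0x80) {
            n = 0; cp = b; min = 0;
        } else if ((b & 0xE0) == 0xC0) {
            n = 1; cp = b & 0x1F; min = 0x80;
        } else if ((b & 0xF0) == 0xE0) {
            n = 2; cp = b & 0x0F; min = 0x800;
        } else if ((b & 0xF8) == 0xF0) {
            n = 3; cp = b & 0x07; min = 0x10000;
        } else {
            return false; /* stray continuation byte or invalid lead byte */
        }
        if (len - i - 1 < n) {
            return false; /* sequence is cut off by the end of the string */
        }
        for (size_t k = 1; k <= n; k++) {
            unsigned char c = s[i + k];
            if ((c & 0xC0) != 0x80) {
                return false; /* expected a continuation byte */
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min || !is_scalar_value(cp)) {
            return false; /* overlong form, surrogate, or out of range */
        }
        i += n + 1;
    }
    return true;
}
```

With this check, the bytes EF BF BF (U+FFFF) from the reproduction steps are accepted, while an encoded surrogate such as ED A0 80 is still rejected.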

fcitx / fcitx5